Going Low: The Encoding Secrets for Small Sizes With High Quality
These results also may explain the differences between my limited tests and those achieved by Netflix. That is, had my starting point reflected 200 percent constrained VBR delivery, or a slightly more aggressive VBV size, the overall percentage change that I produced would have fallen much more in line with those of Netflix. Note that Netflix uses a buffer of 2X the average data rate, so any differences between my results and theirs could relate solely to this difference in our starting points.
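To make the rate-control terms concrete, here is a minimal sketch of how 200 percent constrained VBR and a buffer of 2X the average data rate might translate into encoder arguments. It assumes FFmpeg's `-b:v`, `-maxrate`, and `-bufsize` options; the function name and the 1,304 kbps figure are purely illustrative, and none of this is taken from Netflix's or my actual command lines.

```python
def constrained_vbr_args(target_kbps, max_ratio=2.0, buffer_ratio=2.0):
    """Return FFmpeg rate-control arguments for constrained VBR.

    max_ratio=2.0 models 200 percent constrained VBR (peak = 2x target).
    buffer_ratio=2.0 sizes the VBV buffer at 2x the average data rate.
    Both defaults are assumptions for illustration.
    """
    maxrate = round(target_kbps * max_ratio)
    bufsize = round(target_kbps * buffer_ratio)
    return [
        "-b:v", f"{target_kbps}k",
        "-maxrate", f"{maxrate}k",
        "-bufsize", f"{bufsize}k",
    ]


print(constrained_vbr_args(1304))
```

Lowering `max_ratio` or `buffer_ratio` tightens the constraint, which is the knob to test against QoE before deploying.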
Very Slow Preset
In the blog post, Netflix discussed optimizing encoding settings using “larger motion search range” and “more exhaustive mode evaluation,” which “allows an encoder to evaluate more encoding options at the expense of compute time.” It also described the ability to include more B-frames. To accomplish all this, I switched from the default medium preset to the very slow preset, which jumps from three B-frames to eight, from three reference frames to 16, and incorporates more sophisticated searching for redundancies.
The cost? Encoding time more than doubled. However, the change delivered an average PSNR improvement of .41 dB across our three test files, and it’s an adjustment you can make for ABR video as well as downloadable video, though you’d be essentially doubling your encoding costs.
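If you drive x264 through FFmpeg, the preset switch might look like the sketch below. This assumes FFmpeg's libx264 wrapper and a two-pass encode to a target bitrate; the file names, function name, and bitrate are hypothetical, and the exact command lines used for these tests aren't reproduced here.

```python
import os


def x264_two_pass(src, out, target_kbps, preset="veryslow"):
    """Build the two FFmpeg commands for a two-pass libx264 encode."""
    base = [
        "ffmpeg", "-y", "-i", src,
        "-c:v", "libx264",
        "-preset", preset,           # "medium" is the x264 default
        "-b:v", f"{target_kbps}k",
        "-an",                       # video only, to keep the sketch short
    ]
    # Pass 1 only gathers statistics, so its output is discarded.
    first = base + ["-pass", "1", "-f", "null", os.devnull]
    second = base + ["-pass", "2", out]
    return first, second


first, second = x264_two_pass("source.mp4", "veryslow.mp4", 1304)
print(" ".join(first))
print(" ".join(second))
```

Passing `preset="medium"` to the same function gives you the baseline command for a before-and-after timing comparison.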
Chunking the Video
To chunk the video, I divided each source file into 15-second chunks. Then I encoded each chunk with x264 using CRF encoding at a value of 23 to compute a target bit rate for each chunk. Then I created optimized encoding settings for each segment using the parameters identified above.
For example, when encoded to a CRF value of 23, the first chunk of Meridian had an average data rate of 1,304 kbps, which became my target for the two-pass encoding. Table 3 shows how the data rate and quality varied over the 12 chunks in the Meridian file.
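The bookkeeping behind this step can be sketched in a few lines. The code below assumes a 180-second source (which is what 12 chunks of 15 seconds implies) and derives a chunk's target from the size of its CRF encode; the helper names and the byte count are hypothetical, chosen only so the arithmetic lands on the 1,304 figure.

```python
import math


def chunk_ranges(duration_s, chunk_s=15.0):
    """Split a duration into (start, end) pairs of at most chunk_s seconds."""
    count = math.ceil(duration_s / chunk_s)
    return [
        (i * chunk_s, min((i + 1) * chunk_s, duration_s))
        for i in range(count)
    ]


def average_kbps(size_bytes, duration_s):
    """Average data rate of an encoded chunk in kilobits per second."""
    return size_bytes * 8 / 1000 / duration_s


ranges = chunk_ranges(180)
print(len(ranges))  # 12

# Hypothetical size of the first CRF 23 chunk (video stream only).
target = average_kbps(2_445_000, 15)
print(round(target))  # 1304
```

Each chunk's measured rate then becomes the target bitrate for that chunk's two-pass encode.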
In terms of workflow, I encoded each chunk separately using its unique encoding parameters, and then concatenated them into a single file using Solveig Multimedia’s Video Splitter, a $49.95 utility that can split and join MP4 files without re-encoding. You could also do this in FFmpeg, which I used for the WebM files, but Video Splitter was so much easier and worked perfectly.
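For those who prefer the FFmpeg route, a join without re-encoding can be sketched with the concat demuxer. The chunk and output file names here are hypothetical, and this is one plausible way to do it rather than a record of the exact commands I ran.

```python
def concat_list_text(chunk_paths):
    """Text of the list file read by FFmpeg's concat demuxer."""
    return "".join(f"file '{path}'\n" for path in chunk_paths)


def concat_command(list_path, out_path):
    """FFmpeg command that joins the listed chunks with stream copy."""
    return [
        "ffmpeg", "-f", "concat", "-safe", "0",
        "-i", list_path,
        "-c", "copy",   # copy the streams, no re-encoding
        out_path,
    ]


chunks = [f"chunk_{i:03d}.mp4" for i in range(12)]
print(concat_list_text(chunks), end="")
print(" ".join(concat_command("chunks.txt", "joined.mp4")))
```

Write the list text to `chunks.txt` first, then run the command; the chunks need matching codec parameters for stream copy to produce a playable file.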
To be clear, I didn’t compute and average the PSNR values shown in Table 3. Rather, Video Splitter produced a playable, usable file and I measured that. Unfortunately, after completing the procedure, the PSNR value of the final chunked files averaged .13 dB lower than that of the optimized single-file encodes. Fortunately, as you’ll see, the same procedure produced much better results with VP9 encoding.
To this point, all the results were achieved at the starting data rate for each file; so it was time to push the envelope and go low. To accomplish this, I encoded a single file at 75 percent of the target data rate used in the first series of tests, and then reproduced the chunked analysis at 75 percent of the original targets. You can see the results in Figure 2.
To explain, Start is the 39.60 dB average PSNR of my starting-point encodes. Optimized single (41.40) is the PSNR value achieved with a single file at the original data rate, while 41.27 was the average value achieved by chunking at the original data rate, the .13 dB deficit noted above. The next two values were for encodes at 75 percent of the target.
With a single file optimized at 75 percent of the original target, the PSNR values for the three files averaged 40.45, well above the 39.60 starting point, indicating that I had room for further data rate reductions. The chunked procedure disappointed again, producing a slightly lower PSNR value of 39.99, although that was still higher than the 39.60 starting point.
Table 4 shows the before-and-after values for all the parameters that I tweaked, the improvement delivered, the trade-off presented, and whether the tweak is suitable for streaming. The last column is self-explanatory, except for the bitrate control/buffer size control. Here, I recognize that many producers stream at up to 200 percent constrained VBR, using larger VBV buffers. Both alternatives are worth exploring to improve streaming quality if you test to ensure against a degradation in QoE. Again, you’ll see what I mean if you read this article.
Let’s run through the VP9 analysis and then we’ll revisit chunked performance.
VP9 was a rinse-and-repeat procedure, though the starting point and some of the parameters changed. Specifically, with VP9, I referenced the same data rate targets suggested by the x264 CRF encodes, but cut my starting data rates to 70 percent of those targets. As you can see in the table, the H.264 starting point was again 39.60, but cutting the data rate to 70 percent reduced that to 39.36 at my initial encoding parameters. In other words, without tweaking, at 70 percent of the target bitrate, VP9 delivered a slightly lower quality encode than my H.264 starting point.
Table 5 shows the same math for VP9 as Table 2 does for H.264.
Table 6 summarizes the optimizations I applied and their implications. The first, second, and fourth changes were identical to those applied to H.264, but in the third I started encoding with a preset of two, and changed to the highest quality 0 for the optimized encodes. As you can see from both tables, chunking improved results with VP9, even more so when we went low.
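As a rough illustration of the preset change, the sketch below builds two-pass VP9 commands for FFmpeg's libvpx-vp9 wrapper. Mapping the "preset" of two and zero onto the `-cpu-used` option is my assumption here, as are the file names and the 913 kbps target (70 percent of 1,304); the tests themselves may have used different tooling.

```python
import os


def vp9_two_pass(src, out, target_kbps, cpu_used=0):
    """Build the two FFmpeg commands for a two-pass libvpx-vp9 encode."""
    base = [
        "ffmpeg", "-y", "-i", src,
        "-c:v", "libvpx-vp9",
        "-b:v", f"{target_kbps}k",
        "-deadline", "good",
        "-cpu-used", str(cpu_used),  # 0 = slowest, highest quality
        "-an",                       # video only, to keep the sketch short
    ]
    first = base + ["-pass", "1", "-f", "null", os.devnull]
    second = base + ["-pass", "2", out]
    return first, second


vp9_target = round(1304 * 0.70)  # 70 percent of a hypothetical H.264 target
start_first, start_second = vp9_two_pass("source.mp4", "start.webm", vp9_target, cpu_used=2)
opt_first, opt_second = vp9_two_pass("source.mp4", "optimized.webm", vp9_target, cpu_used=0)
print(" ".join(opt_second))
```

Expect the `cpu_used=0` encode to take considerably longer than the `cpu_used=2` starting point.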
At the 70 percent target data rate of the VP9 encodes, optimization boosted the single file encode to 40.80 dB, and the chunked file to 41.06, giving me quite a bit of headroom as compared to my 39.60 starting point. So I dropped the target data rate for the single file and chunked analysis to 80 percent of the starting point, which was already 70 percent of the H.264 target. By my math, this means a starting point of 56 percent of the original H.264 target, or a reduction of 44 percent.
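The compounding is easy to check. This short sketch multiplies the two percentages and applies the result to a single chunk target; the 1,304 kbps input is just an example value, and the function name is mine.

```python
def combined_percent(*percents):
    """Compound a series of percentage cuts into one overall percentage."""
    result = 100.0
    for percent in percents:
        result = result * percent / 100
    return result


overall = combined_percent(70, 80)
print(overall)        # 56.0
print(100 - overall)  # 44.0

# Applied to an example H.264 chunk target of 1,304 kbps.
print(round(1304 * overall / 100))  # 730
```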
Figure 3 shows the results. As you can see, even at 80 percent of the original 70 percent target, both the optimized single file and particularly the chunked file produce higher PSNR values than the 39.60 target. How were these improvements distributed across the file?
To determine this, I used the Moscow University Video Quality Measurement tool. Specifically, Figure 4 shows a PSNR comparison of the Tears of Steel test file in the results visualization screen. The top graph shows the PSNR comparisons over the entire 3-minute test duration, while the bottom graph shows the detail from the black area in the top graph.
The results shown in red are from the starting point H.264 file, which was 31,348 KB in size. The blue line tracks the results of the 80 percent optimized single file VP9 encode, which was 15,494 KB in size, for a total reduction of just over 50 percent. As you can see, the two lines track closely throughout the graph, though the PSNR value of the VP9 file was 39.86, about 1 percent higher than the starting point of 39.45. That’s pretty impressive.
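For anyone who wants to verify the percentages, the arithmetic from the file sizes and PSNR values above works out as follows; the function names are just for illustration.

```python
def percent_reduction(before, after):
    """Percentage by which a value shrank from before to after."""
    return (before - after) / before * 100


def percent_gain(before, after):
    """Percentage by which a value grew from before to after."""
    return (after - before) / before * 100


size_cut = percent_reduction(31_348, 15_494)  # file sizes in KB
psnr_gain = percent_gain(39.45, 39.86)        # PSNR values in dB

print(f"{size_cut:.1f} percent smaller")     # 50.6 percent smaller
print(f"{psnr_gain:.1f} percent higher PSNR")  # 1.0 percent higher PSNR
```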
To Chunk or Not to Chunk?
Let’s briefly revisit the issue of why chunking worked better for VP9 than for H.264. My guess is that the rate control mechanism for x264 is simply better than that in VP9. That is, when encoding with 300 percent constrained VBR, the x264 encoder identified the hard-to-encode and easy-to-encode sections and doled out the data rate as efficiently for the entire file as it did when working with the individual segments. With VP9, the encoder was more efficient working with the individual segments, which may indicate that it wasn’t quite as capable in spreading the data rate out over the longer file.
Should you try the segmented approach with your own H.264 encoder? Definitely. Even though 15-second segments didn’t improve the results, you might achieve superior results with 6- or 10-second segments. And if you’re working with VP9, encoding in segments clearly belongs in your workflow.
As a final note, please take all of these techniques as suggestions rather than as gospel. Try them on your own encoding platforms, but test, test, test (and then test some more) before actually deploying them in your download or streaming delivery systems.
[This article appears in the June 2017 issue of Streaming Media Magazine as "Going Low."]