June 29, 2017
By Jan Ozer Contributing Editor

Going Low: The Encoding Secrets for Small Sizes With High Quality

Let’s start with two observations. First, nothing in video compression is free. With each encoding parameter you adjust, you trade off quality against encoding time (which translates to cost), compatibility, playability, or deliverability. To gain in quality, you necessarily lose somewhere else. There is no magic; optimization is just a series of controlled and (hopefully) informed decisions about these tradeoffs. Second, no analysis is ever complete; it’s simply a point-in-time observation of a work in progress, hopefully one that advances the overall state of analysis.

You feeling me? If so, let’s begin.

The Challenge: How Low Can You Go?

This article was inspired by an email from editor Eric Schumacher-Rasmussen, saying, in essence, that “Netflix and Amazon are now enabling downloads, the files are really small but look great. How are they doing that?” Soon thereafter, Netflix published a blog post on its process entitled, “More Efficient Mobile Encodes for Netflix Downloads,” which laid out its recipe in general terms. We asked Amazon for similar guidance, which it declined to provide.

Netflix’s analysis was massive, involving 600 full-length movies or episodes encoded at multiple resolutions on Netflix’s cloud-based rendering platform and assessed with two objective quality metrics: peak signal-to-noise ratio (PSNR) and video multi-method assessment fusion (VMAF). Using the techniques discussed in the blog post, Netflix was able to match the quality of its existing H.264 starting point while cutting file size by 15.3 percent with H.264 and 35.9 percent with VP9 (as measured by PSNR).

Our budget wasn’t quite that large. I tested with three videos at a single resolution, encoded on my trusty HP Z840 using the PSNR, multiscale structural similarity (MS SSIM), and video quality metric (VQM) measurements, although I’ll only share PSNR results herein. There are a lot of numbers in this analysis, and presenting them clearly was a real challenge.

I matched the quality of my H.264 starting point while cutting the data rate by 25 percent or more with H.264 and 44 percent or more with VP9. Please hold your applause until the end, however, since there are many, many differences between what I did and what Netflix did, and it’s impossible to compare our results. Still, if you’re looking to produce the lowest-bitrate mobile downloads, you’re in the right place.

Test Files and Assumptions

Let’s quickly cover my test files and test assumptions. I tested with segments from Blender movies Tears of Steel (3 minutes) and Sintel (2 minutes), as well as 3 minutes from Netflix’s own test video, Meridian. I tested at 848x480 resolution only, creating master 480p files from all three 4K sources, and encoding from and comparing to these master 480p files.

Why this resolution? Because it seems to strike a good balance between mobile viewing resolution and the ability to deliver at a very low data rate. In addition, the recently released Global Media Format report from Encoding.com showed a 100 percent increase in 480p deployment, from 6 percent in 2015 to 12 percent in 2016, presumably for mobile. If you don’t produce at that resolution, however, don’t sweat, as the techniques discussed below should work at any resolution.

The next major decision was the starting point, which is obviously critical to the overall percentage savings. Netflix compared its test results to “our current streams,” which were encoded using the Main H.264 profile. With Netflix, however, this is a moving target, as the company applies per-title encoding for all videos in its library (see “How Netflix Pioneered Per-Title Video Encoding Optimization”). So the first step I had to take was to identify the appropriate data rate and encoding technique for my starting point encodes.

To compute this, I encoded each file with x264 set at CRF 23, which in my experience corresponds with the quality levels used by studios for their iTunes downloads, at least at 1080p and 720p. (For more on this technique, check out “How to Use Objective Quality Measurement Tools.”) I measured PSNR as a reality check and found all values between 39.46 and 41.65 dB, which generally means very good quality. For perspective, in its “Per-Title Encode Optimization” post, Netflix commented that values above 45 dB typically don’t result in perceptible quality improvements, and that scores below 35 dB often exhibit noticeable artifacts. I was pretty much square in the middle.
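For readers who want to see what a PSNR number represents, here is a minimal Python sketch of the standard definition for 8-bit video: 10 times the base-10 log of the squared peak value divided by the mean squared error. The function name and the flat list-of-samples input are illustrative assumptions. It shows how the figure is defined in general, not how any particular measurement tool computes or averages it across frames.

```python
import math

def psnr_db(reference, encoded, peak=255):
    """PSNR in dB between two equal-length sequences of sample values.

    `peak` is the maximum possible sample value (255 for 8-bit video).
    """
    if len(reference) != len(encoded) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error between the source and the encoded samples.
    mse = sum((r - e) ** 2 for r, e in zip(reference, encoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: no error at all
    return 10 * math.log10(peak ** 2 / mse)

# Every sample off by one level gives an MSE of 1.
print(round(psnr_db([100, 100, 100, 100], [101, 101, 101, 101]), 2))
```

Because the scale is logarithmic, halving the mean squared error adds roughly 3 dB.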

While constant rate factor (CRF) encoding suggests an appropriate data rate, it’s not an appropriate technique for adaptive bitrate encoding because it can produce wild data rate swings that may hinder the ability to smoothly deliver the file, particularly over mobile connections. So using this CRF-computed data rate as a target, I encoded each file using two-pass variable bitrate (VBR), with a maximum data rate of 110 percent of the target and a video buffering verifier (VBV) buffer equivalent to 1 second of video. For the record, I recommend 110 percent constrained VBR because of a series of tests documented on my Streaming Learning Center website that showed that 200 percent constrained VBR can degrade overall QoE. I recommend the VBV buffer setting because of a series of tests that show that higher settings increase stream variability, which again hinders deliverability.
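As a rough illustration of how these constraints translate into encoder settings, the Python sketch below builds FFmpeg argument lists for a two-pass, constrained VBR encode with libx264. Several things here are assumptions rather than details from the tests above: the use of FFmpeg at all, the function names, the 1,000 kbps example target, and sizing the VBV buffer as the target rate multiplied by the buffer duration. The first pass writes to /dev/null, which assumes a Unix-like system.

```python
def constrained_vbr_args(target_kbps, max_ratio=1.1, buffer_seconds=1.0):
    """FFmpeg rate-control flags for constrained VBR with libx264."""
    maxrate = round(target_kbps * max_ratio)
    # Assumption: the buffer holds `buffer_seconds` of video at the target rate.
    bufsize = round(target_kbps * buffer_seconds)
    return [
        "-c:v", "libx264",
        "-b:v", f"{target_kbps}k",
        "-maxrate", f"{maxrate}k",
        "-bufsize", f"{bufsize}k",
    ]

def two_pass_commands(src, dst, target_kbps, **kwargs):
    """Return (first_pass, second_pass) command lists; nothing is executed."""
    rate = constrained_vbr_args(target_kbps, **kwargs)
    first = ["ffmpeg", "-y", "-i", src, *rate,
             "-pass", "1", "-an", "-f", "null", "/dev/null"]
    second = ["ffmpeg", "-y", "-i", src, *rate, "-pass", "2", dst]
    return first, second

# Hypothetical file names and target rate, for illustration only.
first, second = two_pass_commands("master_480p.mp4", "production.mp4", 1000)
print(" ".join(second))
```

Passing max_ratio=3.0 and buffer_seconds=3.0 would produce a looser, download-style constraint with the same code.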

Yes, I know that Apple recently changed its HLS recommendation from 110 percent constrained VBR to 200 percent constrained VBR, but that doesn’t change my findings and conclusions; 110 percent constrained VBR is what I use and that’s what I recommend. So, it’s what I used as my starting point. If it sounds overly conservative, note that many producers still encode using constant bitrate (CBR), which produces even lower quality.

As you can see in Table 1, applying these restrictions dropped the quality of the CRF encode for all files, which I expected. However, for comparison purposes, the production encodes would be my starting point, since these would be the data rates and PSNR values for “normal encodes” targeted at adaptive bitrate (ABR) delivery.

[Table 1]

To clearly set the stick in the ground, the production encode numbers in Table 1 were those I had to beat, and the 39.60 dB average is the number I will continually point to. The issue was, how much could I drop the delivery data rate and continue to equal or exceed those numbers?

Tweaking Overview

In the Netflix Mobile blog post, the company achieved its bitrate savings in three general ways: 1) using VP9, 2) optimizing the encoding settings for H.264 and VP9, and 3) segmenting the files and optimizing each segment separately rather than each file as a whole. I covered the same three points, although in a slightly different order: first I optimized H.264, then applied per-chunk optimization. Then I did the same for VP9.

Note that for each adjustment, I discuss its utility both for files encoded for download and for files encoded for adaptive delivery. Some are usable for download only, but several can also be used to improve the quality of your adaptive bitrate (ABR) deliveries.

Optimizing H.264

Table 2 shows the changes that I applied to my production encodes and their improvement in PSNR dB and as a percentage. All were directly inspired by comments in the Netflix blog post.

[Table 2]

The first adjustment was to change from the main H.264 profile to the high profile. The potential downside of this change is compatibility, as some much older iOS devices don’t play video encoded in the high profile.

For Android, Google still recommends using the baseline profile for all H.264 encodes because the H.264 decoder in the Android OS plays baseline only, and Google has no idea which hardware codecs are deployed on Android devices. Still, Apple’s most recent HLS documents recommend using the high profile in all streams, and you can expect most Android devices to have hardware capabilities similar to those of their iOS counterparts. So, moving from the main profile to high is probably a safe change to make, irrespective of whether you’re encoding for download or streaming, although the gains were modest, averaging 0.26 dB, or 0.66 percent.
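The percentage figure is simple arithmetic. The short Python sketch below assumes the gain is expressed relative to the 39.60 dB production average cited earlier. That baseline is an assumption, although it reproduces the 0.66 percent figure.

```python
def percent_gain(delta_db, baseline_db):
    """PSNR improvement expressed as a percentage of the baseline score."""
    return 100 * delta_db / baseline_db

# 0.26 dB gain from the high profile, against the 39.60 dB production average.
print(f"{percent_gain(0.26, 39.60):.2f} percent")
```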

Keyframe Interval

In its discussion of optimized settings, Netflix identified “increased random access picture period,” which I took to mean increasing the keyframe interval. In my production encodes, I use a keyframe every 2 seconds, which I changed to one every 15 seconds for my optimized encodes. As you can see in Table 2, this increased PSNR by almost a full percent, so it’s worthwhile for mobile downloads.

One downside of increasing the keyframe interval is decreased random access within the file. That is, if you expect your viewers to continuously navigate to random points in the video, increasing the keyframe interval could increase latency during these operations. For linear playback of movies and TV shows from a mobile download, however, the impact should be negligible.

When encoding for ABR delivery, however, the keyframe interval must divide evenly into the segment duration, which is typically 6 seconds or less. So the longest keyframe interval you should use when encoding for ABR delivery is the duration of your ABR segments.
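The divisibility rule is easy to check in code. This minimal Python sketch tests whether a keyframe interval fits a segment duration and converts an interval in seconds to the frame count that encoders generally expect (FFmpeg’s -g option, for example, takes the interval in frames). The function names and the 24 fps frame rate are illustrative assumptions.

```python
from fractions import Fraction

def keyframe_interval_ok(keyframe_seconds, segment_seconds):
    """True if a whole number of keyframe intervals fits in each segment."""
    ratio = Fraction(str(segment_seconds)) / Fraction(str(keyframe_seconds))
    return ratio.denominator == 1

def gop_frames(keyframe_seconds, fps):
    """Keyframe interval expressed in frames; must be a whole number."""
    frames = Fraction(str(keyframe_seconds)) * Fraction(str(fps))
    if frames.denominator != 1:
        raise ValueError("interval is not a whole number of frames")
    return int(frames)

# With 6-second segments, 2- and 6-second intervals fit; 4 and 15 do not.
for interval in (2, 4, 6, 15):
    print(interval, keyframe_interval_ok(interval, 6))

print(gop_frames(2, 24))  # frames per keyframe interval at an assumed 24 fps
```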

Bitrate Control/Buffer Size

The biggest increase in quality occurred when I changed from 110 percent to 300 percent constrained VBR and increased the VBV buffer from the equivalent of 1 second of video to 3 seconds. This boosted PSNR by 0.77 dB, or almost 2 percent. However, between the keyframe interval adjustment and this one, we changed the data rate from a highly deliverable flatline to a roller coaster ride of highs and lows that could easily stall out during streaming delivery (Figure 1). When played back from your hard disk, this doesn’t matter, but which stream would you rather deliver to a mobile phone connecting via 3G?

[Figure 1]
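One way to put a number on that roller coaster is to total the bytes delivered in each second of the file and compare the peak second to the average. The Python sketch below is one possible approach, not the method used to produce the figure. It assumes you already have (timestamp, size) pairs for the video packets, which a tool such as ffprobe can list with -show_entries packet=pts_time,size. The function names and sample data are hypothetical.

```python
from collections import defaultdict

def per_second_kbps(packets):
    """packets: iterable of (pts_seconds, size_bytes) pairs.

    Returns the data rate in kbps for each whole second of the stream.
    """
    buckets = defaultdict(int)
    for pts, size in packets:
        buckets[int(pts)] += size
    if not buckets:
        return []
    last = max(buckets)
    # Seconds with no packets count as zero so the timeline has no gaps.
    return [buckets.get(s, 0) * 8 / 1000 for s in range(last + 1)]

def peak_to_mean(rates):
    """Ratio of the busiest second to the average; 1.0 is a perfect flatline."""
    return max(rates) / (sum(rates) / len(rates))

# Hypothetical packet data for a 3-second clip.
sample = [(0.0, 125000), (0.5, 125000), (1.2, 62500), (2.9, 187500)]
rates = per_second_kbps(sample)
print(rates, round(peak_to_mean(rates), 2))
```

A stream held to a tight VBV constraint should show a ratio close to 1, while a loosely constrained one will climb well above it.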