How Netflix Pioneered Per-Title Video Encoding Optimization
One-size-fits-all encoding doesn't produce the best possible results, so Netflix recently moved to per-title optimization. Learn why this improves video quality and saves on bandwidth, but isn't the right model for every company.
Editor’s Note: This article was rewritten to incorporate answers to questions posed in the original article. The answers were provided by David Ronca, Director, Encoding Technology at Netflix, and Anne Aaron, Manager, Video Algorithms at Netflix. The author would like to express his appreciation to David and Anne for sharing their expertise with Streaming Media readers.
The Netflix blog post entitled Per-Title Encode Optimization boldly declares that “to deliver the best quality video to our members, each title should receive a unique bitrate ladder, tailored to its specific complexity characteristics.” In a world where many companies simply deploy Apple’s recommendations from TN2224 without modification, it’s a breath of fresh air. The blog post goes on to detail how Netflix creates its per-title encoding ladders.
While the Netflix post provides some valuable universal truths, Netflix is a subscription service, so companies that aren’t in similar businesses should weigh some caveats. After a quick overview, I’ll discuss these truths and caveats.
As you’ll read at the end of this post, on January 26, 2016, at 2:00 PM EST I’ll be hosting a webinar detailing lessons learned from the Netflix post, and describing a procedure companies can use to implement the content-aware encoding that Netflix so strongly advocates. You can read more about the webinar here.
Overview: Per-Title Optimization
The encoding world has long been dominated by one-size-fits-all encoding “ladders,” or sets of resolution/bitrate pairs. In its blog post, Netflix shared that it had previously used the following combinations to produce “good quality encodes” for most content.
Table 1. Netflix’s traditional one-size-fits-all bitrate ladder.
Netflix then described the problem with this approach, which is that for some challenging videos, “the highest 5800kbps stream would still exhibit blockiness.” At the other end of the spectrum, “for simple content like cartoons, 5800 kbps is far more than needed to produce excellent 1080p encodes. In addition, a customer whose network bandwidth is constrained to 1750 kbps might be able to watch the cartoon at HD resolution, instead of the SD resolution specified by the ladder above.” In short, each video has a unique complexity, and a single encoding ladder can’t optimize the efficiency or viewing experience for all viewers.
To represent this “very high diversity in signal characteristics” of the videos that Netflix encodes, the blog presented the following graph, which showed 100 files encoded using x264’s constant QP (quantization parameter) mode. At a high level, QP encoding seeks to deliver a consistent quality level and varies the data rate to achieve it. Netflix measured quality using Peak Signal-to-Noise Ratio (PSNR), where higher scores indicate better quality.
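For readers who want to compute the metric themselves, PSNR is conventionally defined as 10 times the base-10 logarithm of the squared peak sample value divided by the mean squared error between the source and the encode. The following Python sketch assumes 8-bit samples (peak value 255) held in flat lists of equal length. The function name and the sample values are illustrative, and the Netflix post does not say which planes it measures or how it averages per-frame scores.

```python
import math


def psnr(reference, encoded, max_value=255):
    """Return PSNR in dB between two equally sized sequences of pixel values."""
    if len(reference) != len(encoded):
        raise ValueError("frames must have the same number of pixels")
    if not reference:
        raise ValueError("frames must not be empty")
    # Mean squared error over every pixel position.
    mse = sum((r - e) ** 2 for r, e in zip(reference, encoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames have no measurable noise
    return 10 * math.log10(max_value ** 2 / mse)


# Hypothetical four-pixel "frames": the encode is off by one level everywhere.
source = [100, 100, 100, 100]
encode = [101, 99, 101, 99]
print(round(psnr(source, encode), 2))
```

Because the comparison runs pixel by pixel, both inputs must have the same dimensions, which is why the function rejects sequences of different lengths.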
Figure 1. A representation of the bitrate/PSNR of 100 Netflix titles.
To create the graph, Netflix encoded all files at four different QP levels, as you can see by the four points on the bottom lavender-colored line. Looking at that plot, plus the aqua line immediately above it, you can see that even though the QP encoding delivered a high data rate, the quality level, which was around 38 dB for both files, was comparatively low. This indicates that these files are challenging to encode.
At the other end of the spectrum, the aqua line pointing nearly vertically at the top of the graph topped out at over 48 dB at 2 Mbps, despite using the same QP value as the two encodes at the bottom. That’s dramatically higher quality at less than 10 percent of the data rate, indicating that the top aqua line represents an easy-to-encode file. As it relates to the compression ladder, these results show that a one-size-fits-all solution either applies too high a data rate to the file at the top of the graph or too low a data rate to the files at the bottom.
Okay, you get it; some files are hard to compress, some files are easy to compress, so you should encode them using different bitrate ladders. Before moving on, I wanted to tie PSNR scores to subjective ratings, which Netflix is obviously qualified to do. Specifically, for that hard-to-compress file at the bottom of the graph, a PSNR level of 38 dB is “acceptable.” At other points in the discussion, Netflix says that scores under 35 dB will show encoding artifacts, while scores above 45 dB produce no perceptible quality improvements. While I don't favor PSNR (as explained below), these are all useful data points for those who use the metric.
Readers familiar with x264 probably know that there’s an alternative to QP encoding called Constant Rate Factor (CRF) encoding, which adjusts quality to scene content. We asked Netflix if it had considered using CRF encoding to gauge the encoding complexity of the file, and Ronca responded, “We started with QP and recently migrated to CRF. The results are about the same.”
Speaking of x264, it’s long been speculated that Netflix is using some kind of pre- or post-processing function to optimize quality, so we asked about this. Ronca responded that Netflix was using plain x264, and continued, “But the techniques we describe should apply to any codec. The point that gets lost sometimes is that our work is really [a] pre-encode step to determine the best recipe for the encoder. In the past, an expert encodist would have made these decisions. We just got it to work at very large scale.”
Plotting the Convex Hull
After establishing that all files needed different encoding ladders, the blog post goes on to describe how Netflix produces the ladder. At a high level, Netflix runs a number of test encodes at different resolutions and QP values to plot the PSNR quality at each data rate/resolution pair, and uses that to identify the optimum encoding ladder.
One observation made in the post is that while increasing the data rate at the same resolution consistently increases stream quality, these quality increases flatten out once the bitrate goes above a certain threshold. You can see this for the low, mid, and high resolution plots in Figure 2. If you plot a line that includes the peak quality/bitrate efficiency points from all resolutions, you get a “convex hull,” a term describing the shape that most efficiently bounds all data points.
Figure 2. Plotting the convex hull, which follows the resolution that delivers maximum quality at each bitrate.
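To make the idea concrete, here is a minimal Python sketch that finds the upper convex hull of a set of test encodes using the standard monotone-chain method. This is not Netflix's code: the blog post does not publish its implementation, the function names are my own, and the bitrate/PSNR values are invented for illustration.

```python
def cross(o, a, b):
    """Cross product of (a - o) and (b - o); negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def upper_hull(encodes):
    """Upper convex hull of (bitrate_kbps, psnr_db, resolution) test encodes."""
    # At each bitrate, only the highest-quality resolution can be on the hull.
    best_at = {}
    for enc in encodes:
        if enc[0] not in best_at or enc[1] > best_at[enc[0]][1]:
            best_at[enc[0]] = enc
    hull = []
    for enc in sorted(best_at.values(), key=lambda e: e[0]):
        # Drop the last point while it sits on or below the line that joins
        # its predecessor to the new point.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], enc) >= 0:
            hull.pop()
        hull.append(enc)
    # Discard any tail where a higher bitrate delivers lower quality.
    peak = max(range(len(hull)), key=lambda i: hull[i][1])
    return hull[:peak + 1]


# Invented test encodes at three resolutions (not Netflix data).
encodes = [
    (500, 34.0, "480p"), (1000, 35.0, "480p"), (2000, 37.0, "480p"),
    (1000, 34.5, "720p"), (2000, 39.0, "720p"), (4000, 40.5, "720p"),
    (2000, 37.5, "1080p"), (4000, 42.0, "1080p"), (6000, 43.0, "1080p"),
]
for bitrate, quality, resolution in upper_hull(encodes):
    print(bitrate, quality, resolution)
```

Each point that survives is a bitrate/resolution pair that no other test encode beats, which makes the hull a natural list of candidates for a per-title ladder. How Netflix then picks rungs from those candidates is a separate question.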
Here my grasp of the math and technique described in the post becomes strained. It seems obvious that for each resolution, the data rate selected would be the point on the convex hull. And Netflix is clear that it works with a finite set of resolutions. What’s unclear is whether each resolution gets a single encode, or whether Netflix encodes at multiple data rates at the same resolution.
This statement causes my confusion: “The bitrate selection is also limited to a finite set, where the adjacent bitrates have an increment of roughly 5%.” Does this mean that Netflix produces multiple encodes at bitrates roughly 5% apart, or are these the bitrates for which Netflix tried to ascertain the highest-quality resolution, in essence the test targets?
Note that this is a critical issue. The procedure detailed in the blog post focuses solely on optimizing quality, not on whether or not the encoding ladder performs well in the context of an adaptive group. In this regard, Apple Tech Note TN2224 advises producers to keep “adjacent bit rates at a factor of 1.5 to 2 apart.” A seminal Adobe white paper on the topic explains why: “Too many bit rates too close to one another could result in too many stream switches, even with smaller bandwidth fluctuations. Besides the slight overhead in switching, the viewer's experience with too-frequent quality fluctuations may not be pleasant.” So one big question is how many adaptive variants are produced for each source file, and how that changes with different content.
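The arithmetic shows why the distinction matters. The Python sketch below counts how many rungs fit under the 5800 kbps ceiling of the old ladder when adjacent bitrates are 5 percent apart versus a factor of 1.5 apart. The 250 kbps floor and the function name are assumptions chosen for illustration, not figures from Netflix or Apple.

```python
def geometric_ladder(floor_kbps, ceiling_kbps, factor):
    """Bitrates from floor_kbps up to ceiling_kbps, each `factor` times the last."""
    if factor <= 1:
        raise ValueError("factor must be greater than 1")
    rungs = []
    rate = float(floor_kbps)
    while rate <= ceiling_kbps:
        rungs.append(round(rate))
        rate *= factor
    return rungs


# Hypothetical 250 kbps floor; 5800 kbps is the top rate of the old ladder.
fine = geometric_ladder(250, 5800, 1.05)   # roughly 5 percent increments
coarse = geometric_ladder(250, 5800, 1.5)  # low end of the TN2224 guidance
print(len(fine), "rungs at 5 percent spacing")
print(len(coarse), "rungs at 1.5x spacing:", coarse)
```

Under these assumptions, 5 percent spacing yields several times as many bitrates as 1.5x spacing. That is far more than a typical adaptive group, which supports reading the 5 percent set as candidate test targets rather than final streams.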
We asked Netflix how many bitrates were in the final group, and whether they were 5 percent apart or spaced further apart. Netflix provided the sample ladder shown in Table 2, and responded, “The total number of bitrates is dependent on the title. Ideally, there is one JND [just noticeable difference] between each bitrate. The sample below is the CBE for an animated original. This is representative but would vary per-title.”
Table 2: Old and new encoding ladder for animated footage. Note the significant data rate savings at 720x480 and above.
Another critical question is the encoding technique actually used for Netflix’s production encodes. Specifically, while Netflix clearly uses QP encoding as a tool to identify the optimal data rate target for each file, we were curious about the technique used to encode the final video, particularly in view of the 10 percent variability threshold dictated in Apple Tech Note TN2224. Ronca advised that Netflix uses two-pass VBR, with limited encoder parameters to maintain compatibility with legacy devices. As an example, Ronca related that the maximum buffer used was around 200 percent of the average target bitrate.
The Mystery of Netflix's Comparisons
Upon first reading, the biggest mystery was how Netflix computed PSNR on files with varying resolutions, since most objective quality tools like the Moscow University Video Quality Measurement Tool can’t perform cross-resolution testing of any kind (read the review). To be clear, the source file for all encodes is (presumably) a 1080p original. To compute PSNR, however, you need a pixel-by-pixel comparison. So how did Netflix compute PSNR on a lower-resolution variant like the 720x480 file?