
How Netflix Pioneered Per-Title Video Encoding Optimization


Editor’s Note: This article was rewritten to incorporate questions posed in the original article and answered by David Ronca, Director, Encoding Technology at Netflix, and Anne Aaron, Manager, Video Algorithms at Netflix. The author would like to express his appreciation to David and Anne for sharing their expertise with Streaming Media readers.

The Netflix blog post entitled Per-Title Encode Optimization boldly declares that “to deliver the best quality video to our members, each title should receive a unique bitrate ladder, tailored to its specific complexity characteristics.” In a world where many companies simply deploy Apple’s recommendations from TN2224 without modification, it’s a breath of fresh air. The blog post goes on to detail how Netflix creates its per-title encoding ladders.

While the Netflix post provides some valuable universal truths, since Netflix is a subscription service there are some caveats that should be considered by companies that aren’t in similar businesses. After a quick overview, I’ll discuss these truths and caveats.

As you’ll read at the end of this post, on January 26, 2016, at 2:00 PM EST I’ll be hosting a webinar detailing lessons learned from the Netflix post, and describing a procedure companies can use to implement the content-aware encoding that Netflix so strongly advocates. You can read more about the webinar here.

Overview: Per-Title Optimization

The encoding world has long been dominated by one-size-fits-all encoding “ladders,” or sets of resolution/bitrate pairs. In its blog post, Netflix shared that it had previously used the following combinations to produce “good quality encodes” for most content.

netflix per-title table 1

Table 1. Netflix’s traditional one-size-fits-all bitrate ladder.

Netflix then described the problem with this approach, which is that for some challenging videos, “the highest 5800kbps stream would still exhibit blockiness.” At the other end of the spectrum, “for simple content like cartoons, 5800 kbps is far more than needed to produce excellent 1080p encodes. In addition, a customer whose network bandwidth is constrained to 1750 kbps might be able to watch the cartoon at HD resolution, instead of the SD resolution specified by the ladder above.” In short, each video has a unique complexity, and a single encoding ladder can’t optimize the efficiency or viewing experience for all viewers.

To represent this “very high diversity in signal characteristics” of the videos that Netflix encodes, the blog presented the following graph, which shows 100 files encoded using x264’s constant QP (quantization parameter) mode, which encodes each file to a consistent quality. At a high level, QP encoding seeks to deliver a certain quality level and varies the data rate to achieve it. Netflix measures quality using the peak signal-to-noise ratio (PSNR), where higher scores indicate better quality.

Netflix Per-Title Fig1

Figure 1. A representation of the bitrate/PSNR of 100 Netflix titles.
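PSNR itself is straightforward to derive from the mean squared error between source and encoded frames. Here is a minimal illustrative sketch (not Netflix’s actual tooling) that operates on flat sequences of pixel values:

```python
import math

def psnr(reference, encoded, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length
    sequences of pixel values; higher scores mean a closer match."""
    if len(reference) != len(encoded):
        raise ValueError("PSNR is a pixel-by-pixel comparison")
    mse = sum((r - e) ** 2 for r, e in zip(reference, encoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)
```

For a sense of scale, a uniform error of just 2 code values out of 255 already scores about 42 dB.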

To create the graph, Netflix encoded all files at four different QP levels, as you can see from the four points on the bottom lavender-colored line. Looking at that plot, plus the aqua line immediately above it, you can see that even though QP encoding delivered a high data rate, the quality level, which was around 38 dB for both files, was comparatively low. This indicates that these files are challenging to encode.

At the other end of the spectrum, the nearly vertical aqua line at the top of the graph topped out at over 48 dB at 2Mbps, despite using the same QP values as the two encodes at the bottom. That’s dramatically higher quality at less than 10 percent of the data rate, indicating that the top aqua line represents an easy-to-encode file. As it relates to the compression ladder, these results prove that a one-size-fits-all solution either applies too high a data rate to the file at the top of the graph or too low a data rate to the files at the bottom.

Okay, you get it; some files are hard to compress, some files are easy to compress, so you should encode them using different bitrate ladders. Before moving on, I wanted to tie PSNR scores to subjective ratings, which Netflix is obviously qualified to do. Specifically, for that hard-to-compress file at the bottom of the graph, a PSNR level of 38 dB is “acceptable.” At other points in the discussion, Netflix says that scores under 35 dB will show encoding artifacts, while scores above 45 dB produce no perceptible quality improvements. While I don't favor PSNR (as explained below), these are all useful data points for those who use the metric.

Those readers familiar with x264 probably know that there’s an alternative to QP encoding called constant rate factor (CRF) encoding, which adjusts quality to scene content. We asked Netflix if it had considered using CRF encoding to gauge the encoding complexity of the file, and Ronca responded, “We started with QP and recently migrated to CRF. The results are about the same.”
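A set of constant-quality probe encodes like the ones Netflix describes can be scripted against any x264 front end. The sketch below builds ffmpeg/libx264 command lines for CRF probes; the file names and CRF values are illustrative, not Netflix’s recipe:

```python
def crf_probe_commands(source, crf_values=(16, 22, 28, 34)):
    """Build ffmpeg/libx264 command lines that encode one title at
    several constant-quality (CRF) levels; lower CRF = higher quality."""
    return [
        ["ffmpeg", "-i", source, "-c:v", "libx264",
         "-crf", str(crf), "-an", f"probe_crf{crf}.mp4"]
        for crf in crf_values
    ]
```

Each probe’s resulting bitrate, paired with its measured quality, yields one point on a curve like those in Figure 1.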

Speaking of x264, it’s long been speculated that Netflix uses some kind of pre- or post-processing function to optimize quality, so we asked about this. Ronca responded that Netflix uses plain x264, and continued, “But the techniques we describe should apply to any codec. The point that gets lost sometimes is that our work is really a pre-encode step to determine the best recipe for the encoder. In the past, an expert encodist would have made these decisions. We just got it to work at very large scale.”

Plotting the Convex Hull

After establishing that all files needed different encoding ladders, the blog post goes on to describe how Netflix produces the ladder. At a high level, Netflix runs a number of test encodes at different resolutions and QP values to plot the PSNR quality at each data rate/resolution pair, and uses that to identify the optimum encoding ladder.

One observation made in the post is that while increasing the data rate at the same resolution consistently increases stream quality, these quality increases flatten out once the bitrate goes above a certain threshold. You can see this for the low, mid, and high resolution plots in Figure 2. If you plot a line that includes the peak quality/bitrate efficiency points from all resolutions, you get a “convex hull,” a term describing the shape that most efficiently bounds all data points.

Netflix Per-Title Fig2

Figure 2. Plotting the convex hull, the boundary along which each resolution delivers its maximum quality.
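The “convex hull” here is simply the upper convex boundary of all the (bitrate, quality) test points. As a purely geometric illustration (not Netflix’s code), the upper hull of such a point cloud can be computed with Andrew’s monotone chain:

```python
def cross(o, a, b):
    """Cross product of vectors o->a and o->b; negative = clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def upper_hull(points):
    """Upper convex hull of (bitrate, quality) points: the frontier
    that no single resolution's rate/quality curve rises above."""
    hull = []
    for p in sorted(set(points)):
        # pop points that fall on or below the hull being built
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull
```

Points from every resolution’s curve go in together; what comes out is the frontier that the encoding ladder should track.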

Here my grasp of the math and technique described in the post becomes strained. It seems obvious that for each resolution, the data rate selected would be the point on the convex hull. And Netflix is clear that it encodes with a finite set of resolutions. What’s unclear is whether each resolution gets a single encode, or whether Netflix encodes at multiple data rates at the same resolution.

This statement is the source of my confusion: “The bitrate selection is also limited to a finite set, where the adjacent bitrates have an increment of roughly 5%.” Does this mean that there are multiple encodes at bitrates roughly 5% apart, or that these are the bitrates for which Netflix tried to ascertain the highest-quality resolution, in essence the test targets?

Note that this is a critical issue. The procedure detailed in the blog post focuses solely on optimizing quality, not on whether or not the encoding ladder performs well in the context of an adaptive group. In this regard, Apple Tech Note TN2224 advises producers to keep “adjacent bit rates at a factor of 1.5 to 2 apart.” A seminal Adobe white paper on the topic explains why: “Too many bit rates too close to one another could result in too many stream switches, even with smaller bandwidth fluctuations. Besides the slight overhead in switching, the viewer's experience with too-frequent quality fluctuations may not be pleasant.” So one big question is how many adaptive variants are produced for each source file, and how that changes with different content.
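To make the two spacing conventions concrete, here is an illustrative sketch contrasting a candidate set with roughly 5% increments (as the Netflix post describes) against ladder rungs spaced a factor of 1.5 to 2 apart (per TN2224). The endpoints and the 1.75 factor are arbitrary examples:

```python
def geometric_rates(low_kbps, high_kbps, step):
    """Bitrates from low to high where each is `step` times the last."""
    rates, r = [], float(low_kbps)
    while r <= high_kbps:
        rates.append(round(r))
        r *= step
    return rates

# ~5% increments: a dense candidate set for probing quality at each rate
candidates = geometric_rates(235, 5800, 1.05)

# factor-of-1.75 spacing: rungs far enough apart to avoid constant switching
ladder = geometric_rates(235, 5800, 1.75)
```

Across the same 235kbps–5800kbps range, the 5% increments produce dozens of candidates, while factor-of-1.75 spacing produces about six rungs.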

We asked Netflix how many bitrates were in the final group, and whether they were 5 percent apart or spaced farther apart. Netflix provided the sample ladder shown in Table 2, and responded, “The total number of bitrates is dependent on the title. Ideally, there is one JND [just noticeable difference] between each bitrate. The sample below is the CBE for an animated original. This is representative but would vary per-title.”

Netflix Per-Title Table 2

Table 2: Old and new encoding ladder for animated footage. Note the significant data rate savings at 720x480 and above.

Another critical question is the encoding technique actually used for Netflix’s production encodes. Specifically, while Netflix clearly uses QP encoding as a tool to identify the optimal data rate target for each file, we were curious about the technique used to encode the final video, particularly in view of the 10 percent variability threshold dictated in Apple Tech Note TN2224. Ronca advised that Netflix uses two-pass VBR, with limited encoder parameters to maintain compatibility with legacy devices. As an example, Ronca related that the maximum buffer used was around 200 percent of the average target bitrate.
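As a concrete, hypothetical illustration of those constraints, a two-pass VBR encode with a VBV buffer around 200 percent of the average target bitrate might be expressed via ffmpeg and x264 as follows; the source and output names are placeholders, and the exact parameter set Netflix uses is not public:

```python
def two_pass_vbr_commands(source, target_kbps, output="out.mp4"):
    """Two ffmpeg command lines for a two-pass VBR encode whose
    VBV buffer is ~200% of the average target bitrate."""
    rate_args = ["-c:v", "libx264",
                 "-b:v", f"{target_kbps}k",
                 "-bufsize", f"{2 * target_kbps}k"]
    first_pass = ["ffmpeg", "-y", "-i", source, *rate_args,
                  "-pass", "1", "-an", "-f", "null", "/dev/null"]
    second_pass = ["ffmpeg", "-i", source, *rate_args,
                   "-pass", "2", output]
    return first_pass, second_pass
```

The first pass writes only a statistics log; the second pass uses it to distribute bits according to scene complexity while honoring the buffer constraint.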

The Mystery of Netflix's Comparisons

Upon first reading, the biggest mystery was how Netflix computed PSNR on files with varying resolutions, since most objective quality tools, like the Moscow University Video Quality Measurement Tool, can’t perform cross-resolution testing of any kind (read the review). To be clear, the source file for all encodes is (presumably) a 1080p original. To compute PSNR, however, you need a pixel-by-pixel comparison. So how did Netflix compute PSNR on a lower-resolution variant like the 720x480 file?
