November 1, 2017
By Jan Ozer, Contributing Editor

One Title at a Time: Comparing Per-Title Video Encoding Options

Most streaming producers know that different video clips encode with more or less efficiency depending on the motion in the video, the level of detail, and other factors. Still, most producers use a fixed encoding ladder because it’s simple to implement and because they don’t feel that they have other options.

Well, whether you’re encoding in the cloud or on-premises, with a commercially available encoder or your own do-it-yourself FFmpeg-based encoding farm, you do have options. In this article, I’ll discuss four. One is for the DIY crowd, and the other three are available as features of commercial encoders and cloud services.

DIY: Capped constant rate factor, or CRF
Pre-packaged: Capella Systems’ Source Adaptive Bitrate Ladder, or SABL
OVP or cloud encoder: Brightcove Context Aware Encoding, or CAE
Cloud service: FASTech.io Video Optimizer

To be clear, this article is not a head-to-head comparison of these products and services, but rather an exploration of their features and capabilities, and an attempt to create a structure to gauge their effectiveness. It will help you understand how these products and services function, and it will identify questions you can ask to tell them apart.

Rather than jumping right in, let’s start at the beginning, which in this case extends all the way back to December 2015.

Netflix Per-Title Encode Optimization

On Dec. 14, 2015, Netflix published a blog post entitled “Per-Title Encode Optimization.” It stated, “To deliver the best quality video to our members, each title should receive a unique bitrate ladder, tailored to its specific complexity characteristics.”

Netflix describes its approach, which involves multiple test encodes at different data rates and resolutions to identify the optimum quality at each data rate/resolution pair. Originally, Netflix used the peak signal-to-noise ratio metric (PSNR) to measure quality, but later changed to its own Video Multimethod Assessment Fusion (VMAF) approach.
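To make the selection step concrete, here is a minimal Python sketch, assuming the test encodes have already been scored. At each tested data rate, it keeps the resolution that produced the highest quality score. This is a simplification of the idea, not Netflix’s actual code, and the function name and scores are hypothetical values chosen only for illustration.

```python
def best_resolution_per_bitrate(results):
    """Keep the highest-quality resolution at each tested data rate.

    results: iterable of (bitrate_kbps, resolution, quality) tuples from test
    encodes, where quality is a PSNR or VMAF score (higher is better).
    Returns {bitrate_kbps: (resolution, quality)} ordered by data rate.
    """
    best = {}
    for bitrate, resolution, quality in results:
        current = best.get(bitrate)
        # Replace the stored entry if this resolution scored higher.
        if current is None or quality > current[1]:
            best[bitrate] = (resolution, quality)
    return dict(sorted(best.items()))


# Hypothetical test-encode scores (VMAF), for illustration only.
trials = [
    (1000, "540p", 78.0), (1000, "720p", 74.5),
    (3000, "720p", 90.1), (3000, "1080p", 91.3),
]
print(best_resolution_per_bitrate(trials))
```

With these made-up scores, the 540p encode wins at 1,000 kbps and the 1080p encode wins at 3,000 kbps; a real per-title system would test many more data rate/resolution pairs.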

Understanding the features and capabilities of the Netflix system provides a useful means for comparing later product and service offerings. This is shown in Table 1, which I’ll refer to in the individual product sections.

Table 1. Comparative feature sets

We’ve covered Netflix’s core schema in a previous article. Based on its VMAF analysis, Netflix adjusts the data rate of the rungs in each title’s encoding ladder, which all per-title solutions also do. Less common (and more valuable) is the ability to change the number of rungs in the encoding ladder and the resolution of those rungs. The benefits of these capabilities will become clear when we look at Brightcove’s technology.

Custom options provide the user with the ability to control the output to some degree. While I don’t know the options in the Netflix system, I’ll cover the options available for the other four in each section below.

The next point, bitrate control, is essential to understanding the difference between capped CRF and the other techniques. All the other technologies use their per-title functionality to suggest the ideal data rate for each file in the encoding ladder. This frees users to choose the bitrate control technique they prefer, whether constant bitrate (CBR) or constrained variable bitrate (VBR) encoding. With capped CRF, the per-title technique is the bitrate control technique, which raises QoE concerns that I’ll cover in that section.

The last feature is a post-encode quality check. With Netflix, objective quality metrics are critical to base system operation, with quality measurements dictating each configuration decision. While quality measurements aren’t as integral to the Brightcove per-title schema, you can set post-encode quality checks, ensuring that all output files meet a specified quality level.

So the features in Table 1 help you understand how each per-title encoding scheme works and what it does. Now let’s look at how I measured each technology’s performance.

Our Tests

To test each technology, I encoded 14 files with widely varied content, from screencam- and PowerPoint-based videos, to a range of animated movies, to a variety of real-world content, including low- and high-motion videos. As a baseline, I encoded each clip to the fixed encoding ladder shown in Table 2. Then I ran each technology’s per-title encode function to build a content-specific encoding ladder. As part of the per-title encoding run, I allowed each technology to increase the data rate by as much as 50 percent for challenging files. Then I compared the original and per-title output files using both PSNR and VMAF metrics.

Table 2. The baseline encoding ladder

These comparisons were simple for capped CRF and Capella’s Cambria encoder, since both produced ladders with the same number of streams at the same resolutions as the baseline. Brightcove was more complicated because, in almost all cases, its ladder contained fewer streams than the baseline ladder. With Brightcove, I used data rate as the guide, comparing each CAE file to the file in the original ladder with the same or next-lower data rate. In essence, this compares the fixed-ladder and CAE-ladder experience for viewers with the same effective bandwidth. More on this below.
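The matching rule used for Brightcove can be expressed as a small function: for each CAE rung, pick the baseline rung with the highest data rate that does not exceed it. This is a minimal sketch; the function name is hypothetical, and apart from the 6,750 kbps top rung mentioned later in this article, the ladder data rates below are made-up values for illustration.

```python
def match_baseline(cae_rates_kbps, baseline_rates_kbps):
    """Map each CAE rung to the baseline rung with the same or next-lower rate.

    Returns {cae_rate: baseline_rate}, using None when no baseline rung has a
    data rate at or below the CAE rung.
    """
    baseline = sorted(baseline_rates_kbps)
    matches = {}
    for cae in sorted(cae_rates_kbps):
        # All baseline rungs a viewer at this bandwidth could also play.
        candidates = [b for b in baseline if b <= cae]
        matches[cae] = candidates[-1] if candidates else None
    return matches


# Hypothetical ladders in kbps, for illustration only.
baseline_ladder = [300, 600, 1200, 2400, 4500, 6750]
cae_ladder = [450, 1800, 5200]
print(match_baseline(cae_ladder, baseline_ladder))
```

Matching on "same or lower" data rate keeps the comparison conservative: the CAE file is never compared against a baseline file that would need more bandwidth.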

Keeping Score

This analysis produced more than 1,200 data points, making it tough to figure out how to synthesize all this data into usable comparison points. Since it’s summertime, and since my editor Eric Schumacher-Rasmussen loves baseball, I decided to use the following schema.

Wins and Losses: For each file, a technology got a “win” when it increased the data rate and quality without pushing the PSNR score beyond 45 dB, since quality gains above that level wouldn’t be perceivable to most viewers. A technology also got a win when it reduced the data rate without pushing PSNR below 35 dB, below which visible artifacts could appear. Conversely, a technology got a “loss” when it violated either rule.

Errors: To function properly, encoding ladders need reasonably consistent spacing between rungs. For example, Apple recommends that each rung’s data rate be no more than 200 percent of the rung below it. So each technology received an “error” when a rung exceeded 2.05x the data rate of the immediately lower rung. I started with rung 3, so the jump between the two lowest-bitrate rungs wasn’t scored. Capped CRF and Capella also got an error when VMAF quality dropped by more than 6 points in one of the top four rungs, which, according to Netflix, equals a noticeable difference in quality. I excluded Brightcove from this measure because of the comparison issue described above.

Saves: A technology received a “save” when it eliminated a rung from the encoding ladder or an encoding pass, both of which reduce encoding costs.

Home runs: A technology scored a “home run” when, for any single clip, it improved PSNR by more than 1 percent in four or more ladder rungs.

Bandwidth saved: I also tracked the net bandwidth, in bits per second, that each technology saved across its output files.
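The win/loss, VMAF-error, and home-run rules can be sketched as small Python functions. This is a minimal sketch under stated assumptions: the thresholds (45 dB, 35 dB, 6 VMAF points, 1 percent, four rungs) come from the rules above, but applying the win/loss rule per rung, and treating an unchanged data rate as neither a win nor a loss, are assumptions, since the article scores wins per file. The function names and sample values are hypothetical.

```python
def rung_result(old_kbps, new_kbps, old_psnr, new_psnr):
    """Apply the win/loss rule to one rung: "win", "loss", or None."""
    if new_kbps > old_kbps:
        # Extra bits earn a win only if PSNR rose and stayed at or below 45 dB.
        return "win" if old_psnr < new_psnr <= 45.0 else "loss"
    if new_kbps < old_kbps:
        # Bit savings earn a win only if PSNR stayed at or above 35 dB.
        return "win" if new_psnr >= 35.0 else "loss"
    return None  # unchanged data rate: neither rule applies (assumption)


def vmaf_error(old_top4, new_top4, max_drop=6.0):
    """True if VMAF fell by more than max_drop points in any of the top four rungs."""
    return any(old - new > max_drop for old, new in zip(old_top4, new_top4))


def is_home_run(old_psnrs, new_psnrs, min_rungs=4):
    """True if PSNR rose by more than 1 percent in at least min_rungs rungs."""
    improved = sum(
        1 for old, new in zip(old_psnrs, new_psnrs) if (new - old) / old > 0.01
    )
    return improved >= min_rungs


# Hypothetical per-rung values, for illustration only.
print(rung_result(4500, 5600, 38.2, 40.1))  # more bits, PSNR still under 45 dB
print(rung_result(4500, 3000, 36.0, 34.2))  # fewer bits, PSNR fell below 35 dB
print(vmaf_error([95.0, 92.0, 88.0, 83.0], [94.0, 91.5, 81.0, 83.5]))
print(is_home_run([30.0, 32.0, 34.0, 36.0, 38.0, 40.0, 42.0],
                  [30.5, 32.5, 34.5, 36.5, 38.2, 40.1, 42.0]))
```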

I summarized the scores of the three technologies that I fully analyzed in Table 3.

Table 3. The comparison box score

Now, let’s dive into the individual technologies.

The DIY Option: Capped CRF

Constant rate factor is an encoding mode that adjusts the file’s data rate up or down to achieve a selected quality level rather than a specific data rate. With x264 and x265, CRF values range from 0 to 51, with lower numbers delivering higher quality. Multiple codecs support CRF, including x264, x265, and VP9 (where the range is 0 to 63).

On its own, CRF is unusable for adaptive bitrate streaming, where each rung’s data rate needs to be closely adhered to. However, adding a “cap” to CRF limits the data rate to a specified maximum. An FFmpeg command implementing capped CRF would look like this:

ffmpeg -i inputfile -c:v libx264 -crf 23 -maxrate 6750k -bufsize 13500k outputfile

This tells FFmpeg to encode with x264 at a quality level of CRF 23, but to cap the data rate at 6,750 kbps (the data rate of the 1080p stream). Note that x264 enforces -maxrate only when -bufsize is also set; the 13500k buffer, twice the cap, is an illustrative choice. For low-motion clips, the CRF value would limit the data rate, since the target quality could be achieved at data rates below the cap. For hard-to-encode clips, the cap would kick in to control the data rate.
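When scripting capped CRF across a full ladder, it can help to generate the command for each rung programmatically. Below is a minimal Python sketch, assuming x264 output and a VBV buffer of twice the cap (x264 ignores -maxrate unless -bufsize is also set). The helper name, the output filename, and the buffer multiplier are assumptions, and only the 1080p rung from the example command is shown because the rest of the baseline ladder isn’t reproduced here.

```python
import shlex


def capped_crf_cmd(src, dst, crf, maxrate_kbps, height, bufsize_factor=2):
    """Build an FFmpeg argument list for one capped-CRF x264 rung.

    bufsize_factor is an illustrative assumption: x264 only enforces
    -maxrate when a VBV buffer size (-bufsize) is also given.
    """
    return [
        "ffmpeg", "-i", src,
        "-vf", f"scale=-2:{height}",          # scale to rung height, even width
        "-c:v", "libx264",
        "-crf", str(crf),                      # target quality level
        "-maxrate", f"{maxrate_kbps}k",        # the cap
        "-bufsize", f"{maxrate_kbps * bufsize_factor}k",
        dst,
    ]


cmd = capped_crf_cmd("inputfile", "output_1080p.mp4", 23, 6750, 1080)
print(shlex.join(cmd))
```

To run it, pass the list to subprocess.run(cmd, check=True). Building an argument list rather than a single command string avoids shell-quoting problems when filenames contain spaces.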

Looking back at Table 1, capped CRF can adjust the data rate, but not the number of rungs or their resolution. There’s also no independent bitrate control or post-encode quality check.

Looking at the box score in Table 3, capped CRF had 14 wins out of 14 completed trials. All eight errors involved jumps of more than 2.05x from the 720p file to the 1080p file. For the talking-head clip, for example, the 720p file had a data rate of 1.04 Mbps, while the 1080p file was three times larger at 3.14 Mbps. This would strand many viewers at the 720p file, reducing overall QoE. If I were deploying capped CRF, I would try a lower quality setting, such as CRF 25, for the 1080p file to limit that spread.
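The rung-spacing check behind these errors is easy to automate. A minimal sketch follows: only the 1.04 Mbps and 3.14 Mbps figures come from the talking-head example; the five lower rungs are hypothetical values added so the seven-rung ladder is complete, and the function name is illustrative.

```python
def spacing_errors(rates_mbps, max_ratio=2.05, first_rung=3):
    """Return (rung, ratio) pairs where a rung exceeds max_ratio x the rung below.

    rates_mbps: ladder data rates ordered from lowest (rung 1) to highest.
    Checks start at first_rung, so the jump from rung 1 to rung 2 is skipped.
    """
    errors = []
    for rung in range(first_rung, len(rates_mbps) + 1):
        # Rung numbers are 1-based; list indexes are 0-based.
        ratio = rates_mbps[rung - 1] / rates_mbps[rung - 2]
        if ratio > max_ratio:
            errors.append((rung, round(ratio, 2)))
    return errors


# Lower five rungs are hypothetical; the 720p and 1080p rates are from the text.
talking_head = [0.15, 0.25, 0.4, 0.6, 0.8, 1.04, 3.14]
print(spacing_errors(talking_head))
```

Here the only flagged jump is the 720p-to-1080p step, at roughly 3.02x.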

The key benefit of capped CRF is that it’s a single-pass technique, while most other technologies use two-pass encoding plus at least one analysis pass. This accounts for the 98 saves.
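The 98 figure appears consistent with one save per output file, assuming each of the 14 test clips was encoded to a seven-rung ladder (Figure 1 shows seven rungs) and one eliminated pass was credited per encode:

```latex
$$14\ \text{clips} \times 7\ \text{rungs} = 98\ \text{single-pass encodes} \;\Rightarrow\; 98\ \text{saves}$$
```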

It wouldn’t be baseball without highlights and bloopers, and Figure 1 shows both for capped CRF. On the left is capped CRF’s single home run, where a boost in data rate increased PSNR by more than 1 percent in five of seven rungs. To explain, the Data Rate column compares the control and capped CRF output and shows the change. The PSNR Percent column tracks the percentage change in PSNR, while the PSNR dB column tracks the absolute change in PSNR. The VMAF column tracks the absolute difference in VMAF score between the control and capped CRF output.
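The Figure 1 columns can be computed per rung with a few lines of Python. This is a minimal sketch, assuming the Data Rate column is a percentage change (the text says only that it shows the change); the function name, dictionary keys, and sample values are hypothetical.

```python
def rung_deltas(control, candidate):
    """Compute Figure 1-style comparison columns for one ladder rung.

    control, candidate: dicts with "kbps", "psnr" (dB), and "vmaf" keys.
    """
    return {
        # Data rate change as a percentage of the control encode (assumed).
        "data_rate_pct": 100 * (candidate["kbps"] - control["kbps"]) / control["kbps"],
        # PSNR change as a percentage of the control PSNR.
        "psnr_pct": 100 * (candidate["psnr"] - control["psnr"]) / control["psnr"],
        # Absolute PSNR change in dB.
        "psnr_db": candidate["psnr"] - control["psnr"],
        # Absolute VMAF change in points.
        "vmaf": candidate["vmaf"] - control["vmaf"],
    }


# Hypothetical control and capped CRF values for one rung.
control = {"kbps": 4500, "psnr": 38.0, "vmaf": 88.0}
capped = {"kbps": 5400, "psnr": 38.6, "vmaf": 90.5}
for column, value in rung_deltas(control, capped).items():
    print(f"{column}: {value:+.2f}")
```

A PSNR Percent value above 1 in four or more rungs of the same clip would meet the home-run definition used in the box score.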