October 18, 2017
By Jan Ozer Contributing Editor
The Producer's View

Comparing Quality Metrics Up and Down the Encoding Ladder

When encoding an adaptive bitrate ladder, you often have to compare videos with different resolutions, which raises multiple issues. For example, when measuring peak signal-to-noise ratio (PSNR) or video multimethod assessment fusion (VMAF) to compare a 640x360 video against an 854x480 video, what resolution do you compare them at? How do you interpret the PSNR or VMAF scores, and which metric is best? In this column, I’ll tackle all of these issues.

Regarding the first issue, there’s a theoretically correct answer, and then there’s how it’s generally done, and they don’t always correspond. The theoretically correct answer is to compare at the resolution at which the video will be viewed. For example, if you knew for certain that the video was going to be watched in a 480p window, you should scale the source and output files to 480p as needed and run your comparisons there. However, few publishers have that degree of certainty, so most scale the encoded files up to the resolution of the source video and compare there. This certainly makes sense for over-the-top (OTT) providers whose videos are almost always watched at full screen, and is a nice compromise position for other publishers.

Some programs handle this scaling behind the scenes; for most others, you have to scale in FFmpeg, which is a royal pain from a time and disk-space perspective. My one tip here is to convert your encoded files to the Y4M format, rather than raw YUV, because the Y4M header contains resolution, frame rate, and format information that simplifies comparisons in your quality control tool. If you use raw YUV files, you’ll have to insert the resolution, frame rate, and format data into your command line or input it into the program itself, which can be time-consuming.
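As a rough sketch of that conversion step, the Python helper below assembles an FFmpeg command that scales an encoded rung up to the source resolution and writes a Y4M file. The function name, the file names, the 1920x1080 target, and the bicubic scaler are illustrative assumptions rather than settings from this column; the helper only builds the argument list and does not run FFmpeg itself.

```python
def build_upscale_command(encoded_path, output_path, width, height, scaler="bicubic"):
    """Return an FFmpeg argument list that scales a file and writes Y4M.

    Hypothetical helper for illustration. FFmpeg chooses the Y4M muxer
    from the .y4m extension, so the output name must end with it.
    """
    if not output_path.endswith(".y4m"):
        raise ValueError("output_path should end with .y4m")
    return [
        "ffmpeg",
        "-y",                                              # overwrite existing output
        "-i", encoded_path,                                # the encoded ladder rung
        "-vf", f"scale={width}:{height}:flags={scaler}",   # scale to source size
        "-pix_fmt", "yuv420p",                             # assumed 8-bit 4:2:0 source
        output_path,
    ]


# Example: scale a 360p rung back up to an assumed 1080p source resolution.
command = build_upscale_command("rung_360p.mp4", "rung_360p_1080p.y4m", 1920, 1080)
print(" ".join(command))
```

On a machine with FFmpeg installed, the list can be passed to `subprocess.run(command, check=True)`. Keep in mind that Y4M files are uncompressed, which is where the disk-space pain comes from.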

The second question is how to interpret the scores once you have them. If you’re comparing cross-resolution files to the source, understand that scores will drop at lower resolutions because the smaller files lose detail and pick up scaling artifacts when scaled back up. This means files encoded at the source resolution will have the highest scores, with scores falling at each lower-resolution rung.

For example, in an article I wrote on per-title encoding, I compared technologies using an encoding ladder that started at 1080p and dropped to 180p. The typical PSNR scores were 45–50 dB for the 1080p rung, and dropped to around 30 dB for the lowest rung. That’s not a lot of range. The rule of thumb for PSNR is that quality improvements above 45 dB are typically not perceivable by the viewer, while scores below 35 dB typically presage visible artifacts. But that’s only for the 1080p rung; the 180p rung will never get close to 45 dB, although the files might look good at 32 dB. So you can’t predict how a human would perceive a 360p file with a PSNR score of 38 dB, although when you’re comparing cross-resolution results, higher is always better.
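For readers who want to see what the number means, here is a minimal sketch of the standard PSNR formula, 10 × log10(peak² / MSE), in Python. It assumes 8-bit samples passed as flat lists, which is a simplification; real tools work on full frames and average across the clip. The function names are hypothetical, and the second function simply encodes the 1080p-only rule of thumb described above.

```python
import math


def psnr(reference, distorted, peak=255):
    """PSNR in dB between two equal-length sequences of pixel values."""
    if not reference or len(reference) != len(distorted):
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error between corresponding samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10 * math.log10(peak ** 2 / mse)


def describe_top_rung_psnr(score_db):
    """Apply the rule of thumb, which holds only for the 1080p rung."""
    if score_db > 45:
        return "further gains typically not perceivable"
    if score_db < 35:
        return "visible artifacts likely"
    return "in between; check visually"


# Toy example: every sample is off by one level, so MSE is 1.
score = psnr([100, 100, 100, 100], [101, 99, 101, 99])
print(round(score, 2), describe_top_rung_psnr(score))
```

The thresholds apply only when the file was encoded at the source resolution; for lower rungs, treat the score as a relative measure.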

What’s great about VMAF is that it was designed for this type of cross-resolution analysis. Specifically, a score of 100 is mapped to a 1080p file encoded at a constant rate factor (CRF) of 22, while a score of 20 is mapped to a file encoded at 240p at a CRF value of 28. In the same per-title analysis, typical 1080p scores were in the mid- to upper 90s, while the 180p files often scored in the single digits.

This range made VMAF scores much easier to interpret than PSNR, but you still can’t predict how a viewer will perceive the quality of a clip in the middle, say a 480p clip with a VMAF score of 42. However, you do know that six VMAF points equal one just-noticeable difference (JND). Technically, this means that 75 percent of viewers would notice a six-point swing, while closer to 90 percent would notice a 12-point, two-JND swing.
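The arithmetic is simple enough to sketch in a few lines of Python. The function names and sample scores below are hypothetical; the only figure taken from the discussion above is the six-points-per-JND conversion.

```python
VMAF_POINTS_PER_JND = 6.0  # six VMAF points equal one JND, per the column


def jnd_gap(vmaf_a, vmaf_b):
    """Return the gap between two VMAF scores in JND units."""
    return abs(vmaf_a - vmaf_b) / VMAF_POINTS_PER_JND


def likely_noticeable(vmaf_a, vmaf_b):
    """True when the gap is at least one JND."""
    return jnd_gap(vmaf_a, vmaf_b) >= 1.0


# Example: two encodes of the same rung with illustrative scores.
print(jnd_gap(93, 87), likely_noticeable(93, 87))
print(jnd_gap(93, 90), likely_noticeable(93, 90))
```

A check like this can help when deciding whether two adjacent ladder rungs, or two encoder settings, differ enough for viewers to notice.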

The ability to identify a JND is exceptionally useful for a range of encoding decisions, from configuring your encoding ladder to choosing an encoder or a codec. If you haven’t already started working with VMAF, it’s time to try it.

[This article appears in the October 2017 issue of Streaming Media Magazine as "Quality Metrics Up and Down the Encoding Ladder."]
