Comparing Quality Metrics Up and Down the Encoding Ladder
When encoding an adaptive bitrate ladder, you often have to compare videos with different resolutions, which raises several issues. For example, when using peak signal-to-noise ratio (PSNR) or video multimethod assessment fusion (VMAF) to compare a 640x360 video against an 854x480 video, what resolution do you compare them at? How do you interpret the PSNR or VMAF scores, and which metric is best? In this column, I’ll tackle all of these issues.
Regarding the first issue, there’s a theoretically correct answer, and then there’s how it’s generally done, and they don’t always correspond. The theoretically correct answer is to compare at the resolution at which the video will be viewed. For example, if you knew for certain that the video was going to be watched in a 480p window, you should scale the source and output files to 480p as needed and run your comparisons there. However, few publishers have that degree of certainty, so most scale the encoded files up to the resolution of the source video and compare there. This certainly makes sense for over-the-top (OTT) providers whose videos are almost always watched at full screen, and is a nice compromise position for other publishers.
Some programs handle this scaling behind the scenes; for most others, you have to scale in FFmpeg, which is a royal pain from a time and disk-space perspective. My one tip here is to convert your encoded files to the Y4M format rather than raw YUV, because the Y4M header contains resolution, frame rate, and format information that simplifies comparisons in your quality control tool. If you use raw YUV, you’ll have to supply the resolution, frame rate, and format data on your command line or input it into the program itself, which can be time-consuming.
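As a rough illustration of that FFmpeg step, here is a minimal Python sketch that builds the command line for scaling an encoded rung up to the source resolution and writing it out as Y4M. The file names, the helper names, and the choice of bicubic scaling with 8-bit 4:2:0 output are assumptions for the example, not settings from this column. The second helper estimates the uncompressed size, which is where the disk-space pain comes from.

```python
def upscale_to_y4m_cmd(encoded_path, y4m_path, width, height):
    """Build (but do not run) an FFmpeg command that scales a file and writes Y4M.

    Hypothetical helper: bicubic scaling and yuv420p output are assumed defaults.
    """
    return [
        "ffmpeg", "-y",
        "-i", encoded_path,
        "-vf", f"scale={width}:{height}:flags=bicubic",
        "-pix_fmt", "yuv420p",
        y4m_path,  # the .y4m extension selects FFmpeg's Y4M writer
    ]


def raw_420_bytes(width, height, frames):
    """Approximate size of 8-bit 4:2:0 video, ignoring the small Y4M headers.

    One full-resolution luma plane plus two quarter-resolution chroma planes
    works out to 1.5 bytes per pixel.
    """
    return width * height * 3 // 2 * frames


if __name__ == "__main__":
    # Hypothetical file names for a 360p rung compared at 1080p.
    cmd = upscale_to_y4m_cmd("rung_360p.mp4", "rung_360p_at_1080p.y4m", 1920, 1080)
    print(" ".join(cmd))
    print(raw_420_bytes(1920, 1080, 30 * 60), "bytes for one minute at 30 fps")
```

By this estimate, a one-minute 1080p clip at 30 fps comes to about 5.6 GB per rung once decoded, so a full ladder adds up quickly. To actually run the command, pass the list to subprocess.run.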
The second question is how to interpret the scores once you have them. If you’re comparing cross-resolution files to the source, understand that scores will drop at lower resolutions because the lower-resolution files, once scaled back up, show more scaling artifacts and loss of detail. This means files encoded at the source resolution will have the highest scores, with each lower rung scoring progressively lower.
For example, in an article I wrote on per-title encoding, I compared technologies using an encoding ladder that started at 1080p and dropped to 180p. The typical PSNR scores were 45–50 dB for the 1080p rung, and dropped to around 30 dB for the lowest rung. That’s not a lot of range. The rule of thumb for PSNR is that quality improvements above 45 dB are typically not perceivable by the viewer, while scores below 35 dB typically presage visible artifacts. But that’s only for the 1080p rung; the 180p rung will never get close to 45 dB, although the files might look good at 32 dB. So you can’t predict how a human would perceive a 360p file with a PSNR score of 38 dB, although when you’re comparing cross-resolution results, higher is always better.
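For readers who want to see where those decibel figures come from, here is a minimal sketch of the standard PSNR calculation for 8-bit samples, written in plain Python. The function name and the flat-list input are assumptions made for the example; real tools work on whole frames and usually report a per-frame or averaged value.

```python
import math


def psnr_8bit(reference, distorted):
    """PSNR in dB between two equal-length sequences of 8-bit sample values.

    Illustrative helper: computes mean squared error, then
    10 * log10(peak^2 / MSE) with a peak value of 255.
    """
    if not reference or len(reference) != len(distorted):
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: no noise to measure
    return 10 * math.log10(255 ** 2 / mse)


if __name__ == "__main__":
    # Every sample off by one level gives an MSE of 1.
    print(round(psnr_8bit([0, 0, 0, 0], [1, 1, 1, 1]), 2))
```

Because the scale is logarithmic, even an average error of a single level per sample already lands near 48 dB, which helps explain why the whole ladder fits into such a narrow band of scores.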
What’s great about VMAF is that it was designed for this type of cross-resolution analysis. Specifically, a score of 100 is mapped to a 1080p file encoded at a constant rate factor (CRF) of 22, while a score of 20 is mapped to a file encoded at 240p at a CRF value of 28. In the same per-title analysis, typical 1080p scores were in the mid- to upper 90s, while the 180p files often scored in the single digits.
This range made VMAF scores much easier to interpret than PSNR, but you still can’t predict how a viewer will perceive the quality of a clip in the middle, say a 480p clip with a VMAF score of 42. However, you do know that six VMAF points equal one just-noticeable difference (JND). Technically, this means that 75 percent of viewers would notice a six-point swing, while closer to 90 percent would notice a 12-point, two-JND swing.
The ability to identify a JND is exceptionally useful for a range of encoding decisions, from configuring your encoding ladder to choosing an encoder or a codec. If you haven’t already started working with VMAF, it’s time to try it.
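To make the six-point rule concrete, here is a small Python sketch that converts a VMAF difference into JNDs and flags adjacent ladder rungs that sit less than one JND apart. The six-points-per-JND constant comes from this column; the function names and the sample scores are hypothetical.

```python
VMAF_POINTS_PER_JND = 6.0  # rule of thumb quoted in this column


def jnd_steps(vmaf_a, vmaf_b):
    """Return how many JNDs separate two VMAF scores."""
    return abs(vmaf_a - vmaf_b) / VMAF_POINTS_PER_JND


def rungs_under_one_jnd(scores):
    """Return adjacent (higher, lower) score pairs that differ by less than one JND.

    `scores` is assumed to be ordered from the top rung to the bottom rung.
    """
    return [
        (a, b)
        for a, b in zip(scores, scores[1:])
        if jnd_steps(a, b) < 1.0
    ]


if __name__ == "__main__":
    ladder = [96.0, 93.0, 85.0, 70.0, 42.0]  # made-up scores, top rung first
    print(jnd_steps(93.0, 87.0))        # one JND
    print(rungs_under_one_jnd(ladder))  # rungs most viewers could not tell apart
```

A pair flagged here is a candidate for consolidation, since under this rule of thumb most viewers would not notice the difference between the two rungs.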
[This article appears in the October 2017 issue of Streaming Media Magazine as "Quality Metrics Up and Down the Encoding Ladder."]