April 1, 2016
By Jan Ozer Contributing Editor

How to Use Objective Quality Measurement Tools

Every compressed file involves dozens of configuration-related decisions, including resolution, data rate, H.264 profile, VBR or CBR, entropy coding technique, x264 preset, B-frames, and reference frames, and the list goes on and on. Most encoding professionals simply use configurations gleaned from presets supplied with their encoding tools, or perhaps from recipes found on the web. But how can you be sure that you’re squeezing the last bit of quality out of the selected data rate, or that your videos are optimally bandwidth-efficient? How can you tell how much additional quality a 1080p stream at 7.5Mbps delivers over a 5.5Mbps stream?

Basically, you have three options: ignore the issue and hope for the best, implement time-consuming and expensive subjective testing, or use objective quality metrics, which are less expensive and consume less time, but still require investments of both money and effort. Over the past 18 months, I’ve adopted the last alternative. In this article, I’ll introduce you to two objective quality measurement tools, and describe how I use them to make better-informed compression-related decisions. But let’s start with a brief description of what objective quality benchmarks actually are.

What Are Quality Metrics?

Without question, the gold standard for assessing video quality is a controlled subjective test, which, as previously mentioned, can be time-consuming and expensive to run. Objective quality benchmarks are algorithms that compare the compressed video with the source and render a value that predicts how the compressed file would fare in subjective tests. There are multiple algorithms, all rated according to how well they correspond with actual subjective evaluations. None are perfect, but some perform better than others.

I use two tools to compute these scores: the Moscow State University Video Quality Measurement Tool (VQMT, $995) and the SSIMWave Video Quality-of-Experience Monitor (SQM, ~$2,400). Both run in GUI and batch mode, which is a lifesaver for most projects.

Briefly, VQMT is an algorithm-agnostic tool that lets you run more than 20 different quality algorithms, or versions of algorithms, including the familiar Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM). For various reasons, I’ve standardized on the VQM metric, where lower scores indicate superior quality. Still, the ability to compute PSNR and SSIM is often useful for clients or supervisors who are familiar with those metrics and want to see the results.
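To make the idea of a full-reference metric concrete, here is a minimal sketch of the textbook PSNR formula applied to a handful of pixel values. It is not VQMT's implementation and it is not the VQM metric; the function name and the sample values are hypothetical, and a real tool would work on full decoded frames rather than short lists.

```python
import math

def psnr(reference, distorted, max_value=255):
    """Return PSNR in dB between two equal-length sequences of pixel values."""
    if len(reference) != len(distorted):
        raise ValueError("frames must have the same number of pixels")
    if not reference:
        raise ValueError("frames must not be empty")
    # Mean squared error between the source and the compressed pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames: no measurable distortion
    return 10 * math.log10(max_value ** 2 / mse)

# Hypothetical 8-bit luma samples, for illustration only.
source = [52, 55, 61, 66, 70, 61, 64, 73]
encoded = [54, 55, 60, 66, 69, 62, 64, 70]
print(f"PSNR: {psnr(source, encoded):.2f} dB")
```

Higher PSNR means the compressed pixels sit closer to the source, which is the opposite direction from VQM, where lower scores are better.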

From a usability standpoint, operation is simple in both batch and GUI modes. The GUI can process two files simultaneously (Figure 1), which is amazingly convenient when you’re comparing different encoding alternatives and want to view the differences in the actual frames, which the VQMT interface facilitates. The primary limitation is that you can only compare the quality of files at the same resolution as the source. This prevents analysis in the manner discussed below, where you’re trying to find the best resolution for a file at a given bitrate. Beyond this limitation, VQMT is very useful, and there’s a free trial version you can download that processes files up to, but not including, 720p in resolution. You can find information about the product and trial version, read my review of the product, and watch a short demo on YouTube.

Figure 1. VQMT can compare two files at once and presents this visualization that lets you scan through the tested file(s). Click Show frame to view the actual frame.

The SSIMWave SQM tool offers a different value proposition. Specifically, the tool is built around the company’s SSIMplus algorithm, which was co-invented by Zhou Wang, the company’s co-founder and a co-inventor of the SSIM algorithm, which recently won an Emmy from the Television Academy. According to tests performed by company researchers, SSIMplus scores matched actual subjective ratings more closely than those of any other tested algorithm, including SSIM and VQM, the algorithm I use with VQMT. Today, the SQM tool is the only way to access the SSIMplus algorithm.

Unlike VQM scores, SQM ratings predict subjective evaluations, so a score of 80 to 100 predicts that live viewers will find the video excellent in quality; 60 to 80 predicts that viewers will rate the video good in quality, and so on down to zero. In contrast, the VQM rating can tell you which video has higher quality, but it doesn’t correspond to any particular level of viewer perception.
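The banding described above can be sketched as a small lookup. Only the two bands named in the text are encoded. The function name is hypothetical, placing a boundary score such as 80 in the higher band is an assumption, and the catch-all label for scores under 60 is a placeholder, because the bands below "good" are not named here.

```python
def sqm_band(score):
    """Map a 0-100 SQM score to the rating band described in the article."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 80:   # assumption: a score of exactly 80 counts as excellent
        return "excellent"
    if score >= 60:
        return "good"
    # The article does not name the lower bands, so none is invented here.
    return "below good"

for score in (99, 95, 70, 40):
    print(score, sqm_band(score))
```

A VQM score has no equivalent lookup, since it ranks files against each other rather than against a viewer rating scale.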

Beyond this, SQM offers two key features not available in VQMT. First, you can select a device-specific profile and SQM will render a score that predicts how viewers watching on those devices will rate the video. This is important, because what looks good on a smartphone doesn’t necessarily look good on a 65" 4K TV set. Second, SQM can predict scores at resolutions different from the source resolution. This enables the second analysis presented below, where you want to find the optimal resolution for a file at a specific bitrate.

When I wrote my review of SQM, the product was very competent, but lacked the visualization tools VQMT provides. As shown in Figure 2, SSIMWave has added these, bringing the tool up to par with VQMT in this very important regard.

Figure 2. SQM’s new QoE Analyzer, a very useful visualization tool for SQM

How do I use the two tools? After months of working with both, I’ve found VQM to be a more sensitive canary in a coal mine than SQM, and better at identifying small differences between files. As Table 3 shows, VQM found a 6.8 percent difference between the 5Mbps and 6.5Mbps files, while SQM found only a 0.12 percent difference. Of course, sometimes the differences don’t add up to anything perceptible, as the SQM scores suggest, but since VQMT makes these differences very easy to spot, I still find it very convenient. Besides, sometimes lots of little differences add up to a big difference, and VQMT reveals the individual components of the big difference.
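The text does not spell out how those percentage differences are calculated. One plausible reading is a simple relative difference between the two files' scores, sketched below. The function name and the sample scores are hypothetical and are not taken from Table 3.

```python
def percent_difference(baseline_score, other_score):
    """Return the absolute difference between two scores as a percentage of the baseline."""
    if baseline_score == 0:
        raise ValueError("baseline score must be non-zero")
    return abs(other_score - baseline_score) / abs(baseline_score) * 100

# Hypothetical scores for two encodes of the same source file.
print(f"{percent_difference(200, 150):.1f} percent")
```

Because the two metrics use different scales, the same pair of files can produce a large percentage on one metric and a tiny one on the other, which is why it helps to read them side by side.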

Of course, SQM provides a very useful counterpoint. If VQMT says the sky is falling, so to speak, and SQM says relax, I tend to relax. Moreover, SQM provides the multiple-resolution (and soon, multiple-frame rate) analysis, and device-specific profiles that VQMT doesn’t offer. I find both tools invaluable in their separate roles.

My Test Files

Let’s spend a couple of moments describing the test files. As you’ll see, different types of videos respond differently to various compression options. For this reason, if you’re working with different types of videos, you should create short test files and test each type. Here are the files that I tested in the examples below.

Tears of Steel—the Blender Foundation movie; mix of animation and live action video (mostly live action)
Sintel—Another Blender Foundation movie; all animation, but very lifelike rather than cartoonish
Big Buck Bunny—Yet another Blender Foundation movie; all animation, but more cartoonish than Sintel
“Screencam”—a screencam from the VQMT YouTube demo referenced above
“Tutorial”—a PowerPoint presentation with talking head video grabbed from a Udemy course on Multiple Screen Delivery
“Talking Head”—a simple talking head video of yours truly in my office
“Freedom”—Multicam concert footage (HDV/AVCHD) of the fabulous Josiah Weaver at the Greensboro Coliseum
“Haunted”—footage from a trailer I shot with a DSLR for the Haunted Graham Mansion

Let’s jump into our tests.

Custom Encoding or All Files the Same?

If you work with more than one file type, the first question you have to address is whether to encode them all using the same adaptive bitrate (ABR) group. This first test seems to indicate that the answer is probably not. For this test, I encoded the eight 720p test files in HandBrake using constant rate factor (CRF) encoding with a value of 19. Briefly, CRF encoding adjusts the data rate of the file to maintain a constant quality level. As you can see in the SQM column at the far right of Table 1, all of the videos range in quality from 95 to 99, which predicts that viewers would rate these videos as excellent. However, the Screencam and Tutorial videos achieved an SQM score of 99 at 11 percent and 8 percent of the maximum data rate recorded in this test. In other words, you can encode these types of files at roughly 10 percent of the data rate of real-world video, and achieve the same quality level. Interestingly, with most encoders, once you choose a target data rate for these types of files, the encoder will deliver that rate, even though it could deliver the same quality at much lower data rates.
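For readers who want to script a comparable encode, the sketch below builds a HandBrakeCLI command for x264 constant-quality encoding at a value of 19. This is an assumption about workflow: the test above does not say whether the HandBrake GUI or command line was used, the file names are hypothetical, and the other settings used in the test are not reproduced here. The code only assembles the command and does not run it.

```python
import shlex

def handbrake_crf_command(source, output, crf=19):
    """Build (but do not run) a HandBrakeCLI command for x264 constant-quality encoding."""
    return [
        "HandBrakeCLI",
        "--input", source,
        "--output", output,
        "--encoder", "x264",    # H.264 encoding through x264
        "--quality", str(crf),  # constant quality (CRF); lower values mean higher quality
    ]

# Hypothetical file names, for illustration only.
cmd = handbrake_crf_command("talking_head_720p.mp4", "talking_head_crf19.mp4")
print(shlex.join(cmd))
```

If HandBrakeCLI is installed, the list can be passed to `subprocess.run(cmd, check=True)`. Running the same command over each test file and recording the resulting data rate gives the kind of comparison shown in Table 1.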

Table 1. Data rates required for specified CRF levels.

Note that Tears of Steel and Sintel were both produced and encoded at 24 frames per second (fps). To compare their data rates to the other 30 fps files in the test, you’d have to add 20 percent to their data rates, which boosts their comparable data rates to around 4,800Kbps. This compares to 2,559Kbps for Big Buck Bunny, which was produced at 30 fps. The takeaway here is that realistic animations, such as Sintel, encode like live-action videos, while more cartoonish animations, such as Big Buck Bunny, are a different class that might be able to support a much lower data rate and still achieve the same quality level.
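The arithmetic behind that comparison can be sketched as follows. The 20 percent uplift, the roughly 4,800Kbps adjusted figure, and the 2,559Kbps figure come from the paragraph above. The function names are hypothetical, and the 4,000Kbps input is an illustrative value chosen only because it produces 4,800 after the uplift; it is not a measurement from Table 1.

```python
def adjust_for_frame_rate(data_rate_kbps, uplift=0.20):
    """Scale a 24 fps data rate for comparison with 30 fps files, using the article's 20 percent uplift."""
    return data_rate_kbps * (1 + uplift)

def relative_rate(rate_kbps, baseline_kbps):
    """Return one data rate as a fraction of another."""
    return rate_kbps / baseline_kbps

# Illustrative input only: 4,000 Kbps is not a figure from Table 1.
adjusted = adjust_for_frame_rate(4000)
print(f"Adjusted 24 fps data rate: {adjusted:.0f} Kbps")

# Both figures below are quoted in the article.
share = relative_rate(2559, 4800)
print(f"Big Buck Bunny needs about {share:.0%} of the adjusted data rate")
```

On those figures, Big Buck Bunny reaches a similar quality level at a little over half the data rate of the more lifelike material. The uplift is left as a parameter so that a different frame-rate adjustment can be tried.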