How to: Video Quality Optimization
One of the most important aspects of streaming delivery is quality. Every content publisher wants their content delivered to hundreds of thousands or even millions of end users at the exact same quality, with no noticeable delays, latency, artifacting, or incompatibility issues.
Measuring quality, however, takes on many different forms. As you’ll see throughout the pages of this year’s Streaming Media Industry Sourcebook, quality discussions range from measurement and optimization of the service infrastructure—whether it’s straight-up quality of service (QoS) measurements or best practices around network design that might add up to more effective QoS—to end-user focused (and measured) quality of experience (QoE).
Other quality factors come into play along the way, from energy efficiency in routing and data centers to packet classification for prioritization of time-sensitive content.
Finally, there are the areas of quality that are addressed at the very beginning of the supply chain: quality in acquisition at either the encoding or transcoding points of a streaming workflow. In recent years, there has been a fundamental shift in the way that images (from single images to sequences) are being optimized to deliver better visual quality. At the time of encoding, there are ways to optimize the overall quality of the video itself, and this how-to guide attempts to provide a basic primer on how image optimization is enhanced, how it is perceived, and even how we should consider measuring it.
Decide What to Measure Before Measuring
When you read about quality measurements, almost all sales literature pitches particular tools—or groups of tools—to mathematically measure quality. These range from PSNR and SSIM to derivatives of these two basic measurements, yet there are two major shortcomings in this approach: Neither measures what a human considers important, and both can be tricked by known encoding anomalies. Calling them anomalies, though, probably isn’t fair, since these encoding problems pop up time and time again.
The errors in encoding are visible to a human, but to the quality algorithm, the errors are invisible.
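To make the limitation concrete, here is a minimal sketch of how PSNR is computed, using the standard definition (peak signal squared over mean squared error, expressed in decibels). The tiny 64-pixel "frames" and the names `spread` and `localized` are invented for illustration; they are not taken from any particular tool. Because PSNR depends only on the mean squared error, an error spread thinly across every pixel and an error concentrated in a single pixel score exactly the same, even though a viewer may well judge them differently.

```python
import math


def mse(reference, distorted):
    """Mean squared error between two equally sized lists of pixel values."""
    if len(reference) != len(distorted):
        raise ValueError("frames must have the same number of pixels")
    total = sum((r - d) ** 2 for r, d in zip(reference, distorted))
    return total / len(reference)


def psnr(reference, distorted, peak=255):
    """Peak signal-to-noise ratio in dB; infinite when the frames are identical."""
    err = mse(reference, distorted)
    if err == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / err)


# Hypothetical 64-pixel gray frame (8-bit values).
reference = [128] * 64

# Distortion A: every pixel is off by 2 (a faint, uniform shift).
spread = [130] * 64

# Distortion B: one pixel is off by 16 (a single concentrated blemish).
localized = [128] * 63 + [144]

psnr_spread = psnr(reference, spread)
psnr_localized = psnr(reference, localized)

print(f"spread error:    {psnr_spread:.2f} dB")
print(f"localized error: {psnr_localized:.2f} dB")
```

Both distortions have the same mean squared error, so the two printed scores match. The metric has no notion of where the error sits in the frame or whether a viewer would notice it.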
The most costly way to assure quality, after content has been encoded, would be to pay video quality experts—“golden eyes”—to view every single bit of content and catalog errors or sub-standard encoding points in the encoded video file. After all, if a human’s visual system is the standard by which quality should be measured, then using the gold standard of the golden eyes should solve the problem, right?
This is not exactly true. For one thing, it turns out that golden eyes are about as accurate as average Joes or Janes when it comes to spotting quality issues. For another, there’s so much premium content being generated, in so many formats, bitrates, and resolutions, that there probably aren’t enough human testers to go around.
This question naturally arises: Why don’t we build quality systems that mimic the human visual system (HVS) instead of requiring humans to watch all the videos for quality control?
That is an admirable goal, and it’s one that the industry needs to shoot for, understanding that there will most likely always be perceived differences in quality between the high bitrate master (mezzanine) file and any files that are encoded from the mezzanine file.
The biggest issue in building an HVS-based automated measurement solution is agreeing on which parts of human vision are essential versus which parts are (no pun intended) peripheral.
Optimizing Libraries or Titles?
In 2015, two major companies announced fairly disparate approaches to improving and measuring quality optimization.
The first was Netflix, which planted a stake in the ground by declaring that all encoding should be considered on a per-title basis. The assumption here is that every episode in a television series would be lensed, edited, and lit the same way, and that there was some consistency in the genre and types of shots that all episodes in a title would adhere to (e.g., handheld, low light, with lots of action versus locked-down camera in bright light with limited actor movement). The problem with this, of course, is that we all know it grossly oversimplifies the encoding process to assume that all episodes in a title are consistent. Heck, all one has to do is watch two episodes of Netflix’s own blockbuster series The Crown to know that odes to multiple genres exist within a single episode.
The other thing Netflix did around the time of its per-title announcement, however, was to announce in early 2016 a new perceptual quality metric, video multimethod assessment fusion (VMAF). A few months later, Netflix expanded on VMAF in a blog post titled “Toward a Practical Perceptual Video Quality Metric.” The post highlighted the cloud-based media pipeline Netflix uses to carry out its per-title optimization approach. It also noted Netflix’s newest approach to compression: its work on AV1, the Alliance for Open Media (AOM) codec that derives from both the Mozilla Daala and Google VP10 codecs. Alas, as of this writing, AV1 is dealing with major limitations in both ringing (halos around particular objects in a scene) and perceptual quality approaches that require almost 20x additional processing time to make any visibly discernible gains.
Chinese consumer product giant Huawei also announced its intent for HVS-based measurements in 2015, offering up what it called the User video Mean Opinion Score (U-vMOS). The MOS approach is a method of measuring quality on a scale (e.g., 1–5 or 1–10) and then calculating the mean across multiple users to determine which rating (opinion score) to assign to a particular video.
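The arithmetic behind a mean opinion score can be sketched in a few lines. This is a generic illustration of the MOS idea described above, not Huawei's U-vMOS model; the function name `mean_opinion_score`, the 1–5 scale defaults, and the sample ratings are all assumptions made for the example.

```python
from statistics import mean


def mean_opinion_score(ratings, scale_min=1, scale_max=5):
    """Average a set of viewer ratings after checking they fit the scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if not scale_min <= rating <= scale_max:
            raise ValueError(f"rating {rating} is outside {scale_min}-{scale_max}")
    return mean(ratings)


# Hypothetical ratings from five viewers of the same clip on a 1-5 scale.
ratings = [4, 5, 3, 4, 4]
mos = mean_opinion_score(ratings)

print(f"MOS: {mos}")
```

A single mean hides how much viewers disagree, so in practice it is common to report the number of raters or the spread of the ratings alongside the score.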