Judging Apple's Advanced Video Quality Tool
Apple launched the Advanced Video Quality Tool (AVQT) at its Worldwide Developers Conference (WWDC) 2021. Here's an overview of what I learned about it.
In a post on my Streaming Learning Center site, I point out that AVQT is both a quality metric, like VMAF and SSIMPLUS, and a tool to produce that score, like the Moscow State University Video Quality Measurement Tool (VQMT) or SSIMPLUS VOD Monitor. As a tool, AVQT is free, Mac-only, and command-line-driven. To produce the AVQT score, it compares the encoded file to the source and reports the score in either JSON or CSV format. You can't visualise the frames or score within the tool.
The tool has a number of features not typically seen in first-gen quality tools. For example, it can compare low-res videos to the source without pre-scaling. It can measure quality at different display resolutions, so you can measure the quality of a 640x360 video played in a 640x360 window, as well as fullscreen. You can also choose the viewing distance and the pooling method.
AVQT supports Dolby Vision Profile 5 if the source and reference videos are in that format, and it is extremely fast. On an 8-core M1-based Mac mini, processing a 2-minute file took 15 seconds. On an 8-core Intel Xeon E3-1505M Windows-based computer, computing VMAF on the same file took 8:40, about 35 times longer.
Strangely, however, the tool computes the score on a segment-by-segment basis, not on the file as a whole. You can set the segment size up to 60 seconds and get a single score, but if you're measuring files longer than 60 seconds, you'll get back a segment-by-segment score that you'll have to convert to a total score if needed.
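If you do need a single number, one simple (if perceptually naive) aggregation is a duration-weighted mean of the per-segment scores. This sketch assumes you've already parsed AVQT's JSON or CSV output into (duration, score) pairs; the `overall_score` helper and the field layout are my own, not part of the tool:

```python
def overall_score(segments):
    """Duration-weighted mean of per-segment AVQT scores.

    `segments` is a list of (duration_seconds, score) pairs parsed from
    AVQT's JSON or CSV output (field names vary; adapt to your file).
    """
    total_time = sum(duration for duration, _ in segments)
    if total_time == 0:
        raise ValueError("no segments to aggregate")
    return sum(duration * score for duration, score in segments) / total_time

# Example: three full 60-second segments plus a 15-second tail segment.
scores = [(60, 4.8), (60, 4.6), (60, 4.7), (15, 3.9)]
print(round(overall_score(scores), 3))  # 4.638
```

Note that weighting by duration sidesteps exactly the memory effects Apple mentions, so treat the result as a rough summary, not a perceptual model.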
I asked Apple about this and got this response: “We agree there is value in reporting an overall score for the whole video. However, this is quite challenging as it requires subjective data on long duration videos to design and evaluate an aggregation model. The aggregation model would need to mathematically model several memory related aspects in the human visual system such as first and last impressions, sudden quality drops and the length of low-quality periods.”
This “perfect is the enemy of the good” response ignores the fact that every other metric reports a single score and that most of the items referenced have more impact in a short file than a long file. Five bad seconds in a 10-second file could destroy my overall opinion; five bad seconds in a 90-minute file is irrelevant.
In another post, I tested the metric and discovered that AVQT fails if the encoded file's duration or frame rate differs from the source's by even a trivial amount. In one test, the encoded file was 10.0767 seconds long and the source was 10.07 seconds, and AVQT failed even though both files contained 300 frames. In contrast, FFmpeg, VQMT, and the SSIMPLUS VOD Monitor had no problems with the same pair.
You can force AVQT to ignore the difference and produce a score, but you run the risk that the score is invalid, with no way to determine if there's a misalignment, as you can with VQMT or the SSIMPLUS VOD Monitor.
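A pre-flight check can tell you whether a mismatch is trivial (like the 0.0067-second drift above) before you force a score. This sketch reads frame counts and durations with ffprobe; the `probe` and `aligned` helpers and the 0.05-second tolerance are my own conventions, not anything AVQT provides:

```python
import json
import subprocess

def probe(path):
    """Read duration and frame count of the first video stream via ffprobe.
    Requires ffprobe on PATH; -count_frames decodes the whole file."""
    cmd = ["ffprobe", "-v", "error", "-select_streams", "v:0",
           "-count_frames", "-show_entries", "stream=nb_read_frames,duration",
           "-of", "json", path]
    stream = json.loads(subprocess.check_output(cmd))["streams"][0]
    return {"frames": int(stream["nb_read_frames"]),
            "duration": float(stream["duration"])}

def aligned(source, encoded, max_drift=0.05):
    """True if frame counts match and durations agree within max_drift sec."""
    return (source["frames"] == encoded["frames"]
            and abs(source["duration"] - encoded["duration"]) <= max_drift)

# The mismatch from the article: identical frame counts, durations of
# 10.07 vs 10.0767 seconds -- harmless, yet enough to make AVQT bail.
src = {"frames": 300, "duration": 10.07}
enc = {"frames": 300, "duration": 10.0767}
print(aligned(src, enc))  # True: same frame count, drift well under tolerance
```

In practice you'd call `probe()` on both files and only force AVQT to score the pair when `aligned()` passes; a frame-count mismatch signals a real misalignment that no override should paper over.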
Many metric applications involve short, pristine YUV files encoded by precise reference encoders, so neither issue would be a limitation. If you work with longer, real-world videos encoded with commercial tools, however, AVQT might be a problem.
A final post compared the AVQT metric to VMAF and SSIMPLUS, with some subjective evaluations from Subjectify.us. Here, although AVQT showed some bright spots, I didn't test enough data to conclude anything other than that AVQT, VMAF, and SSIMPLUS gauge quality differently, so you can't use AVQT as a faster VMAF or a cheaper SSIMPLUS.
As a tool, assuming the duration and sync issues aren't showstoppers, AVQT's speed and JSON output make it ideal for production. For experimentation-oriented practices like mine, the inability to visualise frames is a significant limitation. As a metric, AVQT showed promise, but it's hard to see it displacing VMAF or SSIMPLUS in real-world workflows without a lot more verification.
If you're serious about experimenting with different codecs and/or encoding parameters, MSU's Video Quality Measurement Tool is an essential tool, and version 13 brings some welcome improvements.
Suffice it to say that WebRTC is finally out of kindergarten and moving into the elementary grades. Which grade exactly? Well, that likely depends on which web browser you're using and which server technology or platform your WebRTC implementation uses.
In this clip from his presentation at Streaming Media Connect 2021, Epiphan's George Herbert describes how Epiphan uses SRT (Secure Reliable Transport) and remote contribution encoders with multiple remote guests to deliver substantially better streaming quality than Zoom provides.
Leveraging cloud-based video streaming platforms, service providers can deliver on-demand content at scale. Cloud-native platforms, in particular, help to optimize file storage and reduce video buffering to ensure high-quality streaming experiences.