Defend Your Video Encoding Choices with Data
I have two very strong memories from my fraternity days in college. One relates to the night I learned how to play the drinking game Whales Tales, which I won't share. The second is of a frat meeting, which the chapter president started by saying, "The biggest problem facing our fraternity is apathy." The memorable response from one member: a loud, "Aw, who cares?"
That response came to mind this week when I read a tweet commenting on my blog post, "How to Reduce Encoding Time by Up to 40% With Negligible Quality Loss." The post shows that adding reference frames to an x264 encode could boost encoding time by 40% or more, with minimal quality improvement. In one dataset using the veryslow preset, jumping from one reference frame to 16 improved Video Multimethod Assessment Fusion (VMAF) quality from 95.97 to 96.06, but boosted encoding time by 37.12%.
For perspective, it takes about six VMAF points to equal a just noticeable difference, so even the most capable, eagle-eyed observer wouldn't notice 0.09 VMAF points. However, even a junior accountant would notice a 37% jump in encoding costs.
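To put that arithmetic in one place, here is a minimal Python sketch. The figures are the ones quoted above; the six-point just noticeable difference is the rule of thumb cited in this column, and the variable names are purely illustrative.

```python
# Figures quoted above for the veryslow preset dataset.
JND_VMAF = 6.0             # rule of thumb: ~6 VMAF points per just noticeable difference
vmaf_1_ref = 95.97         # VMAF with 1 reference frame
vmaf_16_ref = 96.06        # VMAF with 16 reference frames
time_increase_pct = 37.12  # extra encoding time at 16 reference frames

# Quality gained by moving from 1 to 16 reference frames.
gain = round(vmaf_16_ref - vmaf_1_ref, 2)

# How much of a just noticeable difference that gain represents.
share_of_jnd = gain / JND_VMAF

print(f"VMAF gain: {gain:.2f} points ({share_of_jnd:.1%} of a JND)")
print(f"Encoding time increase: {time_increase_pct:.2f}%")
```

In other words, the extra reference frames buy roughly one or two percent of a visible difference in exchange for more than a third more encoding time.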
I tested 10 different files, each at least 2 minutes long, with two presets (medium and veryslow) and reference frame settings of 1, 3, 4, 8, 10, and 16. I customized data rates for each file to achieve a VMAF score of around 93–96, which is a reasonable target for the top rung of an encoding ladder.
The blog post incorporated about 3 days of testing, which involved around 400 encodes and VMAF calculations, along with results visualizations using the Moscow State University Video Quality Measurement Tool (VQMT). Note that I did qualify my findings by stating that results might vary with different H.264 codecs, or even with different encoders that use the x264 codec (I used FFmpeg). The point was, don't assume. Run a few tests with your setup, and if you achieve similar results, you might just save your company some serious coin.
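If you want to run a few tests of your own, a test matrix like the one described above could be sketched as follows. This is not the script used for the blog post. The file name, output naming scheme, and 4000 kbps data rate are hypothetical placeholders, and the VMAF command assumes an FFmpeg build that includes the libvmaf filter. The sketch only builds the command lines; it does not run them.

```python
from itertools import product

PRESETS = ["medium", "veryslow"]
REF_FRAMES = [1, 3, 4, 8, 10, 16]


def build_encode_cmd(source, preset, refs, bitrate_kbps):
    """Return an FFmpeg/x264 command (as an argument list) for one test encode."""
    # Hypothetical naming scheme: clip01.mp4 -> clip01_veryslow_ref16.mp4
    output = f"{source.rsplit('.', 1)[0]}_{preset}_ref{refs}.mp4"
    return [
        "ffmpeg", "-y", "-i", source,
        "-c:v", "libx264", "-preset", preset,
        "-refs", str(refs),           # number of reference frames
        "-b:v", f"{bitrate_kbps}k",   # per-file data rate, tuned separately
        "-an", output,                # drop audio; only video quality is measured
    ]


def build_vmaf_cmd(encoded, source):
    """Return an FFmpeg command that scores an encode against its source with libvmaf."""
    # The libvmaf filter takes the distorted file first and the reference second.
    return ["ffmpeg", "-i", encoded, "-i", source, "-lavfi", "libvmaf", "-f", "null", "-"]


def encode_matrix(source, bitrate_kbps):
    """Build one encode command per preset and reference-frame combination."""
    return [
        build_encode_cmd(source, preset, refs, bitrate_kbps)
        for preset, refs in product(PRESETS, REF_FRAMES)
    ]


commands = encode_matrix("clip01.mp4", 4000)
print(len(commands), "encodes per file")
print(" ".join(commands[-1]))
```

Each argument list can be handed to `subprocess.run` once you've confirmed the settings match your own workflow.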
What was the "frat boy" tweet in response to my post? A couple of days later, someone tweeted, "Number of reference frames between 4 and 8 should be optimal." So, his response to my data-driven article was, in essence, that his opinion mattered more than data.
What's the point? If you're charged with configuring your encoding ladder, there are a few data points you should have nailed. The most important is the VMAF score of the top rung of your encoding ladder. If it exceeds 96 on average, you should reduce the data rate of what typically is one of your most highly consumed files. In one consulting project, this analysis allowed a large OTT client to drop the data rate for talk shows and game shows from 8Mbps to 5Mbps without any customer noticing. Other content, like sports and action shows, needed the 8Mbps to achieve the necessary quality.
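The top-rung check can be as simple as averaging scores by content category and flagging anything above 96. Here is a minimal sketch in Python; the category names and scores are made up for illustration and are not data from the consulting project.

```python
from statistics import mean

TOP_RUNG_CEILING = 96.0  # average VMAF above this suggests the data rate can come down


def rungs_to_trim(scores_by_category):
    """Return the categories whose mean top-rung VMAF exceeds the ceiling."""
    return sorted(
        category
        for category, scores in scores_by_category.items()
        if mean(scores) > TOP_RUNG_CEILING
    )


# Made-up top-rung VMAF scores, for illustration only.
sample_scores = {
    "talk_shows": [97.8, 98.1, 97.4],
    "sports": [94.2, 95.1, 93.7],
}

trim = rungs_to_trim(sample_scores)
print("Candidates for a lower data rate:", trim)
```

A category that lands on the list is a candidate for retesting at a lower data rate, not an automatic cut.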
You should know the encoding time/quality trade-off of the encoding preset that you're using. In my x264 tests, the difference between the faster and veryslow preset averaged 1.1 VMAF points, while the higher-quality preset took four times longer. You should know the same trade-off for reference frames and other details.
The only practical way to do this is with video quality metrics like VMAF, SSIMPLUS, and some others. I know it's trendy to dismiss video quality metrics as imprecise, but that's making perfect the enemy of the good. Metrics are used by developers to benchmark codecs and run multiple per-title encoding engines, including Netflix's. Proprietary quality metrics are the engine behind Beamr's Content-Adaptive Bitrate (CABR) and encoders from Brightcove, ATEME, and many others. With tools like VQMT and SSIMPLUS VOD Monitor, you can use metrics to identify trouble spots and verify the score with your own eyes. When delivered via effective visual tools, metrics aren't just a number; they're a road map to quality differences in your video files. Or, in the case of reference frames, the lack of meaningful quality differences.
Believe me, if your boss ever asks for data to back up your encoding configurations, the last things you'll want to say are "My opinion matters more than data" or, especially, "Aw, who cares?"