Demuxed '18 Highlights: The Future of Codecs and Compression
The last presentation that I caught was "Fabio Sonnati’s 'Time Machine,' How to Reconstruct Perceptually, During Playback, Part of the Detail Lost in Encoding." By way of background, Sonnati is a pioneer of digital video encoding whose articles on per-title encoding and FFmpeg have provided a critical foundation for many practitioners, including myself. This was the first time I met Fabio in person and saw him speak.
In his talk, Sonnati explored whether it’s possible to reconstruct, during playback, a portion of the quality lost during compression. He started by identifying the classic artifacts produced during encoding, which include loss of fine detail and film grain, banding, reduced contrast, and flatness. Given that we know these occur, Sonnati detailed how we might fix them during decompression (Figure 4), and showed several experiments that achieved meaningful increases in VMAF scores by deploying these techniques.
Figure 4. Fixing encoding related problems during decompression and display.
Operationally, Sonnati claimed that these enhancements could be performed in modern browsers using WebGL, including on mobile devices where they could provide the most profound benefit. However, performance tuning and logic considerations must be addressed before widespread deployment.
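To make the idea of restoring lost grain at playback more concrete, here is a minimal Python sketch that adds small pseudo-random offsets to decoded 8-bit luma samples. It is an illustration only, not Sonnati's algorithm: the talk pointed to WebGL in the browser, and the function name, the strength parameter, and the sample values are all hypothetical.

```python
import random
from typing import List


def add_grain(luma: List[int], strength: int = 2, seed: int = 0) -> List[int]:
    """Add pseudo-random offsets in [-strength, strength] to 8-bit luma samples.

    Results are clamped to the valid 0..255 range. A fixed seed keeps the
    synthetic grain repeatable, which helps when comparing outputs.
    """
    rng = random.Random(seed)
    return [min(255, max(0, y + rng.randint(-strength, strength))) for y in luma]


if __name__ == "__main__":
    # Hypothetical flat region, the kind of area where banding tends to show.
    flat_region = [128] * 8
    print(add_grain(flat_region, strength=2, seed=1))
```

A real player would do this per pixel on the GPU and would likely shape the noise to the content; the point here is only that the operation is cheap and happens after decoding.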
Demuxed Day 2
As mentioned, I only attended the first day of the conference, so all day two observations are from the archived videos, where I primarily focused on encoding-related presentations. The first talk I watched was Mux founder Jon Dahl’s presentation entitled "Video, Evolution, and Gravity: How Science Affects Digital Video." As the title suggests, Dahl explored how human physiology and physics have contributed to many of the fundamentals of video encoding and production, including aspect ratio, frame rates, and color management.
Among the many issues tackled, Dahl quantified why many videographers (including this author) detest vertically oriented video with “Jon’s Law,” which postulates that “the appropriateness of a vertical orientation decreases exponentially with the amount of change.” This explains why still portraits captured in portrait mode look so great, while vertically oriented snippets of sporting events look so awful. At the end, Dahl suggested that all video producers could benefit from learning the science behind human perception to best guide their creative and developmental efforts.
Briefly, low-latency HLS works by advertising low-latency segments in the manifest files and then transferring them via chunked transfer encoding as described above in Law’s presentation. Servers then push segment chunks from the transcoder to the clients for playback. While this sounds simple enough, the required transcoder/server/client integration makes this a technology better implemented via a standard, and Bartos concluded by listing some of the companies involved in the efforts to create a low-latency HLS standard.
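As a rough illustration of the chunked transfer encoding mentioned above, the sketch below frames pieces of a segment the way HTTP/1.1 does: each chunk is prefixed with its size in hexadecimal, followed by CRLF, the data, and another CRLF, and a zero-length chunk ends the body. The function name and sample payloads are hypothetical, and this is not drawn from any particular low-latency HLS implementation.

```python
from typing import Iterable, Iterator


def frame_chunks(pieces: Iterable[bytes]) -> Iterator[bytes]:
    """Yield HTTP/1.1 chunked-transfer frames for each non-empty piece."""
    for piece in pieces:
        if not piece:
            continue  # an empty chunk would be read as the end of the body
        # Size in hex, CRLF, payload, CRLF.
        yield b"%x\r\n" % len(piece) + piece + b"\r\n"
    # The zero-length chunk tells the client the segment is complete.
    yield b"0\r\n\r\n"


if __name__ == "__main__":
    # Hypothetical placeholders for parts of a segment as they leave the transcoder.
    parts = [b"segment part 1", b"segment part 2"]
    print(b"".join(frame_chunks(parts)))
```

Because each piece can be sent as soon as it exists, the client can begin buffering a segment before the transcoder has finished producing it.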
The next session I watched was by Comcast’s Alex Giladi, entitled "The Virtues of Recycling in Multi-Rate Encoding." The high-level problem is that when producing an encoding ladder, most encoders perform some level of analysis for each layer, which is wasteful given that the source video is identical for all the layers.
By recycling, Giladi was referring to reusing analysis information gathered during low-resolution encodes in higher-resolution encodes, as the flow in Figure 5 suggests. Complicating this recycling is the fact that the lower-resolution information has to be refined in some instances to apply to the higher-resolution files. In his talk, Giladi discussed three different refinement alternatives, the fastest of which yielded a speed boost of 2.43x without a quality penalty when encoding high-resolution files using the HEVC codec. Note that while this approach decreases the overall CPU cycles spent on encoding, it will increase end-to-end latency as compared to a parallel encoder, since the lower-resolution files need to be encoded before the higher-resolution files. This dependency makes the approach impractical for live encoding.
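One way to picture the refinement step is with motion vectors. The Python sketch below assumes the recycled analysis consists of per-block motion vectors and that both rungs share the same aspect ratio; neither assumption comes from the talk, and the function names are hypothetical. It scales the low-resolution vectors up and then builds a small search window around each scaled seed, instead of running a full search at the higher resolution.

```python
from typing import List, Tuple

MotionVector = Tuple[int, int]  # (dx, dy) in pixels


def scale_vectors(
    vectors: List[MotionVector], low_width: int, high_width: int
) -> List[MotionVector]:
    """Scale vectors found at low resolution to the higher resolution."""
    ratio = high_width / low_width
    return [(round(dx * ratio), round(dy * ratio)) for dx, dy in vectors]


def refinement_window(seed: MotionVector, radius: int = 1) -> List[MotionVector]:
    """List candidate vectors within `radius` pixels of the scaled seed."""
    sx, sy = seed
    return [
        (sx + ox, sy + oy)
        for oy in range(-radius, radius + 1)
        for ox in range(-radius, radius + 1)
    ]


if __name__ == "__main__":
    # Hypothetical vectors from a 960-pixel-wide encode, reused at 1920 wide.
    seeds = scale_vectors([(3, -2), (0, 1)], low_width=960, high_width=1920)
    for seed in seeds:
        print(seed, len(refinement_window(seed)), "candidates to test")
```

A larger radius costs more encoding time but tolerates less accurate seeds, which is the kind of trade-off the different refinement alternatives would have to balance.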
Figure 5. Reusing analysis information to accelerate the production of an encoding ladder.
While Comcast proved this approach using HEVC, it should also work for other codecs like AV1 and VP9. On his last slide, Giladi included the FFmpeg script necessary to implement this approach, which will definitely simplify experimentation.
Deploying Subjective Video Quality Evaluations
The next presentation that caught my eye was from Intel’s Vasavee Vijayaraghavan, whose talk was entitled "Towards Measuring Perceptual Video Quality and Why." Vijayaraghavan started by describing objective metrics like SSIM and PSNR, which are automatable and therefore easy to use. However, she stated that these metrics often don’t accurately correlate with the human visual system, which limits their utility.
Conversely, subjective evaluations that produce a Mean Opinion Score (MOS) are time-consuming and expensive to produce, but they best reflect how viewers actually rate quality. In 4K encoding tests performed at Intel, Vijayaraghavan found that quality improvements above a MOS rating of 4.5 were imperceptible to viewers, and she recommended capping your bitrate at the level that produces a MOS rating of 4.5, which was around 13 Mbps in these tests (Figure 6). As the figure shows, this still produced very significant bandwidth savings over higher data rate encodes.
Figure 6. Intel found that MOS ratings above 4.5 produced no perceptible improvement.
In a production environment, Vijayaraghavan recommended implementing a per-category encoding scheme by choosing representative videos from the most commonly used content types, encoding at different video quality points, and measuring the MOS scores as above. Once you decide the appropriate maximum rate, you can create an appropriate encoding ladder and apply that to all videos in that category. She did warn, however, that this analysis must be performed separately for each content category and encoder/codec.
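The selection step in this workflow can be sketched in a few lines of Python. The function name is hypothetical, and apart from the 13 Mbps / 4.5 MOS point cited above, the sample measurements are invented purely for illustration and are not Intel's data.

```python
from typing import List, Optional, Tuple


def lowest_bitrate_reaching(
    scores: List[Tuple[float, float]], target_mos: float = 4.5
) -> Optional[float]:
    """Return the lowest bitrate (Mbps) whose MOS meets the target, else None."""
    qualifying = [bitrate for bitrate, mos in scores if mos >= target_mos]
    return min(qualifying) if qualifying else None


if __name__ == "__main__":
    # Hypothetical (bitrate in Mbps, MOS) results for one content category.
    sample_scores = [(5.0, 3.8), (9.0, 4.2), (13.0, 4.5), (20.0, 4.6), (30.0, 4.7)]
    print(lowest_bitrate_reaching(sample_scores))
```

The returned value would serve as the top rung of the ladder for that category, and the measurement would need to be repeated for each content category and encoder/codec combination.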
The final talk I watched was by Stephen Robertson from YouTube, who was supposed to talk about "Machine Learning for ABR in Production." Apparently, however, machine learning at YouTube is not in production, so Robertson gave a peripatetic talk covering multiple topics, including the challenges of implementing machine learning at YouTube and some pretty interesting studies of video quality.
On a more practical level, he began his talk by sharing that YouTube was distributing about 1 GB of AV1-encoded video per second in mid-October, which he expected to increase to over 1 TB/second by the end of October. He did share that AV1 was not the most cost-effective approach, but that YouTube was deploying AV1 to show that they are “deadly serious” about the codec and “dedicated to its success.”
Overall, the diverse range of topics and speakers makes Demuxed a valuable resource for all video producers. Again, I recommend that you scan the list of talks to see if there are any that apply to your practice.