Demuxed '18 Highlights: The Future of Codecs and Compression
Demuxed is the annual conference for video engineers, by video engineers. Held on October 17 and 18 in San Francisco, the conference featured 31 speakers giving rapid-fire talks that ranged from 10 to 30 minutes. To use a well-worn but appropriate metaphor, the experience is like drinking from a fire hose: almost impossible to comprehend and digest in real time. I attended the first day this year and watched several talks from the second day via the video library on Twitch.tv.
Overall, the videos are an invaluable source of information on a wide variety of topics. In this story, I'll review the talks that I found most interesting, a selection that obviously reflects my own interests. Before glancing through my list, I strongly suggest that you review the complete list of speakers and topics. You'll likely find multiple talks worth watching in addition to those I discuss below, which are listed in order of presentation.
Demuxed Day 1
The conference started with a bang thanks to a talk entitled "Streaming the 2018 FIFA World Cup Live in UHD with HDR" by Fubo.TV's Billy Romero and Thomas Symborski. The object of the exercise was delivering a four-rung HEVC encoding ladder ranging from 2160p at 16 Mbps to 720p at 3.5 Mbps, all with HDR10 metadata (Figure 1). The entire workflow was cloud-based, with the 70 Mbps input feed transcoded on AWS c5.18xlarge instances.
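One quick way to compare the rungs of a ladder like this is bits per pixel: bitrate divided by (width × height × frame rate). The sketch below uses only the two rungs quoted above. The frame rate is an assumption, since it isn't stated here, but the ratio between the two rungs doesn't depend on it as long as both share the same frame rate.

```python
def bits_per_pixel(bitrate_bps: float, width: int, height: int, fps: float) -> float:
    """Average bits spent per pixel per frame for one ladder rung."""
    return bitrate_bps / (width * height * fps)

# Only the top and bottom rungs quoted in the talk; the two middle rungs
# are omitted because their values aren't given in the text.
KNOWN_RUNGS = {
    "2160p": (16_000_000, 3840, 2160),
    "720p": (3_500_000, 1280, 720),
}

ASSUMED_FPS = 60.0  # assumption: the article doesn't state the frame rate

for name, (bps, w, h) in KNOWN_RUNGS.items():
    print(f"{name}: {bits_per_pixel(bps, w, h, ASSUMED_FPS):.4f} bits/pixel")
```

At 60 fps, the 720p rung works out to roughly 0.063 bits per pixel versus about 0.032 for 2160p, so the bottom rung spends about twice as many bits per pixel as the top one.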
Figure 1. Instance details and the encoding ladder used for the FIFA World Cup 4K delivery with HDR10.
During the talk, the presenters delivered a blueprint for any video engineer seeking to produce a similar experience. They covered the network setup for acquisition, encoder selection, packaging and storage, and client and player considerations, including lessons learned delivering to Amazon Fire TV/Android TV, Roku, Chromecast Ultra, and Apple TV devices using the ExoPlayer, AVFoundation (AVF), Roku, Shaka, and Bitmovin players. Ultimately, the presenters advised attendees to "fail fast, learn quickly, and focus on the user experience."
The next talk in my wheelhouse was "What to Do After Per-Title Encoding" by Ben Dodson and Nick Chadwick from Mux. During the fast-paced talk, Dodson and Chadwick reviewed the history of per-title encoding along with many of its foundational theories and challenges. The pair then detailed how Mux built its own per-title encoding facility using machine learning, and how Mux extended per-title encoding to per-scene encoding, which in turn enabled live per-title encoding. This is a dense and technically challenging presentation that anyone designing a per-title or per-scene encoder will find invaluable.
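To make the core per-title idea concrete, here's a deliberately simplified sketch. It is not Mux's machine-learning approach. It assumes you've run trial encodes of one title at several resolutions and bitrates and scored each with a quality metric. For each bitrate, it keeps whichever resolution scored highest. The quality numbers are made-up placeholders, not measurements.

```python
# Hypothetical trial encodes for one title: (resolution, bitrate_kbps, quality).
# The quality scores are invented placeholders standing in for a metric like VMAF.
TRIAL_ENCODES = [
    ("1080p", 1500, 78.0), ("720p", 1500, 82.0), ("540p", 1500, 80.0),
    ("1080p", 3000, 90.0), ("720p", 3000, 88.0), ("540p", 3000, 84.0),
    ("1080p", 6000, 95.0), ("720p", 6000, 91.0), ("540p", 6000, 86.0),
]

def per_title_ladder(encodes):
    """For each bitrate, keep the resolution with the highest measured quality."""
    best = {}
    for res, kbps, score in encodes:
        if kbps not in best or score > best[kbps][1]:
            best[kbps] = (res, score)
    # Return rungs sorted from lowest to highest bitrate.
    return [(kbps, res, score) for kbps, (res, score) in sorted(best.items())]

for kbps, res, score in per_title_ladder(TRIAL_ENCODES):
    print(f"{kbps:>5} kbps -> {res} (quality {score})")
```

With these placeholder numbers, the 1500 kbps rung would be delivered at 720p rather than 1080p, because the lower resolution scores better at that bitrate. A different title with different content would produce a different ladder, which is the point of per-title encoding.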
Perceived video quality is the very heart of what we do, and Twitter's Sebastiaan Van Leuven tackled this subject head on in his talk "Subjective Video Quality Assessment for Mobile Devices." During his ten-minute talk, Van Leuven first reviewed two commonly used techniques for measuring video quality: single-stimulus and double-stimulus Mean Opinion Score (MOS). Briefly, single-stimulus shows a single sample and asks for a rating on a five-point scale, while double-stimulus shows the original video, then the encoded sample, and requests a similar rating. While simple to deploy, both methodologies score low on precision and consistency. Different testers rate the same video differently, and the same tester may rate the same video differently on a different day.
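To see why MOS scores can lack precision, compute a MOS together with its confidence interval. The sketch below uses hypothetical ratings from ten viewers and a normal-approximation 95% interval. A Student's t value would give a slightly wider interval for a sample this small.

```python
import math
import statistics

def mos_with_ci(ratings, z=1.96):
    """Mean Opinion Score plus a normal-approximation confidence interval."""
    mos = statistics.mean(ratings)
    # Half-width = z * sample standard deviation / sqrt(number of ratings).
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, (mos - half_width, mos + half_width)

# Hypothetical five-point ratings from ten viewers for one encoded clip.
ratings = [4, 3, 5, 4, 2, 4, 3, 5, 3, 4]
mos, (low, high) = mos_with_ci(ratings)
print(f"MOS {mos:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

Here the MOS is 3.7, but the interval runs from about 3.11 to 4.29, more than a full point on a five-point scale. Two encodes whose MOS values differ by half a point therefore can't be reliably ranked with this many viewers.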
To improve consistency and reliability, Twitter developed an Adaptive Paired Comparison (APC) methodology that shows two samples and asks the subject which is better, much like an optometrist asking, "Which looks better, left or right?" This methodology produces more accurate and reproducible results, but it can also be very time-consuming. What's novel about Twitter's approach is an active learning procedure that uses a particle filtering simulation to streamline sample selection. The short presentation provides an overview, and Van Leuven supplemented it with a link to a blog post.
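Paired-comparison votes have to be converted into a quality scale before they're useful. The sketch below shows one standard way to do that. It is not Twitter's particle-filter method. It fits a Bradley-Terry model to a matrix of win counts using Hunter's MM updates. It then applies a simple "most uncertain pair" heuristic to pick the next comparison, which is a much cruder stand-in for active learning. The vote counts are hypothetical.

```python
from itertools import combinations

def bradley_terry(wins, iters=500):
    """Fit Bradley-Terry strengths with MM (minorization-maximization) updates.

    wins[i][j] is the number of times sample i was preferred over sample j.
    The model says P(i beats j) = s_i / (s_i + s_j).
    """
    n = len(wins)
    strength = [1.0] * n
    for _ in range(iters):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (strength[i] + strength[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom if denom else strength[i])
        scale = n / sum(updated)  # strengths are only defined up to a scale factor
        strength = [s * scale for s in updated]
    return strength

def most_uncertain_pair(strength):
    """Pick the pair whose predicted outcome is closest to a coin flip."""
    def closeness(pair):
        i, j = pair
        return abs(strength[i] / (strength[i] + strength[j]) - 0.5)
    return min(combinations(range(len(strength)), 2), key=closeness)

# Hypothetical votes among three encodes of the same clip.
votes = [[0, 8, 9],
         [2, 0, 7],
         [1, 3, 0]]
scores = bradley_terry(votes)
print("strengths:", [round(s, 3) for s in scores])
print("compare next:", most_uncertain_pair(scores))
```

The heuristic spends viewer time on pairs the model can't yet separate. That's the same motivation behind smarter sample selection, though a real active learning procedure would weigh the model's uncertainty far more carefully.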
Accelerating AV1 Playback with dav1d
The Alliance for Open Media's (AOM) AV1 codec launched in mid-2018, but hardware-accelerated playback isn't expected until mid-2020. This makes software decoder efficiency critical for deployments over the next 24 months. Many initial tests of AV1 decoding with AOM's libaom decoder, including my own, showed it to be slow and inefficient. For this reason, AOM sponsored the development of a new open-source AV1 decoder, dav1d, by the VideoLAN, VLC, and FFmpeg communities.
In their talk entitled "Introducing dav1d, A New AV1 Decoder," VideoLAN's Jean-Baptiste Kempf and Two Orioles' Ronald Bultje described the goals of the project, which include smaller source code, a smaller binary executable, and a smaller runtime memory footprint than libaom. During the talk, Bultje reviewed dav1d's performance to date and predicted that, when fully implemented, its decode performance will be similar to that of H.264, HEVC, and VP9. While this won't match the decode efficiency of codecs supported in hardware, it will certainly extend AV1's usage far beyond where libaom could take it. According to this blog post, dav1d currently works on x86, x64, ARMv7, and ARMv8 hardware and runs on Windows, Linux, macOS, Android, and iOS.
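If you want to compare the two decoders yourself, FFmpeg can decode the same AV1 file with either libaom (`libaom-av1`) or dav1d (`libdav1d`) and discard the output. The sketch below assumes an FFmpeg build configured with both `--enable-libaom` and `--enable-libdav1d`. The input file name is hypothetical. Timing is wall-clock time for the whole process, so it includes FFmpeg startup and demuxing, not just decoding.

```python
import subprocess
import time

def decode_cmd(path, decoder):
    """Build an FFmpeg command that decodes video only and discards the frames."""
    return [
        "ffmpeg", "-hide_banner", "-benchmark",
        "-c:v", decoder,    # placed before -i, this forces the input's video decoder
        "-i", path,
        "-an",              # skip audio so only video decoding is exercised
        "-f", "null", "-",  # null muxer: decode, then throw the frames away
    ]

def time_decode(path, decoder):
    """Run the decode and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(decode_cmd(path, decoder), check=True, capture_output=True)
    return time.perf_counter() - start
```

Typical usage would call `time_decode("input_av1.mp4", "libaom-av1")` and `time_decode("input_av1.mp4", "libdav1d")` on the same file and compare the results. FFmpeg's own `-benchmark` report, written to stderr, separates user and real time if you need finer detail.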
As RealEye Media's David Hassoun pointed out in his presentation, "Multi-CDN Jump Start, Don't Put All Your Bits in One Basket," using a single CDN to deliver your traffic creates a single point of failure, an unacceptable risk whenever streaming delivery is mission critical. As Hassoun also noted, a single CDN may not provide the best experience for all users and may not be cost effective.
With these points made, Hassoun identified common problems with using multiple CDNs, such as synchronizing origins for live streaming, routing traffic, receiving actionable real-time-ish data for QoS and QoE, and securing access across CDNs. He then suggested multiple solutions to these problems and showed how to build multi-CDN support all the way down to manifest file creation. Covering a lot of ground in the allotted ten minutes, this presentation is a must-see for anyone considering dipping their toes into multi-CDN delivery (Figure 2).
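As a rough illustration of multi-CDN support at the manifest level, the sketch below generates an HLS master playlist on the fly. It picks a primary CDN by weight for each request, then lists every variant again under the remaining CDNs. HLS allows the same variant to appear more than once in a master playlist, and players that support redundant streams treat the later copies as backups. Support varies by player, though. This is one possible design, not necessarily the one Hassoun presented, and the hostnames, paths, and weights are hypothetical.

```python
import random

def build_master_playlist(variants, cdns, rng=random):
    """Build an HLS master playlist that lists every variant once per CDN.

    cdns is a list of (hostname, weight). One CDN is chosen as primary by
    weight; the others follow in their original order as backup streams.
    """
    hosts = [host for host, _ in cdns]
    weights = [weight for _, weight in cdns]
    primary = rng.choices(hosts, weights=weights, k=1)[0]
    ordered = [primary] + [h for h in hosts if h != primary]

    lines = ["#EXTM3U", "#EXT-X-VERSION:3"]
    for host in ordered:
        for v in variants:
            lines.append(
                f"#EXT-X-STREAM-INF:BANDWIDTH={v['bandwidth']},"
                f"RESOLUTION={v['resolution']}"
            )
            lines.append(f"https://{host}/{v['path']}")
    return "\n".join(lines) + "\n"

# Hypothetical ladder and CDN hostnames, for illustration only.
VARIANTS = [
    {"bandwidth": 6_000_000, "resolution": "1920x1080", "path": "1080p/index.m3u8"},
    {"bandwidth": 3_000_000, "resolution": "1280x720", "path": "720p/index.m3u8"},
]
CDNS = [("cdn-a.example.com", 70), ("cdn-b.example.com", 30)]

print(build_master_playlist(VARIANTS, CDNS))
```

Because the playlist is generated per request, the weights could later be driven by the real-time QoS data Hassoun described instead of fixed values.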
Figure 2. Adding multiple CDN support to a dynamic master playlist.
Reducing Glass-to-Glass Latency
Glass-to-glass latency is a consistent concern for many live-event producers. While there are several proprietary approaches to reducing live latency, like Wowza Streaming Cloud's ultra-low latency service, these may not work at the scale required for large events.
One solution that's gaining traction is chunked CMAF, as comprehensively described by Akamai's Will Law in his presentation entitled "Chunky Monkey, Using Chunked-Encoded Chunked-Transferred CMAF to Bring Low Latency Live to Very Large Scale Audiences." Figure 3 illustrates this approach. The top shows the traditional approach, in which a segment isn't delivered until it's completely finalized and stored. The bottom shows the same media samples packaged in chunks that can be delivered before the complete segment is encoded and saved, which dramatically reduces latency.
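The "chunked-transferred" half of the title refers to HTTP/1.1 chunked transfer encoding. This mechanism lets a server start sending a response before it knows the response's total length. On the wire, each chunk is its size in hexadecimal, a CRLF, the bytes, and another CRLF, and a zero-size chunk ends the body. The sketch below frames placeholder CMAF chunks this way. For simplicity, it assumes one HTTP chunk per CMAF chunk, although the two don't have to line up.

```python
from typing import Iterable, Iterator

def http_chunked(parts: Iterable[bytes]) -> Iterator[bytes]:
    """Frame each piece of a response body as an HTTP/1.1 chunk.

    Each chunk is: size in hex, CRLF, the bytes, CRLF. A zero-length chunk
    followed by an empty trailer section (a final CRLF) ends the body.
    """
    for part in parts:
        if not part:
            continue  # a zero-length chunk would prematurely signal end-of-body
        yield f"{len(part):X}\r\n".encode("ascii") + part + b"\r\n"
    yield b"0\r\n\r\n"

# Placeholder bytes standing in for CMAF chunks (each really a moof+mdat pair)
# that a packager would release as soon as the encoder finishes them.
cmaf_chunks = [b"chunk-1-moof-mdat", b"chunk-2-moof-mdat"]
wire = b"".join(http_chunked(cmaf_chunks))
print(wire)
```

Because `http_chunked` is a generator, a server can send each framed chunk the moment it's produced rather than waiting for the whole segment. That's where the latency savings shown in Figure 3 come from.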
Figure 3. On top is a single segment delivered after completion. On the bottom are the same samples packaged in chunks delivered chunk by chunk.
Though this approach cuts latency and streamlines network throughput, it also raises several issues, such as how to estimate bandwidth and how to resolve timing differences between HLS and DASH. Law discussed solutions to these issues and concluded with a look at standardization efforts for chunked CMAF, as well as commercial vendors and open-source tools that implement this approach.
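Chunked delivery makes bandwidth estimation tricky. Chunks arrive only as fast as the encoder produces them, so dividing bytes received by elapsed time mostly measures the stream's encoding bitrate rather than the network's capacity. The sketch below contrasts that naive estimate with one simple heuristic that counts only the time data was actually arriving. It assumes the player can observe when each chunk starts and finishes arriving. The sample timings are hypothetical: a 2 Mbps stream in 0.5-second chunks, with each chunk taking 0.05 seconds to arrive.

```python
def naive_throughput(chunks):
    """Bits per second over total elapsed time, idle gaps included.

    chunks is a list of (start_s, end_s, bytes) per received chunk.
    """
    total_bytes = sum(b for _, _, b in chunks)
    elapsed = chunks[-1][1] - chunks[0][0]
    return 8 * total_bytes / elapsed

def burst_throughput(chunks):
    """Bits per second over only the time data was actually arriving."""
    total_bytes = sum(b for _, _, b in chunks)
    busy = sum(end - start for start, end, _ in chunks)
    return 8 * total_bytes / busy

# Hypothetical arrivals: 125,000-byte chunks every 0.5 s, each taking 0.05 s.
arrivals = [(i * 0.5, i * 0.5 + 0.05, 125_000) for i in range(4)]
print(f"naive: {naive_throughput(arrivals) / 1e6:.2f} Mbps")
print(f"burst: {burst_throughput(arrivals) / 1e6:.2f} Mbps")
```

The naive figure comes out near the 2 Mbps encoding rate, while the burst figure shows roughly 20 Mbps of available capacity. An ABR algorithm that trusts the naive figure might never switch up to a higher rung.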
This year's Demuxed event was jam-packed with speakers covering topics including HDR, AV1, per-title encoding, low latency, optimization, and more.