The State of CMAF: The Holy Grail or Just Another Format?
The holy grail of streaming is a single set of files that you can safely deliver to all target endpoints. The most likely candidate to help achieve this is the Common Media Application Format (CMAF). While still not capable of delivering the holy grail to all clients everywhere, CMAF has emerged as the DNA of an interoperability effort that will dramatically simplify compatibility between publishers and players. And ultimately, it may yet deliver the holy grail.
After a brief description of CMAF, I’ll jump into the things you need to know about it.
Briefly, Here’s CMAF
CMAF is a standard for segmented media delivery, formalized as ISO/IEC 23000-19. Specifically, CMAF uses the ISO Base Media File Format (ISOBMFF) container, with Common Encryption (CENC); support for H.264, HEVC, and other codecs; and Web Video Text Tracks Format (WebVTT) and IMSC-1 captioning. Unlike DASH and HTTP Live Streaming (HLS), CMAF isn’t a presentation format; it’s a container format that can contain one set of audio/video files, with manifest files for multiple presentation formats and multiple DRMs.
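To make the container point concrete, here is a minimal Python sketch that walks the top-level boxes of an ISOBMFF byte stream, the structure that CMAF files share with other fragmented MP4 files. The box header layout (a 32-bit big-endian size followed by a four-character type, with size 1 signaling a 64-bit size and size 0 meaning "to the end of the data") comes from the ISOBMFF specification. The helper names `iter_boxes` and `make_box` are invented for this illustration, and the sample bytes are synthetic placeholders rather than a playable CMAF segment.

```python
import struct
from typing import Iterator, Tuple


def iter_boxes(data: bytes) -> Iterator[Tuple[str, int, int]]:
    """Yield (box_type, offset, size) for each top-level ISOBMFF box in data."""
    offset = 0
    while offset + 8 <= len(data):
        # Standard header: 32-bit big-endian size, then a 4-character type.
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header_len = 8
        if size == 1:
            # A size of 1 means a 64-bit "largesize" follows the type field.
            if offset + 16 > len(data):
                raise ValueError(f"truncated largesize header at offset {offset}")
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header_len = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the data.
            size = len(data) - offset
        if size < header_len or offset + size > len(data):
            raise ValueError(f"invalid box size at offset {offset}")
        yield box_type.decode("ascii", errors="replace"), offset, size
        offset += size


def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Build a box with an 8-byte header (illustration helper, not a real muxer)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


if __name__ == "__main__":
    # Synthetic stand-in for a media fragment: a movie fragment box plus media data.
    fragment = make_box(b"moof") + make_box(b"mdat", b"\x00" * 16)
    for name, offset, size in iter_boxes(fragment):
        print(name, offset, size)
```

Pointing `iter_boxes` at the bytes of a real fragmented MP4 file should list boxes such as `ftyp`, `moov`, `moof`, and `mdat`; the manifests that reference those files live outside the container.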
The problem CMAF was designed to solve is shown in Figure 1, from a presentation produced by the Web Application Video Ecosystem (WAVE) project for the WAVE Boot Camp held in Los Angeles in October 2018 (much more on WAVE later in the article). To serve all the endpoints shown on the right of the figure, you need files available in four different formats: HLS, DASH, Smooth Streaming, and HTTP Dynamic Streaming (HDS).
Figure 1. CMAF replaces these four sets of files with a single set of audio/video MP4 files and four adaptive bitrate manifests.
According to the figure, this costs you four times the encoding/packaging expense and four times the storage on the origin and degrades the cacheability of your content. With CMAF, you have one set of audio/video files in a fragmented MP4 format with very lightweight manifest files for all four adaptive bitrate (ABR) formats. In theory, this cuts encoding and storage costs by 75% and makes your caching much more efficient.
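The arithmetic behind that theoretical 75% figure is easy to sketch. The Python snippet below compares origin storage for four separately packaged copies against one shared set of media files plus four manifests. The function name and the sample sizes (100 GB of media, 0.01 GB of manifests per format) are assumptions chosen for illustration, not numbers from the WAVE presentation.

```python
def origin_storage_gb(media_gb: float, formats: int, manifest_gb: float,
                      shared_media: bool) -> float:
    """Total origin storage for `formats` ABR formats.

    With shared_media=True, one set of media files serves every format and
    only the manifests are duplicated; otherwise each format stores its own
    full copy of the media.
    """
    media_copies = 1 if shared_media else formats
    return media_copies * media_gb + formats * manifest_gb


if __name__ == "__main__":
    # Hypothetical library: 100 GB of encoded media, 0.01 GB of manifests per format.
    separate = origin_storage_gb(100.0, 4, 0.01, shared_media=False)
    shared = origin_storage_gb(100.0, 4, 0.01, shared_media=True)
    print(f"four separate sets: {separate:.2f} GB")
    print(f"one shared set:     {shared:.2f} GB")
    print(f"saving:             {1 - shared / separate:.1%}")
```

Because manifests are tiny relative to media, the saving approaches 1 - 1/N for N formats, which is where the 75% comes from when N is four.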
Savings Are Overinflated for Most Publishers
These savings numbers are overinflated for most users since there are multiple techniques to achieve very similar savings. One prominent example is just-in-time (JIT) packagers that input a single set of MP4 files (live or video on demand) and package on-the-fly for the needs of each viewer. This means one set of MP4 files, not four, and no transcoding. Companies that use JIT packaging see CMAF as providing some efficiencies, but certainly not a savings of 75% on encoding and storage.
For example, I spoke with Eran Kornblau, chief architect at Kaltura, who stated, “We have our own just-in-time packager, which is very efficient. It inputs MP4 streams and outputs all necessary protocols and provides great flexibility because we don’t have to encode and package in advance.”
I asked Kornblau about the cost aspect of this since JIT packaging requires a server running at all times. He responded, “Our packager is very efficient, so packaging to CMAF in advance wouldn’t save significant resources over JIT packaging.” I received a similar answer from Jerome Blanc, EVP of compression products at Anevia, which also deploys JIT packaging. He stated, “We’ve optimized our packaging and encryption engine so it doesn’t cost a lot; maybe we could trim CPU costs by 10% or so by delivering static content as opposed to JIT.”
JIT isn’t the only way to serve multiple formats with a single data store. According to Yueshi Shen, principal research engineer and engineering manager at Twitch, CMAF is of little short-term interest for Twitch because it can reach all of its relevant targets with HLS. “For target platforms that don’t support HLS,” Shen explains, “our player can transmux to DASH in real time.” Of course, Twitch targets primarily computers and mobile devices, where HLS support is pervasive, as opposed to smart TVs, where DASH is prevalent, and it may be complicated to add transmuxing within that environment.
To be clear, CMAF does deliver some storage and caching efficiencies over JIT packaging, although the extent depends on your distribution architecture and whether you package at the origin server or at the edge. Looking at the early CMAF adopters in the Akamai ecosystem, Will Law, the company’s chief architect of media cloud engineering, shares that “from our side, the largest benefit we see is improved cache efficiency in the case where an HLS/TS multi-DRM implementation is replaced by a single-silo HLS/CMAF implementation.” Still, for most producers, CMAF doesn’t deliver the 4x encoding/storage savings suggested in Figure 1.
What About Protected Content?
Probably the most significant bar to deploying CMAF with DRM relates to two incompatible encryption modes available in CMAF. As explained by Pieter-Jan Speelmans, CTO of THEO Technologies, “There are two encryption modes used: CBC (sometimes noted as CBCS), which is used mainly by Apple for FairPlay DRM, and CTR, which is used for Widevine and PlayReady. As Apple did not want to add support for CTR, Google and Microsoft have added CBC support to their DRM systems.
“However, for certain DRM levels, you will need hardware support for these encryption modes. Older devices that don’t have support for the CBC mode will not be able to support hardware-grade DRM. Similarly, while Content Decryption Modules (CDMs) are being updated to support CBC, your devices will need to get this update before they can play back content using this encryption. The number of platforms not able to support this likely spans older OTT devices (like smart TVs and set-top boxes, etc.) and some not-updated mobile devices. Software DRM could be shipped in-app to circumvent this, but it would not be hardware-grade DRM of course.”
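One way to reason about the impact is to ask how many encrypted variants a given device population forces you to package. The Python sketch below does that under stated assumptions: `cbcs` and `cenc` are the Common Encryption scheme codes for the CBC and CTR modes Speelmans describes, while the device names, the fleet table, and the `schemes_to_package` helper are hypothetical and exist only for illustration.

```python
from typing import Dict, FrozenSet, List

CBCS = "cbcs"  # AES-CBC pattern encryption, the mode FairPlay requires
CENC = "cenc"  # AES-CTR, the mode Widevine and PlayReady used originally

# Hypothetical device fleet and the schemes each device can decrypt.
DEVICE_SCHEMES: Dict[str, FrozenSet[str]] = {
    "apple_fairplay_device": frozenset({CBCS}),
    "updated_widevine_device": frozenset({CENC, CBCS}),
    "legacy_smart_tv": frozenset({CENC}),
}


def schemes_to_package(fleet: Dict[str, FrozenSet[str]]) -> List[str]:
    """Return the fewest schemes needed so every device can decrypt a variant."""
    if not fleet:
        return []
    unsupported = [name for name, schemes in fleet.items()
                   if not schemes & {CBCS, CENC}]
    if unsupported:
        raise ValueError(f"no usable scheme for: {unsupported}")
    # Prefer a single scheme if every device supports it.
    for scheme in (CBCS, CENC):
        if all(scheme in schemes for schemes in fleet.values()):
            return [scheme]
    # Otherwise both encrypted variants must be packaged and cached.
    return [CBCS, CENC]


if __name__ == "__main__":
    print(schemes_to_package(DEVICE_SCHEMES))
    modern_only = {name: schemes for name, schemes in DEVICE_SCHEMES.items()
                   if name != "legacy_smart_tv"}
    print(schemes_to_package(modern_only))
```

In this toy fleet, the single CTR-only device is what forces a second encrypted copy of the media; drop it, and one CBCS-encrypted set covers everything.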
Elaborating on the problem at the NAB 2019 presentation “Best Practices for Deploying CMAF, DASH and HLS at Scale,” David McLary, VP of video technology at NBC Sports Digital, said, “Everybody we’ve talked to has said that CBCS support is coming in the next 12–18 months [for new devices and updates]. But you’re always going to have the problem of older devices. Devices that only support CTR and not CBCS aren’t going away, and I don’t know if they’re going to get updated. That’s something that we’re going to have to consider as we try and support the older devices as we get further down the road.”
It’s not just hardware endpoints; it’s also some browser versions. For example, David Eisenbacher, CEO of EZDRM, notes that “Microsoft Edge and Internet Explorer can’t currently play certain types of PlayReady-protected video encrypted with CBC. This should be fixed for Edge when Microsoft releases its new version based on Chromium, but likely [will] never be resolved for Internet Explorer.”
For a summary of this device-support issue, check out Phil Harrison’s excellent LinkedIn article, “It’s About CBCS’ing Time.”
When First Deployed, CMAF Will Be ‘Yet Another Format’
While the CMAF “promise” is one set of files for all endpoints, most initial implementations will be CMAF in addition to various flavors of DASH or HLS to enable support for legacy devices. As McLary commented in his NAB talk, “There will be a period of time where we deploy HLS and CMAF together. It’s not going to be a switch that one day we flip. So it’s this in-between phase [when it’s] going to be complicated to figure out what we do.”
In their must-read white paper, “Towards Mass Deployment of CMAF,” four authors from Brightcove, including Yuriy Reznik, outline their vision for deploying CMAF within the Brightcove Video Cloud platform, an overview of which is shown in Figure 2. As the name suggests, Video Cloud is a cloud-based system that includes multiple components, such as Context Aware Encoding, and a dynamic delivery system that manages transmuxing, packaging, encryption, and delivering content to CDNs.
Figure 2. The Brightcove Video Cloud platform
From a positive perspective, the Brightcove authors reveal that adding CMAF to their ecosystem was straightforward, stating that “adding CMAF to a system that already supports dynamic transmuxing to several existing delivery formats is relatively simple, and boils down to a few elements: more restrictive profile generation and encoding, adding an extra flavor of ISOBMFF transmuxer, and adding extra rules to HLS and DASH manifest generators to produce CMAF-compatible manifests.”
However, they also indicate that CMAF will be additive as a format: “While in the short term CMAF most likely will have to co-exist with other varieties of HLS, DASH, and some other delivery formats, the more devices will become capable of decoding it, the clearer benefits we will start to see. Even with dynamic transmux and delivery, the use of CDNs still remains suboptimal, with multiple versions of [the] same content competing for CDN cache at the edge.” In short, from a fragmentation perspective, this means CMAF will make things worse before it makes them better.
When does it make sense to add CMAF to your existing formats? At the SF Video Technology Meetup in May 2019, Brightcove’s Reznik gave a fascinating presentation titled, “On CMAF: Can Deploying a 3rd Streaming Format Reduce Costs?” Here, he first modeled which data gets cached on a CDN, making the obvious point that the most popular data, or the data that’s retrieved by the most popular players, has the highest probability of being cached.
Interestingly, this point debunks the concept that delivering four formats quadruples the expense associated with caching the data on the edge. That is, if you deliver Smooth Streaming to 1% of your viewers, HDS to 1% of your viewers, DASH to 5%, and HLS to 93%, your cache storage costs don’t quadruple; they likely remain close to 1x, since little besides the HLS content will get cached. Certainly, there are other costs and potentially lower quality of service for the noncached formats, but the pure storage cost doesn’t quadruple.
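A toy model shows why. The Python sketch below is not Reznik's model; it simply assumes a fixed-size edge cache that holds the most-requested objects, and title popularity that follows a Zipf distribution. The function name, the 1,000-title catalog, and the 100-slot cache are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict


def cached_slots_by_format(format_share: Dict[str, float], titles: int,
                           cache_slots: int,
                           zipf_exponent: float = 1.0) -> Counter:
    """Count how many cache slots each format occupies in a toy edge cache.

    Assumes the request rate for a (format, title) pair is proportional to
    the format's audience share divided by rank**zipf_exponent, and that the
    cache keeps the cache_slots most-requested pairs.
    """
    rates = [
        (share / rank ** zipf_exponent, fmt)
        for fmt, share in format_share.items()
        for rank in range(1, titles + 1)
    ]
    rates.sort(key=lambda item: item[0], reverse=True)
    return Counter(fmt for _, fmt in rates[:cache_slots])


if __name__ == "__main__":
    # Audience shares taken from the example in the text.
    shares = {"HLS": 0.93, "DASH": 0.05, "Smooth": 0.01, "HDS": 0.01}
    slots = cached_slots_by_format(shares, titles=1000, cache_slots=100)
    for fmt in shares:
        print(f"{fmt}: {slots[fmt]} of 100 slots")
```

Under these assumptions, the cache footprint stays fixed and the dominant format occupies the overwhelming majority of it, while the minority formats mostly miss the cache and pay in origin fetches and quality of service instead.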
Of course, this same concept works in your favor as CMAF gets more popular. As shown in Figure 3, once the percentage of CMAF-capable players exceeds 84%, CDN costs should break even. As we saw before, other costs associated with the other formats will increase, and QoE to those devices will decrease because the data isn’t cached at the edge.
Figure 3. Once 84% of your endpoints can play from the CMAF files, CDN costs start to drop.
The fact that CMAF will be additive shouldn’t come as a surprise. “I still believe that we will have to handle a fragmented world with multiple codecs, multiple delivery formats, and a huge range of devices,” says Magnus Svensson, media solution consultant at Sweden’s Eyevinn Technology. “Lessons learned from deployments that I’ve been part of [are] that as long as you want to support many different devices, especially smart TVs, you need multiple workflows.”
How long will you need to continue distributing multiple formats? That varies from publisher to publisher. But the obvious point is that it makes sense to continue legacy support as long as the revenue, in whatever form, exceeds the cost. What does that mean in terms of years?
Well, don’t hold your breath. According to MediaKind’s Tony Jones, “The main issue is that, until the usage is near ubiquitous, CMAF creates the challenge of an additional format to deliver. The end state is, of course, a real benefit through commonality, but it would appear likely that it will take several years before other formats can be retired.”
Want a hard number? According to Sean McCarthy, product marketing manager, and Richard Fliam, solutions architect, both at Bitmovin, “Many new devices work fine with CENC and a standard encryption algorithm, but legacy devices require more specific, varying formats. For this reason, CMAF has not yet delivered on the cost-reduction benefits for CDNs, but as customers phase out legacy device support in the next 5+ years, this should be an added benefit to the economics of the streaming workflow.”
Implementation Complexity Will Vary
However desirable or functional, most OTT shops can’t switch over to a new format until they can protect it, monitor it, monetize it, and make it play on their full range of target devices, not just for current content but for legacy content. You’ve already seen how DRM complicates single format delivery; there are several other areas to consider.