Announcing the Swift Death of Per-Title Encoding, 2015-2019
The per-title encoding ladder as we know it is on the way out, and it will soon be supplanted by dynamic and context-aware encoding. An encoding ladder, of course, is the set of files created from a single live or video-on-demand (VOD) input to serve viewers watching on a variety of devices and connection speeds. Apple did the encoding world a huge service by formalizing the encoding ladder in Tech Note 2224 (TN2224), which became the Rosetta stone for encoding and packaging files for HTTP Live Streaming (HLS) and other adaptive bitrate formats. Many, many producers simply used the TN2224 ladder as is or with modest modifications, and it worked just fine.
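To make the idea concrete, here's a minimal sketch of a fixed ladder in Python. The resolutions and bitrates are illustrative stand-ins, not Apple's actual TN2224 values, and `rung_for_bandwidth` is a hypothetical helper showing how an adaptive player selects a rung.

```python
# A hypothetical fixed encoding ladder in the spirit of TN2224:
# every input gets the same rungs regardless of content.
# These resolutions and bitrates are illustrative, not Apple's values.
FIXED_LADDER = [
    # (width, height, video_bitrate_kbps)
    (416, 234, 145),
    (640, 360, 365),
    (768, 432, 730),
    (960, 540, 2000),
    (1280, 720, 3000),
    (1920, 1080, 6000),
]

def rung_for_bandwidth(ladder, available_kbps):
    """Pick the highest rung whose bitrate fits the viewer's bandwidth,
    falling back to the lowest rung when nothing fits."""
    fitting = [rung for rung in ladder if rung[2] <= available_kbps]
    return fitting[-1] if fitting else ladder[0]
```

The point of the fixed ladder is exactly what made it vulnerable: the same rungs are produced whether the source is a talking head or a soccer match.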
Per-title encoding was invented, or at least first publicized, by Netflix in December 2015. The theory was simple and compelling: All videos present different measures of motion and complexity, so it makes little sense to use a single ladder for all of them. Rather, it's better to measure the complexity of the video and create a unique ladder for that content. Encode talking-head videos at low bitrates and save bandwidth costs, and encode higher-motion videos at higher bitrates to preserve quality for high-bandwidth viewers watching on big-screen TVs.
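In code, the per-title idea boils down to scaling a base ladder by a measured complexity score. The sketch below assumes a hypothetical 0.0–1.0 complexity score (in practice derived from trial encodes or quality metrics); the 0.5x–1.5x scaling rule is purely illustrative, not Netflix's actual method.

```python
def per_title_ladder(base_ladder, complexity):
    """Scale a base ladder's bitrates by a measured complexity score.

    `complexity` is a hypothetical 0.0-1.0 score (e.g., from a fast
    constant-quality trial encode): a talking head might land near 0.3,
    high-motion sports near 0.9. The scaling rule is illustrative only.
    """
    # Map complexity 0..1 to a bitrate multiplier of 0.5x..1.5x.
    factor = 0.5 + complexity
    return [(w, h, round(kbps * factor)) for (w, h, kbps) in base_ladder]
```

With a rule like this, the low-complexity talking head gets cheaper rungs while the sports clip gets the extra bits it needs at the top of the ladder.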
Within weeks of Netflix’s announcement, YouTube revealed its own machine-learning-based per-title technology, and within 3 years, most major encoding vendors had launched their own version of per-title encoding. Suddenly, TN2224 disappeared, replaced with the HLS Authoring Specification and a new ladder, with Apple’s recommendation that “you evaluate [the bitrates in the ladder] against your specific content and encoding workflow then adjust accordingly.”
Of course, the best way to stay ahead of the curve is to make your own technologies obsolete, which Netflix did to per-title encoding with dynamic optimization, essentially a form of shot-based encoding. Rather than dividing your video into arbitrarily sized chunks, divide the content at the scene changes or camera switches that occur regularly in almost all kinds of footage. This might produce segments of varying durations, but as long as the segment and keyframe lengths are consistent between the ladder rungs, this won’t cause a problem.
Meanwhile, the benefits are many. Scene-based encoding automatically inserts a keyframe at each scene change, which promotes quality. You can adjust the bitrate for each scene as needed, and subtle differences will be less visible because they occur at a scene change. Most importantly, according to Netflix tests, dynamic optimization enabled bitrate savings of about 28% for x264, nearly 38% for VP9, and close to 34% for HEVC. Pretty impressive numbers, and at least for Netflix, per-title encoding was dead.
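A bare-bones sketch of the scene-splitting step might look like the following. Here `frame_diffs` and the fixed threshold are hypothetical stand-ins for whatever shot-detection metric a real encoder uses; production pipelines typically compare frame histograms or motion statistics.

```python
def split_into_scenes(frame_diffs, threshold):
    """Return scene boundaries (frame indices) from per-frame
    difference scores, cutting wherever the score exceeds `threshold`.

    `frame_diffs[i]` is a hypothetical dissimilarity score between
    frame i and frame i+1. A keyframe would be inserted at each
    returned boundary, and each scene could then be encoded at its
    own bitrate.
    """
    cuts = [0]  # the first scene always starts at frame 0
    for i, diff in enumerate(frame_diffs):
        if diff > threshold:
            cuts.append(i + 1)
    return cuts
```

Because each boundary in the result gets its own keyframe, bitrate changes between scenes land exactly where the picture changes anyway, which is why the per-scene adjustments described above stay invisible to viewers.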
The most recent advancement is called (by Brightcove) context-aware encoding (CAE). The concept is simple: QoE beacons and network logs provide details such as the effective bandwidth of your viewers, the devices they’re using to watch the videos, and the distribution of viewing over your encoding ladder. Your encoding ladder should consider this data as well as the complexity of the content.
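As a rough illustration of how viewer data could reshape a ladder, the sketch below drops rungs that almost no logged sessions can reach. The `min_share` threshold and the rung-selection rule are assumptions for illustration, not Brightcove's actual algorithm.

```python
def prune_ladder(ladder, session_bandwidths_kbps, min_share=0.05):
    """Drop ladder rungs that almost no viewers would select.

    For each rung, compute the share of logged sessions whose
    bandwidth would select that rung; drop rungs below `min_share`.
    The threshold and selection rule are illustrative only.
    """
    ladder = sorted(ladder, key=lambda rung: rung[2])
    counts = [0] * len(ladder)
    for bw in session_bandwidths_kbps:
        # Highest rung that fits this session, else the lowest rung.
        idx = 0
        for i, (_, _, kbps) in enumerate(ladder):
            if kbps <= bw:
                idx = i
        counts[idx] += 1
    total = len(session_bandwidths_kbps)
    return [r for r, c in zip(ladder, counts) if c / total >= min_share]
```

If nobody in your audience can sustain the 1080p rung, there's no point paying to encode and store it; that, in miniature, is the CAE argument.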
Brightcove has deployed CAE for more than a year, while at the 2019 NAB Show, at least two companies—Epic Labs and Mux—revealed their versions of a similar idea. Epic Labs offers its version in a product called LightFlow and has at least one high-volume user, while Mux added Audience Adaptive Encoding to its encoding stack in April.
What’s interesting is that while per-title or even scene-based encoding can function on a standalone basis, CAE under any name needs data. As an online video platform (OVP), Brightcove is an end-to-end solution provider and has access to the necessary data, as does Mux, which offers both encoding and QoE monitoring. Epic Labs can import data from third-party QoE vendors like Nice People at Work.
So when evaluating vendors, you have several additional questions to ask, like “Do you offer scene-based or just fixed segment-length encoding?” and “Can you integrate network and viewer data into the encoding ladder construction, and if so, what data sources do you support?”
[This article appears in the June 2019 issue of Streaming Media Magazine as "Per-Title Encoding Is Dead."]