NAB 19 Encoding and QoE Highlights: Here's What Mattered
Every year I go to NAB with no real product-related expectations. My schedule is more haphazard than strategically organized. It is packed with companies and friends I want to see, other companies whose aggressive marketing folks happened to call at the right time, and the random company that catches my eye on the show floor when I have a few minutes. So, this is “what did Jan see at NAB” as opposed to “all the important streaming-related advancements” available at the show.
Everyone who knows what an encoding ladder is should read the first section; thereafter, I’ve clumped the technologies into sections so you can quickly skim and see which might be relevant to you. I hope you find this useful.
The Next Big Thang
On December 14, 2015, Netflix obsoleted the fixed encoding ladder by introducing per-title encoding, which creates a specific ladder for each video according to its unique content. At NAB 2019, per-title encoding was obsoleted by the concept of context-aware encoding, which incorporates the video’s unique content as well as the network and device usage statistics to create a ladder that optimizes quality for each video according to how your viewers are actually watching it. Context-aware encoding can cut encoding, storage, and bandwidth costs and improve viewer quality of experience.
The concept is simple: QoE beacons and network logs provide details such as the effective bandwidth of your viewers, the devices they’re using to watch the videos, and the distribution of viewing over your encoding ladder. Your encoding ladder should consider this data as well as content.
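To make the idea concrete, here is a toy sketch of how content data and audience data could combine. It assumes a per-title analysis has already produced candidate rungs with quality estimates (the `CANDIDATES` values are invented), and that your QoE data can be summarized as a mapping from representative viewer bandwidth to share of viewing. The function names are hypothetical, and this is not any vendor's actual algorithm.

```python
# Toy context-aware ladder selection: an illustrative sketch only,
# not any vendor's actual algorithm.
from itertools import combinations

# Hypothetical candidate rungs from a per-title (content) analysis:
# (bitrate_kbps, resolution, estimated quality score on a 0-100 scale)
CANDIDATES = [
    (300, "270p", 45.0),
    (800, "432p", 62.0),
    (1800, "720p", 78.0),
    (3500, "1080p", 88.0),
    (6000, "1080p", 93.0),
]


def expected_quality(ladder, bandwidth_share):
    """Viewer-weighted quality, assuming each viewer plays the highest rung
    that fits under their bandwidth; viewers below the lowest rung score 0."""
    total = 0.0
    for bandwidth_kbps, share in bandwidth_share.items():
        playable = [q for kbps, _, q in ladder if kbps <= bandwidth_kbps]
        total += share * (max(playable) if playable else 0.0)
    return total


def best_ladder(candidates, bandwidth_share, max_rungs):
    """Exhaustively search ladders of up to max_rungs rungs. Smaller ladders
    are tried first and replaced only on a strict improvement, so ties favor
    fewer rungs (less encoding and storage)."""
    best, best_q = None, -1.0
    for n in range(1, max_rungs + 1):
        for ladder in combinations(candidates, n):
            q = expected_quality(ladder, bandwidth_share)
            if q > best_q:
                best, best_q = ladder, q
    return sorted(best), best_q


# Usage: the same content, two hypothetical audiences.
tv_ladder, _ = best_ladder(CANDIDATES, {8000: 1.0}, max_rungs=4)
mobile_ladder, _ = best_ladder(CANDIDATES, {600: 0.3, 1500: 0.5, 4000: 0.2}, max_rungs=3)
print([kbps for kbps, _, _ in tv_ladder])      # [6000]
print([kbps for kbps, _, _ in mobile_ladder])  # [300, 800, 3500]
```

The all-TV, high-bandwidth audience collapses to a single rung, while the mobile-heavy audience keeps its low rungs. Brute-force search is fine for a handful of candidates; a production system would need a smarter optimizer and would also weigh device resolution.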
One of the first to productize this concept is Brightcove with its Context-Aware Encoding (CAE). In a white paper entitled Optimizing Mass-Scale Multi-Screen Video Delivery, the Brightcove authors created an encoding ladder for the same video for three different operators: the first with primarily mobile viewers, the second with primarily PC and TV viewers, and the third with exclusively high-bandwidth viewers watching on TVs. Figure 1 shows the encoding ladders created for each operator, again for the same content.
Figure 1. Brightcove’s Context-Aware Encoding (CAE) produces three different ladders for the same content based upon device and network statistics.
The paper compares these ladders to the recommended encoding ladder in Apple’s HLS Authoring Specification and notes how the three custom ladders reduce the number of renditions and overall storage, while also reducing bandwidth and/or improving the quality of experience. It presents a compelling picture of the value that CAE provides.
Brightcove has been deploying CAE for over a year, and at NAB at least two other companies, Epic Labs and Mux, revealed their versions of a similar idea. Epic Labs offers its version in a product called LightFlow and has at least one high-volume user, while Mux added Audience Adaptive Encoding to its encoding stack in April.
Figure 2. Bandwidth and device data fed back into Epic Labs' LightFlow system.
For more on the Brightcove and Epic Labs products, you can watch my interview with Epic Labs’ founder and CEO Alfonso Peletier.
What can you do today with your existing encoding tools and workflow? Besides signing on with these companies, you can take a hard look at the device resolutions most used by your viewers and the rungs of the encoding ladder they typically consume, then make some common-sense adjustments. Many, many organizations primarily serving North America, Europe, and countries in the Far East would find that the majority of their viewers are watching the top two or three streams and that they can cut one or two rungs from the bottom of their ladders without any degradation of QoE. Conversely, those serving primarily lower-bitrate viewers would be best served by concentrating the bulk of their rungs in that range. A quick glance at Figure 1 can provide guidance for these efforts.
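If you want to start the do-it-yourself version of that analysis, the first question is how much watch time each rung actually gets. The sketch below assumes you can export `(rung_kbps, seconds_watched)` records from your QoE beacons or CDN logs; that record format, the function names, and the `threshold` and `max_cut` defaults are all hypothetical choices for illustration.

```python
# Sketch: which bottom rungs do viewers actually use? The record format,
# function names, threshold, and max_cut values are hypothetical.
from collections import Counter


def rung_share(view_log):
    """Fraction of total watch time spent on each rung, keyed by bitrate."""
    seconds = Counter()
    for rung_kbps, seconds_watched in view_log:
        seconds[rung_kbps] += seconds_watched
    total = sum(seconds.values())
    return {kbps: seconds[kbps] / total for kbps in sorted(seconds)}


def bottom_rungs_to_review(ladder_kbps, shares, threshold=0.02, max_cut=2):
    """Walk up from the lowest rung and flag rungs below `threshold` share of
    watch time, stopping at the first well-used rung or after max_cut rungs."""
    flagged = []
    for kbps in sorted(ladder_kbps):
        if len(flagged) >= max_cut or shares.get(kbps, 0.0) >= threshold:
            break
        flagged.append(kbps)
    return flagged


# Usage with made-up numbers for illustration only.
log = [(6000, 5000), (4500, 3000), (3000, 1500), (1800, 400), (800, 60), (400, 40)]
shares = rung_share(log)
print(bottom_rungs_to_review([400, 800, 1800, 3000, 4500, 6000], shares))  # [400, 800]
```

Flagged rungs are candidates for review rather than automatic cuts; check which devices and networks those sessions come from before removing anything.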
From my perspective, under any name, incorporating network and device data into the encoding ladder formulation was the most important technology that I saw at NAB. But there were many other bright spots. Here’s a quick synopsis.
What Twitch Is Up To
Twitch.tv is a progressive and fabulously successful operator that has been more than gracious in sharing its expertise with the Streaming Media audience by contributing to our articles and speaking at our conferences. One of Twitch’s shining stars is Yueshi Shen, Principal (Level 7) Research Engineer & Engineering Manager. In an interview, Yueshi shared Twitch’s plans for adopting VP9 and AV1, and why constant bitrate encoding is absolutely necessary when streaming to hundreds of thousands of live viewers.
On the VOD Encoding Front
I started my NAB with Telestream, discussing the Vantage Cloud port with marketing director Ken Haren. At a high level, the port enables Vantage customers to move many workflows that were previously available only on-premises up to the cloud, a key benefit as more companies store mezzanine assets in the cloud.
At my next stop, Capella Systems, I discussed the key new features of Cambria FTC, the company’s flagship VOD encoder. Like Telestream, Capella added cloud functionality to Cambria, specifically the ability to spin up AWS instances running Cambria and distribute encoding to those instances as you would to any other instance of Cambria running on an on-premises encoding farm. Capella has created the AMI (Amazon Machine Image) preset, which users launch and control within Cambria. It’s ideal for pushing overflow encoding to the cloud while handling the normal load most economically on existing systems.
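The overflow pattern described here boils down to a simple dispatch policy: fill the local farm first, and only send work to the cloud when it is full. The sketch below is a hypothetical illustration of that policy, not Cambria’s scheduler; the `EncodeFarm` class and its slot counts are invented, and it launches nothing (a real system would start AMI-based instances and track job completion).

```python
# Sketch of an "on-prem first, overflow to cloud" dispatch policy.
# Hypothetical illustration only: no instances are launched here.
from dataclasses import dataclass, field


@dataclass
class EncodeFarm:
    local_slots: int   # concurrent jobs the existing on-prem farm can run
    cloud_slots: int   # cap on concurrent cloud jobs you are willing to pay for
    local: list = field(default_factory=list)
    cloud: list = field(default_factory=list)
    waiting: list = field(default_factory=list)

    def submit(self, job):
        """Fill cheap local capacity first, then cloud, then queue."""
        if len(self.local) < self.local_slots:
            self.local.append(job)
            return "local"
        if len(self.cloud) < self.cloud_slots:
            self.cloud.append(job)
            return "cloud"
        self.waiting.append(job)
        return "queued"

    def finish(self, job):
        """Free the job's slot and promote the oldest waiting job, if any."""
        pool = self.local if job in self.local else self.cloud
        pool.remove(job)
        if self.waiting:
            pool.append(self.waiting.pop(0))


farm = EncodeFarm(local_slots=2, cloud_slots=1)
placements = [farm.submit(job) for job in ["a", "b", "c", "d"]]
print(placements)  # ['local', 'local', 'cloud', 'queued']
farm.finish("a")
print(farm.local)  # ['b', 'd']
```

Capping `cloud_slots` is one way to keep overflow spending bounded; queued jobs simply wait for the next free slot.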
The other new feature is the ability to import elementary streams and package them in DASH or HLS formats. This allows producers to encode their content once and package it as needed for additional distribution targets. This two-step encode-then-package workflow is much more efficient than encoding from scratch for each output format.
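Here is a minimal sketch of the “encode once, package many” idea: one list of already-encoded renditions feeds two manifest writers. The rendition names and codec strings are hypothetical, the function names are invented, and the DASH output is deliberately skeletal (a real MPD also needs attributes such as `profiles`, `minBufferTime`, and `mediaPresentationDuration`, plus segment addressing). It is not how Cambria packages content.

```python
# Sketch of "encode once, package many": one rendition list, two manifests.
# Names and codec strings are hypothetical; the MPD is only a skeleton.
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str        # rendition folder / Representation id
    bandwidth: int   # peak bits per second
    width: int
    height: int
    codecs: str      # RFC 6381 codec string


def hls_master(renditions):
    """HLS master playlist: one EXT-X-STREAM-INF entry per rendition."""
    lines = ["#EXTM3U"]
    for r in renditions:
        lines.append(
            f'#EXT-X-STREAM-INF:BANDWIDTH={r.bandwidth},'
            f'RESOLUTION={r.width}x{r.height},CODECS="{r.codecs}"'
        )
        lines.append(f"{r.name}/index.m3u8")
    return "\n".join(lines) + "\n"


def dash_skeleton(renditions):
    """Skeletal DASH MPD listing the same renditions as Representations."""
    reps = "\n".join(
        f'      <Representation id="{r.name}" bandwidth="{r.bandwidth}" '
        f'width="{r.width}" height="{r.height}" codecs="{r.codecs}"/>'
        for r in renditions
    )
    return (
        '<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">\n'
        "  <Period>\n"
        '    <AdaptationSet mimeType="video/mp4">\n'
        f"{reps}\n"
        "    </AdaptationSet>\n"
        "  </Period>\n"
        "</MPD>\n"
    )


ladder = [
    Rendition("1080p", 5_000_000, 1920, 1080, "avc1.640028"),
    Rendition("720p", 3_000_000, 1280, 720, "avc1.4d401f"),
]
print(hls_master(ladder))
print(dash_skeleton(ladder))
```

Adding a third target would mean writing a third manifest generator over the same `ladder`, with no re-encoding.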
On Wednesday, I interviewed Encoding.com founder and CEO Greggory Heil. In addition to discussing the recently released Global Format Report, Heil highlighted new features like encoding and decoding Apple ProRes, the ability to output Dolby Vision, and Ludicrous HLS, which splits source files into smaller segments to accelerate encoding. As an interesting counterpoint to the per-context discussion above, after the interview I asked Heil why Encoding.com didn’t yet offer per-title encoding. He responded that while per-title was certainly on their roadmap, it wasn’t a highly requested feature among their users.
I briefly met with AWS Elemental to talk about Accelerated Encoding for AWS MediaConvert which, like Encoding.com’s Ludicrous HLS, segments videos into chunks fed to multiple encoders to accelerate encoding by up to 25x.
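Both Ludicrous HLS and Accelerated Encoding rest on the same split-and-stitch idea. The sketch below shows its shape: cut the timeline into chunks, encode them concurrently, and reassemble in order. `encode_chunk` is a hypothetical placeholder rather than a real encoder call, and real systems cut at keyframes or scene changes and manage rate control across chunk boundaries, which this toy version ignores.

```python
# Sketch of chunked ("split and stitch") parallel encoding.
# encode_chunk is a hypothetical placeholder for a real encoder call.
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(duration_s, chunk_s):
    """Return (start, end) time ranges that tile [0, duration_s)."""
    chunks, start = [], 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks


def encode_chunk(chunk):
    """Placeholder: pretend to encode one chunk and return its segment name."""
    start, end = chunk
    return f"seg_{start:08.3f}-{end:08.3f}.mp4"


def parallel_encode(duration_s, chunk_s, workers):
    chunks = split_into_chunks(duration_s, chunk_s)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in submission order, so stitching the output
        # back together is a simple in-order concatenation.
        return list(pool.map(encode_chunk, chunks))


def ideal_speedup(duration_s, chunk_s, workers):
    """Upper bound ignoring split/stitch overhead: parallelism is limited by
    whichever is smaller, the worker count or the number of chunks."""
    return min(workers, len(split_into_chunks(duration_s, chunk_s)))


segments = parallel_encode(duration_s=125, chunk_s=30, workers=4)
print(segments[0], segments[-1])  # seg_0000.000-0030.000.mp4 seg_0120.000-0125.000.mp4
```

The `ideal_speedup` bound shows why claims like “up to 25x” depend on both the number of encoders and the length of the source: a short clip cut into few chunks cannot use many encoders.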
I also spent time with Murad Mordukhay, CEO of cloud encoder Qencode, which offers a form of per-title encoding and extremely aggressive pricing, such as $0.004 per minute for SD encoding. The service supports all the relevant input and output formats, including AV1, and is worth a look by any company seeking a cloud encoding facility.
Hardware-Based Encoding and Decoding
Hardware transcoding was of particular interest since I’ll be speaking on that topic at Streaming Media East. One company offering an extremely high-density solution is NETINT, which showed the Codensity T400 video transcoder. It is based upon the company’s own Codensity G4 SSD Controller SoC (system on a chip) and is capable of scalable H.264/H.265 transcoding of up to a single 4K/60p stream or eight 1080p streams. The Codensity chip uses the NVMe connection now common in many enterprise-class storage servers to enable high-density solutions.
I also met with Anil Saw, product strategy advisor for Softiron, who showed the company’s high-density HyperCast encoder, which is built around Socionext’s MB86M30 SoCs. HyperCast can support up to 32 SoCs in a 1RU form factor, and each SoC can encode up to one 4K60 stream or four 1080p60 streams, each with up to six video and two audio outputs. Both NETINT and Softiron are targeting telco, cable, satellite, and cloud video service providers.
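Taking Softiron’s figures at face value, the per-rack-unit density works out as follows. This is a back-of-envelope ceiling from the quoted specs, not a measured throughput:

```latex
\begin{aligned}
32\ \text{SoCs} \times 4\ \tfrac{\text{1080p60 streams}}{\text{SoC}} &= 128\ \text{1080p60 input streams per 1RU} \\
32\ \text{SoCs} \times 1\ \tfrac{\text{4K60 stream}}{\text{SoC}} &= 32\ \text{4K60 input streams per 1RU}
\end{aligned}
```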
NGCodec develops, markets, and sells hardware encoders based upon Xilinx FPGAs. At NAB, NGCodec demonstrated live HEVC and AV1 transcoding that the company claimed delivers 30% better compression than other hardware-based H.265 and VP9 encoders. Note that this is the technology Twitch uses for some of its hardware-based VP9 transcodes.
The AV1 live encoding was a technology preview and was I-frame-only. Also new was double-density encoding for HEVC, which doubles the output capacity of each FPGA. NGCodec encoding is available as an AWS service, making it highly accessible. I spoke with company president Oliver Gunasekara and got some candid remarks about the newly announced Intel/Netflix SVT-AV1 codec.
Also on the hardware front, Beamr was showing hardware-accelerated CABR (content-adaptive bitrate encoding) based upon the Intel Graphics Technology GPU. At a high level, Beamr inserted its own rate control mechanism into the HEVC encoding pipeline of the Intel GPU, so Beamr does rate control while the Intel GPU does the HEVC encoding. This accelerates Beamr’s content-adaptive encoding performance, though the quality likely won’t match that of Beamr’s own software-based HEVC encoder.
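To illustrate that division of labor, the sketch below separates a content-adaptive rate controller, which picks a QP for each frame, from an encoder that simply encodes at the QP it is given. `toy_encoder` is a hypothetical stand-in (not Intel’s API) with an invented quality model, and the per-frame search is a simplified illustration, not Beamr’s proprietary CABR algorithm.

```python
# Sketch: rate control decides QP per frame; a separate encoder encodes.
# toy_encoder is hypothetical, and this is not Beamr's CABR algorithm.


def toy_encoder(complexity, qp):
    """Pretend hardware encoder: returns (bits, quality score) for one frame.
    Higher QP means fewer bits and lower quality; busier frames lose more."""
    bits = int(complexity * 100_000 / qp)
    quality = 100.0 - complexity * qp / 4.0
    return bits, quality


def content_adaptive_qps(frame_complexities, target_quality,
                         qp_range=range(22, 41), encoder=toy_encoder):
    """For each frame, pick the highest QP (cheapest encode) whose quality
    still meets the target; fall back to the lowest QP if none does."""
    chosen = []
    for complexity in frame_complexities:
        qp_for_frame = min(qp_range)
        for qp in sorted(qp_range, reverse=True):
            _, quality = encoder(complexity, qp)
            if quality >= target_quality:
                qp_for_frame = qp
                break
        chosen.append(qp_for_frame)
    return chosen


print(content_adaptive_qps([1.0, 1.5], target_quality=90.0))  # [40, 26]
```

Because the controller only calls `encoder(complexity, qp)`, the same logic could in principle sit on top of any encoder that accepts per-frame QP, which is the portability argument behind an SDK approach.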
What’s interesting is that Intel seems to be moving away from hardware encoding, toward software, with its SVT-AV1 announcement and the associated HEVC, VP9, and H.264 codecs. Beamr is moving into this space and should be able to extend content-adaptive functionality to any GPU, CPU, FPGA, or other hardware with H.264 or HEVC encoding via the Beamr CABR SDK.
All of the aforementioned companies use hardware to encode for distribution. On the contribution side, I spoke with LiveU vice president of engineering Daniel Pisarski about how HEVC works in the LiveU ecosystem and when 5G will become relevant for contribution. The short answer was that HEVC quickly became the favored format among LiveU customers with that capability, though all transmissions had to route through LiveU’s cloud service, where the signal was assembled from the various modems on the devices, transcoded to H.264, and streamed to the ultimate target. Pisarski expects 5G to become relevant in the next 12 to 24 months.
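The “assembled from the various modems” step is the heart of bonded cellular contribution. The sketch below shows one simple way a receiver might merge sequence-numbered packets arriving over several links, dropping duplicates and reporting gaps. It is an illustration under assumed packet formats, not LiveU’s actual protocol; real receivers also add jitter buffers and retransmission deadlines.

```python
# Sketch of merging a bonded-cellular stream from several modems.
# Assumed (seq, payload) packet format; not LiveU's actual protocol.


def reassemble(packets_by_modem):
    """Merge (seq, payload) packets from all modems into sequence order.
    Returns (payloads in order, missing sequence numbers)."""
    by_seq = {}
    for packets in packets_by_modem.values():
        for seq, payload in packets:
            by_seq.setdefault(seq, payload)  # first copy wins; duplicates dropped
    if not by_seq:
        return [], []
    ordered = [by_seq[s] for s in sorted(by_seq)]
    missing = [s for s in range(min(by_seq), max(by_seq) + 1) if s not in by_seq]
    return ordered, missing


payloads, gaps = reassemble({
    "modem_1": [(0, "A"), (2, "C")],
    "modem_2": [(1, "B"), (2, "C"), (4, "E")],
})
print(payloads, gaps)  # ['A', 'B', 'C', 'E'] [3]
```

In a live system, a gap like sequence 3 would trigger a retransmission request or be concealed once its playout deadline passes.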
Quality and QoE Monitoring
SSIMWave is the developer of the SSIMPlus metric, a capable, lightweight metric used for VOD and live monitoring that offers exceptional device support. At our NAB meeting, I learned about the company’s involvement with the HDR Evaluation Working Group of the ASC Motion Imaging Technology Council. The goal is to evaluate how different techniques and formats used in the HDR production workflow preserve the “original creative intent,” and to “calibrate the SSIM Plus Structural Similarity visual perception quality metric for HDR, wide-color, ultra-high resolution video material based upon feedback from professional cinematographers and colorists who produce and color grade such material.”
In essence, via input from these golden eyes, SSIMPlus will be “trained” for accuracy with HDR output, which should make it the metric of choice for HDR production. Several of the source/encoded comparison screens created for the HDR evaluation have made it into the SSIMWave retail product, including a fabulous screen for comparing color-related differences between the source and encoded output.
I spent a delightful 30 minutes with the folks from QoE vendor Nice People at Work, who always seem to live up to their name. In a video interview, I spoke with Jonathan Shields about the NPAW product line, including its anti-churn and CDN-switching tools. We also spent some time discussing which video publishers have already adopted a QoE solution, and which haven’t but should. Basically, Shields made a strong case that if video is mission-critical to an operation, then QoE should be as well.
There were also a couple of one-offs, companies whose products didn’t neatly fit into any category. One of these is MediaMelon, whose SmartPlay Streaming is sort of a delivery-side per-title optimizer. That is, the service measures the quality of the streams in the ABR group and only delivers higher-bandwidth streams when they actually increase the quality perceived by the viewer. So, simple scenes might be retrieved at 1080p@3 Mbps, while high-action scenes would be retrieved at 1080p@5 Mbps. The system can also download video in advance of a complex scene to avoid buffering the larger segments under constrained conditions.
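The core selection rule can be sketched in a few lines: among the rungs that fit the available bandwidth, take the cheapest one whose quality is within a small tolerance of the best. This assumes per-segment quality scores are available ahead of time (for example, from a metric run at encode time); `pick_rung` and the tolerance value are hypothetical, and this is not MediaMelon’s implementation.

```python
# Sketch of quality-aware rung selection for a delivery-side optimizer.
# pick_rung and the tolerance are hypothetical; scores are assumed known.


def pick_rung(rungs, bandwidth_kbps, tolerance=1.0):
    """rungs: (bitrate_kbps, quality) options for the next segment.
    Return the cheapest rung that fits the bandwidth and scores within
    `tolerance` points of the best rung that fits."""
    affordable = [r for r in rungs if r[0] <= bandwidth_kbps]
    if not affordable:
        return min(rungs)  # nothing fits: take the lowest bitrate
    best_quality = max(q for _, q in affordable)
    near_best = [r for r in affordable if r[1] >= best_quality - tolerance]
    return min(near_best)  # tuples compare by bitrate first


# Made-up scores: a simple scene barely improves at 5 Mbps; an action scene does.
simple_scene = [(3000, 94.5), (5000, 95.0)]
action_scene = [(3000, 85.0), (5000, 93.0)]
print(pick_rung(simple_scene, bandwidth_kbps=8000))  # (3000, 94.5)
print(pick_rung(action_scene, bandwidth_kbps=8000))  # (5000, 93.0)
```

A viewer with plenty of bandwidth still gets 3 Mbps for the simple scene, which is where the bandwidth savings come from.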
We reviewed the product under the name QBR. Though QBR worked well, it had one severe limitation: it only worked when all the rungs in a ladder were encoded at the same resolution, which is contrary to most recommended approaches. That issue has happily been resolved, and the company has further refined the internal metric it uses to measure the quality of the streams in the encoding ladder.
That metric, called MediaMelon iMOS, runs faster than real time, which is great, but I wondered how closely its scores track with VMAF, which is the metric I trust the most for evaluating full encoding ladders. MediaMelon supplied the comparison shown in Figure 3, which shows a reasonable correlation between the two metrics. So, if you believe in VMAF, iMOS should be a good predictor of the quality that your viewers will actually perceive, which means that SmartPlay Streaming should work effectively for you.
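If you want to run a similar check on your own content, a Pearson correlation over paired scores is a common first step. The sketch assumes you have measured per-clip (or per-segment) VMAF and iMOS values yourself; no MediaMelon data is reproduced here, and `pearson` is a hand-rolled helper rather than a library call.

```python
# Sketch: how closely does one quality metric track another on your content?
# Feed in paired scores you measured yourself; no vendor data is included.
import math


def pearson(xs, ys):
    """Pearson linear correlation coefficient between two paired score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (spread_x * spread_y)


# Usage with your own measurements, e.g.:
# r = pearson(vmaf_scores, imos_scores)
```

A value near 1.0 means the two metrics rise and fall together. Because Pearson only measures linear agreement, a rank correlation such as Spearman’s is a useful second check when you mainly care whether the metrics order encodes the same way.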
Figure 3. MediaMelon’s delivery-side per-title optimization should be able to cut download bandwidth by using this metric to predict quality.
SmartPlay Streaming is a way to gain bandwidth efficiencies without re-encoding an entire library with a per-title or per-context technology. It can also work with per-title systems to improve bandwidth savings, though you’ll likely get the greatest bang for your buck with content encoded to a typical fixed encoding ladder.
Finally, I interviewed Kyle Bank of Phenix Systems, which uses WebRTC to reduce latency to under 0.5 seconds for auction, betting, and similar applications.