Deep Perceptual Preprocessing with a Difference
Per Hultin discusses meeting the challenge of a rising volume of content and the growing carbon footprint of online video streaming.
Consumers today have higher expectations for the quality of the content they consume, and more choice available to them, than ever before. As a result, delivering more high-quality content as cost-effectively as possible is very much top of mind for content providers.
The media sector faces a number of challenges in tackling the surge in online media consumption, which is placing unprecedented stress on network infrastructures worldwide. This massive load on the internet infrastructure not only creates content delivery bottlenecks, but also affects how efficiently content can be distributed to larger numbers of viewers and adds to its environmental footprint.
Cisco forecasts that more than half of global IP video traffic (56.8%) will be HD, and more than a fifth (22.3%) will be Ultra HD, by 2022; this demand for high-resolution video means an inevitable trade-off between bandwidth and the end-user experience. Higher-resolution video also typically requires much higher bitrates, which can result in slow starts, video buffering, and high content delivery network (CDN) and storage costs.
In the continuing drive to balance efficiency and capacity, interest in the perceptual optimization of video—in other words, the processing of digital video streams to deliver the uncompromising quality that users expect at the minimum bandwidth—is rising. Traditionally, the world of digital video relies on compression, which is a processor-intensive process, to address these issues. To deliver ever higher quality content, while reducing the bandwidth requirements, the industry has worked to increase the efficiency and sophistication of the codecs it uses—but this brings much higher levels of complexity.
We are now at a stage where the increase in video encoding complexity is outpacing Moore’s Law. Even with more GPU and CPU capacity to encode video content, the sheer volume of content being produced and watched means we will very quickly outstrip the compute cycles available. We are also facing a situation where the carbon footprint of the internet is estimated to be greater than that of the aviation industry.
As a company, we believe that disruptive innovation for video streaming is urgently needed. We need new pre- and post-processing, encoding, and delivery tools that are device-aware and cross-codec compatible. This is the only way we will meet the growing demand for online video while reducing processing, energy, and storage requirements. SEQUOIA, a $960k R&D project partnership between iSIZE, BBC R&D, and Queen Mary University of London (QMUL), is one of the ways we are working towards these aims.
SEQUOIA is focused on applying innovative technology, including artificial intelligence, to improve the way video content is distributed, responding to the pressing need for video streaming to become more sustainable. The project is examining perceptual optimization of video streams as a way of making significant reductions in the bandwidth required for equal quality. This is at the heart of iSIZE's work, and we have built up extensive expertise in this domain.
A New Approach for a New Video Era
iSIZE takes a unique approach to an increasingly urgent challenge: trading off bitrate against perceptual quality while managing processing and encoding complexity. Instead of relying on more complex codecs and greater GPU/CPU capacity, we have developed a preprocessing solution that reduces the encoding bitrate, operating before the encoder and without needing any information about the encoder specification.
Our innovative BitSave solution leverages patent-pending artificial intelligence (AI) features and machine learning, combined with the latest advances in perceptual quality metrics. It enhances details of the areas of each frame that affect the perceptual quality score of the content after typical block-based prediction and quantization and attenuates details that are not important. By reducing the bits required for elements of the image that perceptual metrics tell us are not important to human viewers, BitSave ensures that perceptual quality is optimally balanced against encoding bitrate.
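The enhance-important-details, attenuate-unimportant-details idea can be sketched in a few lines. The snippet below is a toy illustration, not iSIZE's method: it splits each frame into a smooth base and a detail layer, then keeps detail where a crude importance proxy (local detail variance) is high and damps it elsewhere; the function names, the variance proxy, and all thresholds are assumptions for illustration, whereas BitSave uses learned perceptual quality models.

```python
import numpy as np

def box_blur(img, k=3):
    """Box filter via explicit shifts over a k x k window (edge padding)."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def perceptual_preprocess(frame, threshold=0.01, keep=1.0, attenuate=0.3):
    """Toy detail-reweighting pass (hypothetical, for illustration only).

    Decompose the frame into base + detail, then keep detail where the
    local detail variance exceeds `threshold` and attenuate it elsewhere.
    A real system would score importance with a perceptual metric instead.
    """
    smooth = box_blur(frame)
    detail = frame - smooth
    importance = box_blur(detail ** 2) > threshold  # crude saliency proxy
    return smooth + detail * np.where(importance, keep, attenuate)

# Demo frame: flat left half, high-contrast checkerboard right half.
yy, xx = np.mgrid[0:16, 0:16]
frame = np.full((16, 16), 0.5)
frame[:, 8:] += 0.2 * (-1.0) ** (yy + xx)[:, 8:]
out = perceptual_preprocess(frame)
```

Running this leaves the textured (high-variance) region untouched while flattening residual detail in the smooth region, which is exactly the kind of signal a downstream block-based encoder spends fewer bits on.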
BitSave is a server-side preprocessing enhancement that is cross-codec applicable and optimizes legacy encoders like AVC/H.264, as well as HEVC/H.265, AV1, and VVC/H.266, without needing to know the encoding specifics of each encoder. Crucially, BitSave does not change the encoding, packaging, transport, or decoding mechanisms (unlike solutions such as LCEVC), making it fully compatible with any encoder, streaming workflow, and playout device with no modifications.
Most preprocessing solutions use some variant of a sharpening or contrast adjustment technique to deliver perceptual optimization, e.g., the tune-vmaf options in HEVC or AV1 encoders. What sets BitSave apart is that it maintains the perceptual characteristics of the source without sharpening or changing contrast/brightness/color properties, and it eliminates the need for in-the-loop integration used by many other encoding or perceptual optimization tools. BitSave is a single-pass preprocessing solution that needs no metadata or integration with the subsequent encoding engine(s) and delivers significant gains in quality.
By placing our technology before the encoder, we ensure it does not depend on a specific codec, and it optimizes both for low-level metrics like SSIM (structural similarity index measure) and for higher-level, more perceptually oriented metrics like Netflix's VMAF and Apple's AVQT, or AI-based perceptual quality metrics like LPIPS. Across such metrics, BitSave has been shown to offer average bitrate savings, versus the same encoder and recipe, that often exceed 20%. We have also designed our solution in a way that does not break coding standards, allowing it to be used in existing distribution chains and with existing client devices.
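To make the metrics concrete, the standard SSIM formula can be computed directly. This is a minimal sketch of the published formula with the conventional constants (C1 = (0.01L)^2, C2 = (0.03L)^2 for dynamic range L), evaluated over a single window; production implementations such as those in FFmpeg or scikit-image average it over small local windows across the frame.

```python
import numpy as np

def ssim_global(x, y, data_range=1.0):
    """Single-window SSIM; the full metric averages this over local windows."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    # luminance/contrast/structure terms combined into one ratio
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

rng = np.random.default_rng(0)
ref = rng.random((64, 64))                                # stand-in source frame
degraded = np.clip(ref + rng.normal(0, 0.1, ref.shape), 0, 1)  # noisy copy
```

An identical pair scores exactly 1.0, and the noisy copy scores strictly below it, which is the property a preprocessor can exploit: spend bits only where a drop in such scores would be visible.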
For further flexibility, the iSIZE Software Development Kit (SDK) allows BitSave to be trialled as a Linux binary, a Docker container service, or a Linux SDK for CPU or GPU integration with an on-premise encoder. The SDK's CPU runtime is comparable to low-complexity encoding (e.g., the x264 medium preset for AVC), a result obtained in part through our recent partnership with Intel to optimize our framework for inference on Intel CPUs. Moreover, its runtime on mainstream NVIDIA hardware like T4 GPUs can be as low as 3 ms/frame at 1080p resolution.
So, what are the benefits of using BitSave? In a nutshell, it delivers significant savings in two key areas. First, it reduces the bitrate a standard codec needs to deliver a given quality level. Second, if bitrate saving is not the only goal, BitSave's modest runtime means it can instead be used to speed up the actual encoding, by as much as five times, with even larger gains for complex encoders such as VP9, AV1, and VVC.
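The delivery-cost side of a bitrate saving is simple arithmetic, and it compounds with audience size. The figures below (1M viewers, 10 viewing hours each, a 5 Mbps 1080p ladder rung) are assumed purely for illustration; only the 20% saving comes from the claims above.

```python
def monthly_delivery_tb(viewers, hours_each, bitrate_mbps):
    """Terabytes delivered per month: Mbps x seconds, /8 bits per byte,
    then megabytes scaled to terabytes."""
    seconds = hours_each * 3600
    megabytes = viewers * seconds * bitrate_mbps / 8
    return megabytes / 1e6

# Illustrative (assumed) numbers, not measured figures.
baseline = monthly_delivery_tb(1_000_000, 10, 5.0)
reduced = monthly_delivery_tb(1_000_000, 10, 5.0 * 0.8)  # 20% bitrate saving
saved_tb = baseline - reduced  # terabytes of CDN egress avoided per month
```

Under these assumptions the baseline is 22,500 TB of monthly egress, so a 20% bitrate reduction avoids 4,500 TB of CDN traffic and the storage and energy that go with it.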
Overall, BitSave improves on multiple state-of-the-art quality metrics, and across multiple video encoding standards. We believe we can go further since our approach offers compounded gains to any encoder-specific perceptual quality optimization: a real, measurable, significant saving in bitrate without impacting visual quality.
This innovative technology addresses one of the growing challenges faced by the industry: sustainable distribution of Ultra High Definition content, while limiting the impact of video on internet traffic and reducing distribution costs. We believe our solution will make an impact at every stage of the media distribution chain, delivering benefits for the whole sector by proactively reducing energy consumption across the media value chain.
iSIZE is currently working with customers to roll the technology out in the gaming, social media, and entertainment video streaming sectors. In the next few months, we are looking forward to making some important announcements on the commercial benefits offered by our framework.
This article is Sponsored Content