StreamingMedia.com Industry Announcements

Accelerated Video Encoding: G&L Systemhaus illustrates how specialised hardware can triple energy efficiency

Encoding with specialised VPUs can cut energy use to as little as a third of that of CPU encoding. Tests confirm that real-time speed and broadcast quality remain intact, charting a clearer path to greener large-scale delivery.

Cologne, Germany (09 Jul 2025)

By Ben Schwarz, Greening of Streaming President

In a world increasingly concerned with sustainability, the energy demands of digital infrastructure are coming under scrutiny, especially in streaming, where encoding and decoding dominate processing costs. At a recent meeting of Greening of Streaming, an industry initiative dedicated to lowering the carbon footprint of online video, G&L Systemhaus, a German systems integrator and managed service provider, shared new research. Drawing on his latest R&D, Head of Streaming Engineering Martin Schmalohr compared the energy profiles of CPU-based and hardware-accelerated (VPU) encoding workflows. The findings point decisively toward a greener future when specialised hardware is used optimally.

Test Environment: A Balanced, Comparable Setup

To ensure meaningful and repeatable results, Martin obtained the measurements from G&L’s Audio Video Processing Unit (AVPU), which supports both CPU and hardware acceleration. This unit is housed in a Supermicro 1U server featuring an Ampere Altra Max ARM CPU and up to 10 NETINT Quadra T1U VPUs. These VPUs, NVMe-form-factor ASICs, handle video encoding, decoding, and scaling in hardware. The software stack includes FFmpeg (custom-compiled for Quadra support), libx264, libx265, and libsvtav1, running in Ubuntu containers orchestrated by Docker. The setup is representative of a VoD transcoding environment.

Content and Codecs: Simulating Real Workloads

Fifteen diverse 20-second test clips were used, each representing a unique scene type, ranging from high-motion events, such as marathons, to low-motion animations. These were encoded using three widely used codecs:

  • H.264 (AVC): Older but still dominant
  • H.265 (HEVC): Offers better compression at the cost of more computation
  • AV1: The latest in efficiency, but also the most computationally demanding

Each codec was tested using both CPU-based libraries (libx264, libx265, libsvtav1) and hardware acceleration via Quadra VPU implementations. Rate control modes included constant bitrate (CBR), capped variable bitrate (VBR), and constant rate factor (CRF).
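As a rough illustration of the three rate control modes on the CPU path, the following Python sketch builds FFmpeg argument lists for libx264 only. The function name, file names, and every bitrate, buffer, and CRF value are illustrative assumptions, not the settings used in the tests. The Quadra encoders and the other CPU libraries take different options and are deliberately left out.

```python
def x264_args(mode, src, dst, bitrate="5M", max_bitrate="7500k",
              bufsize="10M", crf=23):
    """Return an FFmpeg argument list for libx264 in the given rate control mode.

    Every default value here is an illustrative placeholder.
    """
    args = ["ffmpeg", "-y", "-i", src, "-c:v", "libx264"]
    if mode == "cbr":
        # Approximate constant bitrate: pin target, floor, and ceiling to one value.
        args += ["-b:v", bitrate, "-minrate", bitrate,
                 "-maxrate", bitrate, "-bufsize", bufsize]
    elif mode == "capped_vbr":
        # Capped VBR: average target plus a ceiling enforced by the VBV buffer.
        args += ["-b:v", bitrate, "-maxrate", max_bitrate, "-bufsize", bufsize]
    elif mode == "crf":
        # Constant rate factor: quality-targeted, bitrate floats with content.
        args += ["-crf", str(crf)]
    else:
        raise ValueError(f"unknown rate control mode: {mode}")
    return args + [dst]


# Print one command line per mode for a hypothetical input clip.
for mode in ("cbr", "capped_vbr", "crf"):
    print(" ".join(x264_args(mode, "clip.mp4", f"clip_{mode}.mp4")))
```

The sketch only prints the commands. Passing a list to `subprocess.run` would execute it, provided an FFmpeg build with libx264 is installed.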

Encoding Speed: Real-Time Achievable Only With Hardware

CPU encoding struggled to keep up. On CPUs, H.265 and AV1 often failed to reach real-time encoding speeds, especially for 1080p50 content, even with multiple cores. AV1 in particular consumed substantial resources without hitting real-time thresholds, likely due to constraints in the hyper-threading implementation and the corresponding CPU allocation.

In contrast, the hardware-accelerated VPU setup maintained real-time speeds across almost all scenarios, even at higher quality settings. A single Quadra card typically delivers up to 70 fps at under 75% load while encoding 15 HD clips at 1080p in parallel. That is effectively double or triple the throughput of CPU-only workflows, which consume up to 35 ARM cores.
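Whether an encode runs in real time can be expressed as a simple ratio of encoding speed to source frame rate. The helper below is a minimal sketch; its name and the sample timings are assumptions made for illustration, not measurements from the tests.

```python
def real_time_factor(encoded_frames, wall_seconds, source_fps):
    """Encoding speed relative to playback speed; 1.0 or more means real time."""
    if wall_seconds <= 0 or source_fps <= 0:
        raise ValueError("wall_seconds and source_fps must be positive")
    encode_fps = encoded_frames / wall_seconds
    return encode_fps / source_fps


# A 20-second 1080p50 clip contains 1000 frames.
frames = 20 * 50

# Hypothetical wall-clock times for two encodes of that clip.
for label, seconds in (("slow encode", 25.0), ("fast encode", 16.0)):
    factor = real_time_factor(frames, seconds, 50.0)
    status = "real time" if factor >= 1.0 else "below real time"
    print(f"{label}: factor {factor:.2f} ({status})")
```

By the same ratio, an encoder sustaining 70 fps on 50 fps source material would have a factor of 1.4.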

Energy Consumption: Encoding at the Cost of Coffee

Energy use was logged every two seconds via OpenBMC, focusing exclusively on incremental energy (excluding the idle server load of ~240 W). CPU encoding consumed 15–20 Wh per 20-second clip, roughly equivalent to the energy required to brew a standard cup of coffee. With ASIC-based encoding, this dropped dramatically to 2–5 Wh per clip, using a single Quadra card at 66% load (with 15 parallel HD encodes).
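Incremental energy of this kind can be derived from periodic power readings by subtracting the idle baseline and summing over time. The sketch below assumes evenly spaced samples and a simple rectangle rule. The function name and the sample values are hypothetical, and the exact method used in the tests is not described in this article.

```python
def incremental_energy_wh(power_samples_w, interval_s=2.0, idle_w=240.0):
    """Energy above the idle baseline, in watt-hours.

    Assumes one power reading (in watts) every `interval_s` seconds and
    treats each reading as constant over its interval. Readings below the
    baseline are clamped to zero, which is a design choice of this sketch.
    """
    joules = sum(max(p - idle_w, 0.0) * interval_s for p in power_samples_w)
    return joules / 3600.0  # 1 Wh = 3600 J


# Hypothetical trace: ten readings of 600 W taken two seconds apart.
samples = [600.0] * 10
print(f"{incremental_energy_wh(samples):.2f} Wh above idle")
```

For this made-up trace, each sample contributes (600 - 240) W x 2 s = 720 J, so ten samples add up to 7200 J, or 2.0 Wh.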

When the same encoding work was spread over 10 Quadra cards, energy consumption increased to 5–10 Wh. The load per card was then only ~6%, highlighting a key insight: hardware energy efficiency scales best when utilisation is high.
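The reported per-card load follows from simple division if the work is assumed to split evenly and load is assumed to scale linearly with work. Both are assumptions of this sketch, as is the function name.

```python
def load_per_card(single_card_load_pct, cards):
    """Per-card load when the same work is spread evenly over several cards."""
    if cards < 1:
        raise ValueError("cards must be at least 1")
    return single_card_load_pct / cards


# 66% load on one card, spread across ten cards.
print(f"{load_per_card(66.0, 10):.1f}% per card")
```

The result of 6.6% is roughly in line with the ~6% per card reported for the ten-card run.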

Quality and Control: No Major Tradeoffs

Video quality, assessed using VMAF (Netflix’s perceptual quality metric), showed no significant overall difference between CPU and VPU encodes at the same bitrate and codec profile, confirming that energy savings don’t come at the cost of fidelity. One caveat applies: codec parameters are not identical across implementations, especially between CPU-based and ASIC-based compression, so some codec-specific fine-tuning is required. Interestingly, rate control was less stable for libsvtav1, which occasionally deviated from the target bitrate, emphasising the maturity gap between some CPU libraries and hardware-accelerated encoding implementations.
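One common way to obtain a VMAF score is FFmpeg's libvmaf filter, which compares an encoded file against its reference. The sketch below only builds the command. The function and file names are hypothetical, and the article does not state which VMAF tooling or model was used in the tests.

```python
def vmaf_args(distorted, reference):
    """Build an FFmpeg command that scores `distorted` against `reference`.

    Requires an FFmpeg build with libvmaf enabled. The filter takes the
    encoded (distorted) video as its first input and the reference as its
    second; both should share resolution and frame rate.
    """
    return ["ffmpeg", "-i", distorted, "-i", reference,
            "-lavfi", "libvmaf", "-f", "null", "-"]


# Hypothetical file names for a CPU encode and a VPU encode of the same clip.
for encoded in ("clip_cpu.mp4", "clip_vpu.mp4"):
    print(" ".join(vmaf_args(encoded, "clip_source.mp4")))
```

Running each command reports a VMAF score in the FFmpeg log, which makes it possible to compare the CPU and VPU encodes of the same clip at matching bitrates.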

Optimisation Takeaways

  • In the tested setup, NETINT ASIC acceleration is 2–3 times more energy-efficient than ARM CPU encoding. This is just a first data point, and such a metric requires validation with additional data points from various hardware setups.
  • H.264 shows less sensitivity to acceleration; this could be due to decades of optimisation.
  • AV1 benefits most from hardware, but may require tuning to maintain consistent rate control.
  • Parallelisation is essential: encoding multiple streams in parallel on a VPU amplifies energy savings.
  • Underutilised hardware loses its efficiency edge; 10 lightly used cards consume more energy than a single well-loaded one.

Next Frontier: Decoding, Systemic Integration, and Optimal VPU Load

Future work with Greening of Streaming will explore whether encode optimisation can affect, even marginally, the decoding power requirements of different device types (Smart TVs, STBs, mobile devices). This is especially important as the number of end-user devices grows. The investigation will be worthwhile even if the answer is, as expected, no.

Questions also remain about embedded carbon, component lifespan, and whether programmable GPUs can compete with ASICs, such as the Quadra, in terms of efficiency.

Another simple test to complete these findings will be to define the relationship between the VPU’s added energy efficiency and its usage: is there a sweet spot where it delivers most of the energy savings below full load, or do we need to get as close to 100% VPU load as possible? Is the relationship linear or does it follow a typical curve?

Conclusion

The work by Martin and G&L underlines a vital message for streaming infrastructure: hardware acceleration isn’t just about speed, it’s about sustainability. To unlock its full potential, the system’s architecture must be fine-tuned for the specific video processing task and run hot and heavy. In encoding, as in many things, efficiency loves company.

Start the Conversation

Interested in VPU-powered encoding, energy metrics, or related topics? Please don’t hesitate to reach out to G&L Systemhaus.