4 Keys to Evaluating Cloud Transcoding Options
Many high-quality live productions involve cloud transcoding, since sending a single stream into the cloud for transcoding is often the most bandwidth- and cost-efficient approach. I've been benchmarking hardware and software transcoders for the last 12 months or so; this column shares the four key differentiating points among transcoders as I see them: cost per encoding ladder, quality, latency, and data-rate consistency.
For high-volume operations, the first comparison factor to nail down is the cost per encoding ladder per hour. To compute this, you create a representative encoding ladder for all transcoders, making sure to align parameters like bitrate control method (always constant bitrate), keyframe interval, closed GOPs, presets, and profiles as closely as possible. This is often complicated by conflicting recommendations from different vendors (like one recommending two B-frames and another recommending three B-frames) that you have to reconcile before starting testing.
Once the commands are set, I typically encode on an AWS or other cloud instance running Ubuntu, using a version of FFmpeg modified for the transcoders that I'm testing or the stock version of FFmpeg for x264 and x265. To compute the cost per encoding ladder per hour, you start encoding a single ladder with the encoder, and if you achieve the full frame rate, you open another terminal window and start another stream. You keep adding streams until the encoder can no longer hold the full frame rate. Recently, running x264 on a c5d.9xlarge that cost about $1.75 per hour, I produced six full encoding ladders at 55 fps or higher using 60 fps source files. This yields a cost per ladder per hour of about $0.30 ($1.75 divided by six ladders). As long as you're testing on a platform with a known hourly price, this one is simple.
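The arithmetic behind that figure is easy to script when you are comparing many instance types. What follows is a minimal sketch, not a tool I use in testing; the function name `cost_per_ladder_hour` and its parameters are hypothetical, and the inputs are the c5d.9xlarge numbers from the paragraph above.

```python
def cost_per_ladder_hour(instance_price_per_hour: float, full_speed_ladders: int) -> float:
    """Hourly instance price divided by the number of ladders it sustains at full frame rate."""
    if full_speed_ladders < 1:
        raise ValueError("the instance must sustain at least one full ladder")
    return instance_price_per_hour / full_speed_ladders


# Figures from the x264 test described above: about $1.75 per hour, six ladders.
cost = cost_per_ladder_hour(1.75, 6)
print(f"${cost:.2f} per ladder per hour")  # prints "$0.29 per ladder per hour"
```

The unrounded result is about $0.29, which is the "about $0.30" quoted above.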
Not so simple is quality, which has three aspects: objective, subjective, and transient. It's easy to create the four encodes necessary to produce rate-distortion curves and BD-Rate comparisons with Video Multimethod Assessment Fusion (VMAF), SSIMWAVE, peak signal-to-noise ratio (PSNR), or other metrics; just remember to follow the tuning recommendations for each codec. For example, x264 has tuning settings for PSNR (which I also use for VMAF) and SSIM. The problem with objective metrics when comparing different transcoding engines is that they often don't accurately reflect how human eyes will actually rate the videos. So I almost always include subjective comparisons via a service like Subjectify.us or GBTech.
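For readers who want to see what a BD-Rate comparison involves, here is a minimal sketch of the commonly used cubic-fit approach, assuming NumPy is available and that you have four (bitrate, score) pairs per encoder. The function name `bd_rate` and the sample numbers are hypothetical, and published tools may differ in interpolation details, so treat the output as approximate.

```python
import numpy as np


def bd_rate(ref_rates, ref_scores, test_rates, test_scores):
    """Average bitrate difference (%) of the test encoder vs. the reference at equal quality."""
    # Fit log(bitrate) as a cubic function of the quality score for each encoder.
    ref_fit = np.polyfit(ref_scores, np.log(np.asarray(ref_rates, dtype=float)), 3)
    test_fit = np.polyfit(test_scores, np.log(np.asarray(test_rates, dtype=float)), 3)

    # Compare only over the quality range that both curves cover.
    lo = max(min(ref_scores), min(test_scores))
    hi = min(max(ref_scores), max(test_scores))

    # Integrate each fitted curve over that range.
    ref_int = np.polyint(ref_fit)
    test_int = np.polyint(test_fit)
    ref_area = np.polyval(ref_int, hi) - np.polyval(ref_int, lo)
    test_area = np.polyval(test_int, hi) - np.polyval(test_int, lo)

    # Average log-bitrate difference, converted back to a percentage.
    avg_log_diff = (test_area - ref_area) / (hi - lo)
    return float((np.exp(avg_log_diff) - 1) * 100)


# Hypothetical numbers: bitrates in kbps, VMAF scores for four encodes per encoder.
reference = ([1000, 2000, 4000, 8000], [70, 82, 90, 95])
candidate = ([900, 1800, 3600, 7200], [71, 83, 91, 95.5])
print(f"BD-Rate: {bd_rate(*reference, *candidate):.1f}%")
```

A negative result means the test encoder needs less bitrate than the reference to reach the same score; a positive result means it needs more.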
Assessing transient quality is easy: I load different files into Moscow State University's Video Quality Measurement Tool (VQMT) and run each metric. VQMT produces a results plot that shows the metric score over the duration of the file. Noticeable quality drops are obvious, and you can click a button to actually view the frames at that location and confirm the degraded quality. If the quality drop lasts more than a couple of frames, it likely will be noticeable to an actual viewer.
Transient quality is where software transcoders tend to degrade, particularly when running lower-quality presets like veryfast or ultrafast (for x264 and x265). So you can't just look at the overall score; you have to look for transient quality issues.
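If you have per-frame metric scores in hand, the same check can be automated. This is a minimal sketch, not part of VQMT; the function name `find_quality_drops`, the threshold of 80, and the three-frame minimum are assumptions chosen for illustration, and the right values depend on the metric and content.

```python
def find_quality_drops(scores, threshold, min_frames=3):
    """Return (first_frame, last_frame) index pairs where the score stays below
    the threshold for at least min_frames consecutive frames."""
    drops = []
    start = None
    for i, score in enumerate(scores):
        if score < threshold:
            if start is None:
                start = i  # a new run of low-quality frames begins here
        else:
            if start is not None and i - start >= min_frames:
                drops.append((start, i - 1))
            start = None
    # Handle a run that continues to the end of the file.
    if start is not None and len(scores) - start >= min_frames:
        drops.append((start, len(scores) - 1))
    return drops


# Hypothetical per-frame scores: one single-frame dip and one three-frame drop.
frame_scores = [95, 94, 60, 95, 93, 70, 65, 62, 94, 95]
print(find_quality_drops(frame_scores, threshold=80))  # prints [(5, 7)]
```

The single-frame dip at index 2 is ignored, while the three-frame run is flagged, which mirrors the rule of thumb that a drop lasting more than a couple of frames is likely visible to viewers.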
You don't see huge latency deltas among encoders, but since a second or two of latency can increase overall quality and decrease the likelihood of transient issues, you want to track this as well. Although I'm on an Ubuntu instance in the cloud, I'm actually running the tests from a Windows-based computer in my office. To measure latency, I capture the tests with a screen capture program like Camtasia, which provides both an accurate measurement and a record I can share with the client.
Data-rate consistency is how tightly the data rate produced by the transcoder tracks the average data rate, measured by the standard deviation or by eyeballing data-rate plots of the streams. With video on demand, data-rate consistency isn't a concern because all viewers are watching at different times. With live streams, if you spike the data rate by 50% during the GOOOAAALLL celebration for hundreds of thousands of viewers, you could exceed the resources of your delivery ecosystem and crash the event. If you're distributing only a few thousand streams, this isn't important; if you're in the hundreds of thousands, it's a comparison you need to consider.
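The standard-deviation measurement can be sketched in a few lines, assuming you have already extracted the data rate for each second of the stream. The function name `bitrate_consistency` and the sample figures are hypothetical; the peak-to-mean ratio is an extra statistic I've added because it maps directly to the spike scenario above.

```python
import statistics


def bitrate_consistency(kbps_per_second):
    """Summarize how tightly per-second data rates track their average."""
    mean = statistics.fmean(kbps_per_second)
    stdev = statistics.pstdev(kbps_per_second)  # population standard deviation
    return {
        "mean_kbps": mean,
        "stdev_kbps": stdev,
        "cv": stdev / mean,                          # standard deviation relative to the mean
        "peak_to_mean": max(kbps_per_second) / mean,  # size of the worst spike
    }


# Hypothetical per-second data rates: a 4 Mbps stream with one spike to 6 Mbps.
stats = bitrate_consistency([4000, 4000, 4000, 6000])
print(stats)
```

Lower standard deviation and a peak-to-mean ratio closer to 1.0 indicate a transcoder that holds its target more tightly.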
A couple of closing notes: First, while H.264-based transcoding works well in the cloud, HEVC requires many more resources and may not work as well; hardware is often a better option. Second, while this testing sounds straightforward, even after benchmarking codecs for more than 25 years, I've found it remarkably hard to get it right the first time. So build time for retesting and re-retesting into your planning.