Buyers' Guide to Encoder Appliances
Here’s a list of things that software-only encoding solutions have in their favor: quality and ubiquity. Choose any two. But when it comes to the speed of encoding and delivery, especially around low-latency encoding, sometimes software-only encoding solutions just don’t cut it. The same might also be said for high frame rate (HFR), high dynamic range (HDR), or even 4K ultra-high-definition (UHD) encoding.
This Buyers’ Guide on encoder appliances will explore the tradeoffs between hardware and software encoding solutions, hopefully empowering readers to more easily determine which solution suits their specific needs.
The Hardware vs. Software Dilemma
In the early days of streaming, every format and codec was defined by its capability to deliver on-demand content in one of three ways: file downloads, progressive downloads, or streaming.
File downloads were fairly straightforward: a video file needed to be fully downloaded before playback could begin. On the other hand, progressive download files could start playing the content after a certain percentage of the file was downloaded, on the assumption that the file would download faster than it was played back. This meant the initial wait for the beginning of the file to download gave the patient end user a “head start” in playback that would, ideally, be devoid of buffering.
The third way, true streaming of on-demand content, required hefty hardware assistance in further compressing video content into data rates more akin to dial-up modem speeds than to CD-ROM data rates. To be honest, though, hardware assistance was needed for almost every early video codec, regardless of the data rate, resolution, or even frame rate.
In fact, the only codec in the first year of streaming media (the 1997–1998 timeframe) that had a software-only encoding option was MPEG-1 and its audio sidekick, the MP3 format. MP3, which has just reached the out-of-patent-protection stage after 20 years, got its name from Audio Layer III, the third audio coding format in the MPEG-1 video and audio standard ratified by the Moving Picture Experts Group (MPEG).
As more powerful general processors emerged, whether called general-purpose processors (GPPs) or central processing units (CPUs), they brought an opportunity to move the next generation of video (MPEG-2) from hardware-only to software-only encoding solutions. Software-only encoding often came with a sacrifice, though: the final output frequently dropped from 24–30 frames per second (fps) to 10–15 fps to shorten the overall encoding session.
Part of the reason these software-only options didn’t cut it was the additional need to interweave the MP3 audio and MPEG-2 video elementary streams into a multiplexed stream suitable for transmission. This multiplexed, or muxed, file was often referred to by the acronym M2TS, for MPEG-2 transport stream.
This MPEG-2 transport stream technology, now more than 20 years old, still forms the basis for the majority of Apple HTTP Live Streaming (HLS) delivery, although it now carries a newer codec that replaced the MPEG-2 video codec: H.264, also known as MPEG-4 Part 10 or as Advanced Video Coding (AVC).
Why Hardware Is Still Necessary
HLS allows “streaming” delivery of a series of small files at a predefined data rate that are downloaded, assembled into a playlist for back-to-back playback, and then played in sequence until no more files (known also as “fragments,” “segments,” or “chunks” of data) are available to play.
The lack of additional files for playback either signifies the end of the video being played back, which can be confirmed by comparing the segment number with a predefined numbering sequence in a manifest file, or it signals a need to send subsequent segments at a lower data rate, since the player is not receiving current data rate segments in a timely enough manner for proper playback.
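The two interpretations of a missing segment described above can be sketched as a small decision function. This is only an illustration under stated assumptions: the function name, the bitrate ladder, and the "rebuffer" fallback are hypothetical and are not taken from the HLS specification or from any particular player.

```python
from typing import List, Tuple


def on_missing_segment(
    next_index: int,
    last_index: int,
    current_kbps: int,
    ladder_kbps: List[int],
) -> Tuple[str, int]:
    """Decide what a player might do when the next segment is not available.

    next_index   -- number of the segment the player wants to play next
    last_index   -- final segment number listed in the manifest
    current_kbps -- data rate of the rendition currently being played
    ladder_kbps  -- all available data rates (a hypothetical ladder)
    """
    # Past the last segment in the manifest: the video has simply ended.
    if next_index > last_index:
        return ("end_of_stream", current_kbps)

    # Otherwise the segment is late, so ask for the next lower data rate.
    lower_rates = [rate for rate in ladder_kbps if rate < current_kbps]
    if lower_rates:
        return ("switch_down", max(lower_rates))

    # Already at the lowest data rate: nothing to do but wait (assumption).
    return ("rebuffer", current_kbps)


ladder = [800, 1500, 3000]
print(on_missing_segment(10, 9, 3000, ladder))
print(on_missing_segment(5, 9, 3000, ladder))
print(on_missing_segment(5, 9, 800, ladder))
```

Real players typically measure throughput continuously rather than waiting for a segment to go missing, so treat this as a reading aid rather than a recommended design.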
In other words, this version of streaming, where the segments are transmitted in a near-real-time fashion, is more akin to the early file download approach, although in practice it acts more like a progressive download, since HLS won’t start playback until at least three segments have been downloaded as a “head start” for continuous playback.
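To put a rough number on that three-segment head start, the arithmetic can be sketched as follows. The segment durations, data rate, and throughput used here are assumptions chosen for illustration; only the three-segment figure comes from the text above.

```python
def startup_buffer_seconds(segment_seconds: float, segments_required: int = 3) -> float:
    """Seconds of content that must be on hand before playback begins."""
    return segment_seconds * segments_required


def startup_wait_seconds(
    segment_seconds: float,
    bitrate_kbps: float,
    throughput_kbps: float,
    segments_required: int = 3,
) -> float:
    """Approximate time to download the head-start segments of on-demand content."""
    buffered_kilobits = startup_buffer_seconds(segment_seconds, segments_required) * bitrate_kbps
    return buffered_kilobits / throughput_kbps


# Assumed segment lengths of 10 and 6 seconds.
for seconds in (10, 6):
    print(seconds, "s segments ->", startup_buffer_seconds(seconds), "s of content buffered")

# Assumed 2,000 Kbps stream over a 6,000 Kbps connection.
print(startup_wait_seconds(10, 2000, 6000), "s to fetch the head start")
```

For live content the picture is worse, because a segment cannot be downloaded before it has been encoded, so the buffered duration acts as a floor on latency no matter how fast the connection is.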
As a result, on-demand content needs to be transformed in faster-than-real-time encoding sessions, where an hour of prerecorded content may need to be converted in less than 30 minutes, or more than twice the speed of real time. This allows for various additional steps to be performed, such as setting ad-insertion points and flags, as well as encoding multiple channels of alternate audio (e.g., overdubs in a foreign language).
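The speed requirement is simple arithmetic, sketched below with hypothetical helper names. The 60-minute and 30-minute figures are the ones used in the paragraph above.

```python
def speed_factor(content_minutes: float, encode_minutes: float) -> float:
    """How many times faster than real time an encoding session ran."""
    if encode_minutes <= 0:
        raise ValueError("encode_minutes must be positive")
    return content_minutes / encode_minutes


def max_encode_minutes(content_minutes: float, required_factor: float) -> float:
    """Longest encoding session that still meets a required speed factor."""
    if required_factor <= 0:
        raise ValueError("required_factor must be positive")
    return content_minutes / required_factor


print(speed_factor(60, 30))        # one hour of content encoded in 30 minutes
print(max_encode_minutes(60, 2))   # time budget for a 2x real-time target
```

A factor of 1.0 means real-time encoding, which leaves no room for the additional steps mentioned above.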
Real-time delivery of live content, though, has to move beyond the multiple-segment scenarios of HLS and its MPEG-ratified equivalent, MPEG-DASH, or Dynamic Adaptive Streaming over HTTP.
Even 20 years into the streaming revolution, here in 2018, streaming live video in a low-latency scenario often requires hardware assistance, both for conformance to old-school, real-time streaming protocols (RTP and its companion session-control protocol, RTSP) and for a way to minimize delay or latency.
Inherently, a single frame lasts 1/25 or 1/30 of a second, or 40 milliseconds (ms) and about 33.3 ms, respectively. If several frames are needed to analyze changes between frames (aka temporal changes; see our Buyers’ Guide on content- and context-aware encoding), then an additional inherent latency is introduced, since the encoder will need approximately 100–300 ms worth of video content to accurately assess temporal changes.
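The relationship between frame rate and this analysis window can be worked through in a few lines. This is a minimal sketch with hypothetical function names; the 100 ms and 300 ms windows come from the paragraph above, and the frame counts are simply derived from them.

```python
import math


def frame_duration_ms(fps: float) -> float:
    """Duration of a single frame in milliseconds."""
    return 1000.0 / fps


def frames_in_window(window_ms: float, fps: float) -> int:
    """Whole frames needed to cover an analysis window of window_ms."""
    return math.ceil(window_ms * fps / 1000)


for fps in (25, 30):
    print(
        f"{fps} fps: {frame_duration_ms(fps):.1f} ms per frame, "
        f"{frames_in_window(100, fps)} to {frames_in_window(300, fps)} frames "
        f"for a 100-300 ms window"
    )
```

In other words, at these frame rates the encoder holds back somewhere between 3 and 9 frames before it can emit anything, and that delay is added before any network or player buffering is counted.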