How to Encode Video for HLS Delivery
How many streams do you need? That depends upon a number of interrelated factors, including the following:
- The original resolution of the video: HD sources need more streams than SD sources.
- Whether the customer is paying for the video: subscription services usually need more streams than free Internet video.
- The configuration of your lowest- and highest-quality streams: you need enough streams in between to provide a good-quality stream at every relevant connection speed.
There is no magic number, but Apple recommends that adjacent bitrates be 1.5 to 2 times apart. If they’re any closer, the streams are very similar in quality and you’re wasting encoding resources and storage space.
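As a quick sanity check on a proposed ladder, the 1.5x to 2x rule can be applied mechanically to each pair of adjacent rungs. This is a minimal sketch; the function name and the example ladder are hypothetical illustrations, not Apple’s published data rates.

```python
def check_ladder_spacing(bitrates_kbps, low=1.5, high=2.0):
    """Report the ratio between each pair of adjacent rungs in a bitrate ladder.

    Returns (lower, upper, ratio, ok) tuples. ok is True when the higher rung is
    between `low` and `high` times the lower one; ratio is rounded for display only.
    """
    rungs = sorted(bitrates_kbps)
    report = []
    for lower, upper in zip(rungs, rungs[1:]):
        ratio = upper / lower
        report.append((lower, upper, round(ratio, 2), low <= ratio <= high))
    return report


# Hypothetical ladder for illustration only.
for lower, upper, ratio, ok in check_ladder_spacing([400, 800, 1200, 1400, 2500]):
    print(f"{lower:>5} -> {upper:>5} kbps: {ratio}x {'ok' if ok else 'check spacing'}")
```

Here the 1200 to 1400 kbps step (about 1.17x) gets flagged, since those two streams would look very similar.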
In addition, all variants must use the same aspect ratio; you can’t switch from 4:3 to 16:9 or vice versa. If you’re encoding 4:3 source videos, TN2224 provides a separate table for those files.
Also, as Table 1 suggests, don’t worry about mod-16, that is, a resolution whose width and height are both divisible by 16. Many compressionists recommend mod-16 because H.264 encodes video in 16x16 blocks, and mod-16 files are the most efficient to encode. Typically, however, the playback window chosen by the website developer dictates the resolution, which is why 640x360 is the most widely used resolution on the planet. That’s fine: at this resolution, the inefficiency of a non-mod-16 frame is very small, in the low single digits percentage-wise.
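One rough way to see why the penalty is small: an H.264 encoder codes whole 16x16 macroblocks, so a non-mod-16 dimension is padded up to the next multiple of 16. The sketch below counts those padded pixels as a proxy for the overhead; it is an assumption-laden estimate, not a measurement of actual bitrate cost.

```python
import math


def mod16_padding(width, height, block=16):
    """Estimate padding overhead when a frame isn't mod-16.

    Each dimension is rounded up to a whole number of macroblocks. Returns the
    padded width, padded height, and the extra coded pixels as a percentage of
    the visible pixels (a pixel-count proxy, not a bitrate measurement).
    """
    padded_w = math.ceil(width / block) * block
    padded_h = math.ceil(height / block) * block
    overhead = (padded_w * padded_h) / (width * height) - 1
    return padded_w, padded_h, round(overhead * 100, 2)


print(mod16_padding(640, 360))   # (640, 368, 2.22)
print(mod16_padding(1280, 720))  # (1280, 720, 0.0)
```

At 640x360, only the height needs padding (360 to 368 lines), which works out to roughly 2% more coded pixels.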
Finally, some data rates used by Apple are unnecessarily high, particularly at the upper end. For example, YouTube and ESPN deliver 720p video at 2.5Mbps, almost half the lowest data rate recommended by Apple. Similarly, 1.8Mbps for 640x360 is quite generous. So you might adjust the data rates downward to save bandwidth dollars, but the overall schema is quite sound.
Once you’ve chosen the number and configuration of your variants, it’s time to encode the files.
Encoding the Variants
At some point, I should mention that HLS is only compatible with H.264; now is as good a time as any. Note that Apple changes the H.264 profile used for each variant to maintain compatibility with older devices. This is essential, since a device won’t play a file encoded with a profile it doesn’t support, so this is one area where I wouldn’t diverge from Apple’s recommendations.
There are several other critical areas of focus; let’s take those one at a time.
TN2224 directs that each segment contain at least one IDR keyframe, preferably at the start of the segment. Complying with this involves several settings that vary by encoding tool.
First, if the encoding tool gives you the option, set all keyframes to be IDR frames. If this option isn’t provided, don’t worry; invariably, the tool is already making each keyframe an IDR frame. Next, make sure the keyframe interval is the same for all variants and divides evenly into the segment duration. At the recommended segment duration of ten seconds, you should use a keyframe interval of one, two, five or ten seconds.
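The "divides evenly" rule is simple arithmetic, and most encoders ask for the interval in frames rather than seconds. A minimal sketch, assuming whole-second intervals and an integer frame rate (29.97 fps sources would need separate handling); the function names are hypothetical.

```python
def keyframe_intervals(segment_s):
    """Whole-second keyframe intervals that divide evenly into the segment duration."""
    return [k for k in range(1, segment_s + 1) if segment_s % k == 0]


def gop_frames(fps, interval_s):
    """Convert a keyframe interval in seconds to frames (integer frame rates only)."""
    if not isinstance(fps, int):
        raise ValueError("this sketch only handles integer frame rates")
    return fps * interval_s


print(keyframe_intervals(10))  # [1, 2, 5, 10]
print(gop_frames(30, 2))       # 60
```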
Most encoders have an option to insert keyframes at scene changes, which can improve stream quality. Don’t enable this option unless you’re certain that it won’t reset the keyframe interval, which could leave the first frame of a segment without a keyframe. Some encoders, like Sorenson Squeeze, offer a control to enable “Fixed I-Frames Distance,” which ensures that there’s a keyframe at the specified interval, even if another keyframe was inserted at an intervening scene change. When this control is available, you should always enable it.
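To see why a resetting interval is risky, the sketch below models two hypothetical encoder behaviors on a whole-second timeline: one keeps keyframes on a fixed grid and adds scene-cut keyframes as extras (what a fixed I-frame distance option aims for), the other restarts the countdown after each scene-cut keyframe. This is a simplified model for illustration, not how any particular encoder is implemented.

```python
def keyframe_times(duration_s, interval_s, scene_cuts, fixed=True):
    """Return sorted keyframe times (seconds) under a simplified encoder model."""
    cuts = sorted(c for c in scene_cuts if 0 < c < duration_s)
    keys = {0}
    if fixed:
        # Keyframes stay on the fixed grid; scene-cut keyframes are extras.
        keys.update(range(0, duration_s, interval_s))
        keys.update(cuts)
    else:
        # Each scene-cut keyframe restarts the interval countdown.
        t = 0
        while True:
            nxt = t + interval_s
            t = next((c for c in cuts if t < c < nxt), nxt)
            if t >= duration_s:
                break
            keys.add(t)
    return sorted(keys)


def segments_missing_keyframe(keys, segment_s, duration_s):
    """Segment start times that don't land on a keyframe."""
    key_set = set(keys)
    return [start for start in range(0, duration_s, segment_s) if start not in key_set]


fixed = keyframe_times(30, 5, scene_cuts=[7], fixed=True)   # [0, 5, 7, 10, 15, 20, 25]
reset = keyframe_times(30, 5, scene_cuts=[7], fixed=False)  # [0, 5, 7, 12, 17, 22, 27]
print(segments_missing_keyframe(fixed, 10, 30))  # []
print(segments_missing_keyframe(reset, 10, 30))  # [10, 20]
```

A single scene cut at seven seconds is enough to push every later keyframe off the ten-second segment boundaries in the resetting model.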
The HLS schema works best when the data rate of each variant is consistent. For this reason, you should encode your streams using either constant bitrate (CBR) encoding or constrained variable bitrate (VBR) encoding, with a maximum data rate of 125-150% of the target data rate.
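If you happen to encode with FFmpeg and libx264 rather than a GUI tool, the settings discussed so far (constrained VBR, a fixed keyframe interval that divides into the segment duration, no scene-cut keyframes) map onto a command line roughly like the one this sketch builds. The helper name, file names, one-second buffer size, and audio settings are assumptions chosen for illustration; check them against your own targets before use.

```python
import shlex


def hls_variant_command(src, out_prefix, fps, video_kbps, keyint_s=2,
                        segment_s=10, max_ratio=1.25, audio_kbps=64):
    """Build an ffmpeg/libx264 command for one HLS variant using constrained VBR."""
    if segment_s % keyint_s != 0:
        raise ValueError("keyframe interval must divide evenly into the segment duration")
    if not 1.25 <= max_ratio <= 1.5:
        raise ValueError("max_ratio should stay within the 125-150% range")
    gop = fps * keyint_s                     # keyframe interval in frames
    maxrate = round(video_kbps * max_ratio)  # cap on peak bitrate, in kbps
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx264", "-b:v", f"{video_kbps}k",
        "-maxrate", f"{maxrate}k",
        "-bufsize", f"{maxrate}k",           # one second of maxrate (an assumption)
        "-g", str(gop), "-keyint_min", str(gop),
        "-sc_threshold", "0",                # no extra scene-cut keyframes
        "-c:a", "aac", "-b:a", f"{audio_kbps}k", "-ar", "44100",
        "-f", "hls", "-hls_time", str(segment_s),
        "-hls_playlist_type", "vod",
        f"{out_prefix}.m3u8",
    ]


print(shlex.join(hls_variant_command("source.mov", "variant_1200k", 30, 1200)))
```

Setting `-keyint_min` equal to `-g` and disabling scene-cut detection trades a little quality at scene changes for keyframes that always land on segment boundaries.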
As we’ll discuss below, data rate consistency is one of the file characteristics checked by Apple’s Media Stream Validator tool. If the actual data rate of a segment exceeds the data rate listed for the stream by more than 10%, you’ll see an error message like the one shown in Figure 3, where segment 16 was 54% over target. Interestingly, I produced that error by encoding the file in Sorenson Squeeze using VBR constrained to 300% of the target. Since the final segment contained the most motion, that’s where Squeeze packed the most data, resulting in the error. When I encoded using CBR, which is Squeeze’s default for HLS video, the files passed Media Stream Validator’s scrutiny without any problems.
Figure 3. This file failed in Media Stream Validator because the segment bandwidth exceeded the target bandwidth.
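The 10% check is easy to approximate before you run the validator: divide each segment’s size in bits by its duration and compare the result to the listed bitrate. This is a rough sketch of that idea, not a reimplementation of Media Stream Validator; the function name and segment sizes are hypothetical.

```python
def segments_over_target(segments, declared_bps, tolerance=0.10):
    """Return (segment_number, measured_bps, percent_over) for each segment whose
    bitrate exceeds declared_bps by more than `tolerance`.

    segments: (size_bytes, duration_s) pairs in playlist order; numbering is 1-based.
    """
    flagged = []
    for number, (size_bytes, duration_s) in enumerate(segments, start=1):
        measured_bps = size_bytes * 8 / duration_s
        over = measured_bps / declared_bps - 1
        if over > tolerance:
            flagged.append((number, measured_bps, round(over * 100, 1)))
    return flagged


# Hypothetical sizes for three 10-second segments of a stream listed at 1 Mbps.
segments = [(1_250_000, 10.0), (1_300_000, 10.0), (1_900_000, 10.0)]
print(segments_over_target(segments, 1_000_000))  # [(3, 1520000.0, 52.0)]
```

The second segment runs 4% over and passes; the third, like the high-motion final segment in Figure 3, lands far outside the tolerance.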
As Table 1 reflects, Apple recommends that you encode all variants using the same audio parameters. Though not stated in TN2224, this is because switching audio parameters during playback can cause popping or other audible artifacts. Because the recommended audio data rate is rather low, some authorities recommend using High Efficiency AAC (HE-AAC), rather than the Low-Complexity profile (AAC-LC), because HE-AAC delivers superior quality at lower bitrates.
If you decide to use different parameters to reward your high-end viewers with a superior audio experience, keep the same sample rate and change only the data rate or number of channels (mono or stereo) in the higher-end streams.
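The two audio rules above (identical parameters where possible, and never a change of sample rate) can be checked across a variant set with a few lines. A minimal sketch; the `AudioParams` fields and example values are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioParams:
    sample_rate_hz: int
    channels: int        # 1 = mono, 2 = stereo
    bitrate_kbps: int


def audio_warnings(variants):
    """variants maps a variant name to its AudioParams; returns a list of warnings."""
    warnings = []
    if len(set(variants.values())) > 1:
        warnings.append("audio parameters differ between variants")
    rates = sorted({p.sample_rate_hz for p in variants.values()})
    if len(rates) > 1:
        warnings.append(f"sample rates differ: {rates}")
    return warnings


# Hypothetical set: the top variant gets stereo at a higher bitrate, same sample rate.
print(audio_warnings({
    "low": AudioParams(44100, 1, 64),
    "high": AudioParams(44100, 2, 128),
}))  # ['audio parameters differ between variants']
```

The first warning is a reminder that you’ve departed from Apple’s recommendation; the second flags the change you should avoid entirely.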
Segmenting Your File
After encoding your files, you need to create the segments and index files, and there are many tools for the job. For example, once you become an iOS developer ($99/year), you can download Apple’s HTTP Live Streaming Tools, which include the aforementioned Media Stream Validator as well as the Media Stream Segmenter and Media File Segmenter. Both segmenters are command-line tools that create the segments and index files. The Media Stream Segmenter works with live and disk-based MPEG-2 transport streams, while the Media File Segmenter works with disk-based MP4 files. For more information on using these tools, check out Apple’s HTTP Live Streaming Overview.
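For readers wondering what the index files contain, the sketch below builds a simple VOD media playlist for one variant and a master playlist that points to each variant. It illustrates the playlist format only; it is not a substitute for Apple’s segmenters, the file names and bandwidths are hypothetical, and a production master playlist should also carry a CODECS attribute, which is omitted here for brevity.

```python
import math


def media_playlist(segment_uris, durations, version=3):
    """Build a VOD media (index) playlist; version 3 allows decimal EXTINF durations."""
    lines = [
        "#EXTM3U",
        f"#EXT-X-VERSION:{version}",
        f"#EXT-X-TARGETDURATION:{math.ceil(max(durations))}",
        "#EXT-X-MEDIA-SEQUENCE:0",
        "#EXT-X-PLAYLIST-TYPE:VOD",
    ]
    for uri, duration in zip(segment_uris, durations):
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(uri)
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"


def master_playlist(variants):
    """variants: (playlist_uri, bandwidth_bps, width, height) tuples."""
    lines = ["#EXTM3U"]
    for uri, bandwidth, width, height in variants:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={width}x{height}")
        lines.append(uri)
    return "\n".join(lines) + "\n"


print(media_playlist(["seg0.ts", "seg1.ts", "seg2.ts"], [10.0, 10.0, 7.5]))
print(master_playlist([("low.m3u8", 700_000, 640, 360),
                       ("high.m3u8", 2_500_000, 1280, 720)]))
```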
In terms of segment duration, the most confusing aspect of TN2224 is the recommendation of a segment size of ten seconds, and a keyframe interval of three seconds, as this wouldn’t seem to produce a keyframe at the start of each segment. Interestingly, the new default settings in Apple Compressor 4.1 follow these recommendations, creating a segment duration of ten seconds, but using a keyframe interval of three seconds.
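The arithmetic behind that confusion is easy to check. Assuming a segmenter that cuts exactly every ten seconds and an encoder that places keyframes exactly every three, segment starts and keyframes only coincide at multiples of the least common multiple, 30 seconds. In practice, a segmenter that only cuts at keyframes (as FFmpeg’s HLS muxer does) would instead produce segments of uneven length. The function name below is hypothetical.

```python
import math


def boundaries_without_keyframe(segment_s, keyint_s, duration_s):
    """Nominal segment start times that don't coincide with a fixed-interval keyframe."""
    return [t for t in range(0, duration_s, segment_s) if t % keyint_s != 0]


print(boundaries_without_keyframe(10, 3, 60))  # [10, 20, 40, 50]
print(math.lcm(10, 3))                         # 30
print(boundaries_without_keyframe(10, 5, 60))  # []
```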