How to Encode Video for HLS Delivery
HTTP Live Streaming (HLS) is a simple and elegant architecture created by Apple for delivering adaptive bitrate streams to iOS devices and compatible browsers, essentially Safari. Since its release, HLS has been incorporated into technologies that enable desktop computers to play HLS streams with Flash installed (JW Player) or within HTML5 browsers (THEOplayer from OpenTelly). HLS has also been adopted, if poorly, by Google for Android, and incorporated into most (if not all) OTT platforms, such as Roku. Though Dynamic Adaptive Streaming over HTTP (DASH) gets all the press, HLS gets all the eyeballs, and it is as close to a “one-spec-fits-all” technology as is available in the adaptive streaming space.
If you’re submitting an app to the Apple App Store that plays video over cellular networks, you must use HTTP Live Streaming if the video exceeds either 10 minutes in duration or 5MB of data in a five-minute period, which works out to a data rate of roughly 133Kbps. In these cases, you must also include at least one audio stream at 64Kbps or lower, either with or without a still image.
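Converting the 5MB-per-five-minutes limit into a data rate is simple arithmetic: 40 megabits over 300 seconds comes to roughly 133 kilobits per second. The Python sketch below assumes decimal units (1MB = 1,000,000 bytes); with binary megabytes the result is closer to 140Kbps.

```python
def average_kbps(megabytes: float, seconds: float) -> float:
    """Average data rate in Kbps, assuming decimal units (1MB = 1,000,000 bytes)."""
    bits = megabytes * 1_000_000 * 8
    return bits / seconds / 1000


if __name__ == "__main__":
    # 5MB of data delivered over a five-minute (300-second) window
    print(f"{average_kbps(5, 5 * 60):.1f} Kbps")  # prints "133.3 Kbps"
```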
For all these reasons, understanding how to produce for HLS is a critical skill for most streaming producers. After describing how HLS works, I’ll cover the four phases of HLS production: configuring the variants, encoding the variants, creating the segmented data and metadata files, and validating the streams.
More About HLS
Though the name implies only live streaming, HLS can also distribute on-demand video. Beyond simple playback, the architecture supports features like AES-128 encryption, CEA-608 closed captions, and timed metadata capabilities like automatically opening a web page when the stream is played.
Figure 1. How HLS works.
The HLS encoding and playback schema is shown in Figure 1. As with all HTTP-based adaptive streaming technologies, the original video is encoded into multiple variants at various resolutions and bitrates, and each variant is then divided into multiple segments.
The location of each segment is defined in an index file with a .M3U8 extension, which you can see to the right of each variant. A master .M3U8 file, on the far right of the figure, describes the data rate, resolution, and other characteristics of each variant, along with the location of that variant’s index file (Figure 2). All of these files are uploaded to a standard HTTP web server.
Figure 2. The master .m3u8 file with bandwidth, resolution, and profile-related info.
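To make the two file types concrete, here is a minimal Python sketch that writes an on-demand index file for one variant and a master file that points to several variants. The tags and attributes (#EXT-X-TARGETDURATION, #EXTINF, #EXT-X-ENDLIST, and #EXT-X-STREAM-INF with BANDWIDTH, RESOLUTION, and CODECS) come from the HLS specification; the file names, bitrates, and default codec string (avc1.42e01e for H.264 Baseline Level 3.0 video, mp4a.40.2 for AAC-LC audio) are illustrative assumptions. The sketch approximates BANDWIDTH as the sum of nominal audio and video rates, whereas the specification calls for the peak rate including container overhead. In practice, your segmenting tool generates these files for you.

```python
import math
from dataclasses import dataclass
from typing import Optional


def media_playlist(segment_uris, durations):
    """Return the text of a VOD media playlist (one variant's index file)."""
    if len(segment_uris) != len(durations):
        raise ValueError("need exactly one duration per segment")
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",  # version 3+ permits decimal #EXTINF durations
        # Target duration: the longest segment, rounded up to whole seconds
        f"#EXT-X-TARGETDURATION:{math.ceil(max(durations))}",
        "#EXT-X-MEDIA-SEQUENCE:0",
        "#EXT-X-PLAYLIST-TYPE:VOD",
    ]
    for uri, seconds in zip(segment_uris, durations):
        lines.append(f"#EXTINF:{seconds:.3f},")
        lines.append(uri)
    lines.append("#EXT-X-ENDLIST")  # on-demand: no more segments will be added
    return "\n".join(lines) + "\n"


@dataclass
class Variant:
    uri: str                   # location of this variant's index file
    bandwidth: int             # approximate peak bits per second, audio + video
    resolution: Optional[str]  # e.g. "640x360"; None for audio-only
    codecs: str = "avc1.42e01e,mp4a.40.2"


def master_playlist(variants):
    """Return master playlist text, keeping variants in the order given."""
    lines = ["#EXTM3U"]
    for v in variants:
        attrs = [f"BANDWIDTH={v.bandwidth}"]
        if v.resolution:
            attrs.append(f"RESOLUTION={v.resolution}")
        attrs.append(f'CODECS="{v.codecs}"')
        lines.append("#EXT-X-STREAM-INF:" + ",".join(attrs))
        lines.append(v.uri)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical segment names for one variant
    print(media_playlist(["segment1.ts", "segment2.ts", "segment3.ts"], [10.0, 10.0, 4.2]))
    # Hypothetical variant paths and bitrates
    print(master_playlist([
        Variant("low/index.m3u8", 264_000, "416x234"),
        Variant("mid/index.m3u8", 664_000, "640x360"),
        Variant("audio/index.m3u8", 64_000, None, codecs="mp4a.40.2"),
    ]))
```

Because the device starts with the first variant listed in the master file, the order of the list matters; this sketch simply preserves the order you pass in.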
To trigger playback, you create a link to the master .M3U8 file. When playback starts, the HLS-compatible device reads the master file and retrieves the first segment (segment 1) from the first variant listed in it (the red arrow). It then monitors bandwidth conditions. If bandwidth is plentiful, the device checks the master .M3U8 file, finds the location of a higher-quality variant, checks that variant’s .M3U8 file for the location of the next segment (segment 2), and retrieves and plays that segment. If bandwidth is constrained, the device follows the same basic procedure but retrieves the next segment from a lower-quality variant. Throughout playback, the device keeps monitoring bandwidth, switching variants as necessary to play the highest-quality stream that conditions allow.
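Apple doesn’t publish the exact switching heuristics its players use, so the following Python sketch is only a simplified model of the behavior described above: before each segment, pick the highest-bandwidth variant whose BANDWIDTH fits within the measured throughput, discounted by a safety margin, and fall back to the lowest variant when nothing fits. The 0.8 margin, the function name, and the bitrates are assumptions for illustration.

```python
def pick_variant(bandwidths, measured_bps, safety=0.8):
    """Index of the highest BANDWIDTH that fits within measured_bps * safety.

    Falls back to the lowest-bandwidth variant when nothing fits.
    """
    budget = measured_bps * safety
    fitting = [i for i, bw in enumerate(bandwidths) if bw <= budget]
    if not fitting:
        return min(range(len(bandwidths)), key=lambda i: bandwidths[i])
    return max(fitting, key=lambda i: bandwidths[i])


if __name__ == "__main__":
    ladder = [264_000, 664_000, 1_864_000, 4_564_000]  # illustrative BANDWIDTH values
    # Hypothetical throughput measured before segments 2, 3, and 4
    for segment, throughput in enumerate([500_000, 3_000_000, 800_000], start=2):
        choice = pick_variant(ladder, throughput)
        print(f"segment {segment}: variant {choice} ({ladder[choice]} bps)")
```

Real players also weigh buffer level and avoid switching too often; the margin here stands in for all of that.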
Job one when producing for HLS is to choose the number of variants and their configuration. So let’s start there.
Configuring the Variants
Anyone producing for HLS should start with Apple Technical Note TN2224, a sampling of which is shown in Table 1. What’s important is not so much the precise configurations recommended, but the recognition that you’re producing for three different scenarios: low bitrates for cellular connections, moderate bitrates for cellular and Wi-Fi connections on older devices, and very high bitrates for exceptional quality on newer and high-end devices. This segmentation is particularly important when creating a single set of streams for mobile, computer, and OTT playback, such as when using JW Player to deliver HLS streams to Flash-enabled desktops.
Table 1. Apple’s recommendations for variants in TN2224.
When configuring your streams, consider each of these tiers individually. For cellular, ask, “What’s the lowest-speed/quality configuration we want to distribute?” Besides the audio-only stream, TN2224 recommends a 416x234 stream at 200Kbps video/64Kbps audio, but many producers also provide a lower-quality stream, say 100Kbps video/64Kbps audio, for viewers on very slow cellular connections.
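One way to keep the cellular tier honest is to describe it as data and check it against the App Store requirement for a stream at 64Kbps or lower. In this Python sketch, the 416x234 rung at 200Kbps video/64Kbps audio comes from TN2224 as quoted above; the resolution chosen for the 100Kbps rung, the rung names, and the helper function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rung:
    name: str
    resolution: Optional[str]  # None for the audio-only stream
    video_kbps: int
    audio_kbps: int

    @property
    def total_kbps(self) -> int:
        return self.video_kbps + self.audio_kbps


CELLULAR_TIER = [
    Rung("audio-only", None, 0, 64),
    # Hypothetical: the resolution for the extra low-bitrate rung is an assumption
    Rung("very-low", "416x234", 100, 64),
    # TN2224's lowest video recommendation
    Rung("tn2224-low", "416x234", 200, 64),
]


def has_low_bandwidth_fallback(ladder: List[Rung], limit_kbps: int = 64) -> bool:
    """True if at least one rung's combined bitrate fits within limit_kbps."""
    return any(r.total_kbps <= limit_kbps for r in ladder)


if __name__ == "__main__":
    for rung in CELLULAR_TIER:
        print(rung.name, rung.resolution, rung.total_kbps, "Kbps")
    print("has 64Kbps fallback:", has_low_bandwidth_fallback(CELLULAR_TIER))
```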
Then consider the middle tier. For full-screen playback on iPhones, 640x360 is a reasonable configuration, but iPads (and desktops) will play the video in the playback window on your website. Since it’s most efficient, for both encoding and playback, to encode video at the same size as the display window, you should also have at least one variant for each video playback window size on your website.
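A quick audit can flag player window sizes that no variant matches exactly. The ladder and window sizes in this Python sketch are hypothetical; substitute the dimensions of the playback windows on your own site.

```python
from typing import List, Tuple

Size = Tuple[int, int]  # (width, height) in pixels


def unmatched_windows(variant_sizes: List[Size], window_sizes: List[Size]) -> List[Size]:
    """Return window sizes that have no variant encoded at exactly that size."""
    available = set(variant_sizes)
    return [w for w in window_sizes if w not in available]


if __name__ == "__main__":
    ladder = [(416, 234), (640, 360), (960, 540), (1280, 720)]
    windows = [(640, 360), (800, 450)]  # hypothetical player windows
    print(unmatched_windows(ladder, windows))  # prints [(800, 450)]
```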
The 960x540 and higher-resolution variants are all for full-screen or OTT playback. Here, the question is “How much can we afford?” In other words, send the highest-quality stream you can within the financial constraints of your monetization model.
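The arithmetic behind “How much can we afford?” is straightforward: each hour a viewer spends on a variant delivers that variant’s total bitrate times 3,600 seconds of data. This Python sketch uses decimal gigabytes; the bitrates are illustrative, and it ignores the time viewers actually spend on lower variants while the player adapts.

```python
def gb_per_viewer_hour(total_kbps: float) -> float:
    """Decimal gigabytes delivered per hour of viewing at a constant bitrate."""
    bytes_per_hour = total_kbps * 1000 * 3600 / 8
    return bytes_per_hour / 1e9


if __name__ == "__main__":
    # Illustrative combined (video + audio) bitrates for two top-tier variants
    for kbps in (1864, 4564):
        print(f"{kbps} Kbps -> {gb_per_viewer_hour(kbps):.2f} GB per viewer-hour")
```

Multiplying by expected viewer-hours and your CDN’s per-GB price gives a rough cost for adding a higher rung.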
Use a single adaptive group, packaged differently for different targets, to keep encoding and storage costs down.