Encoding for Multiple Devices
Another factor complicating Android support is that HLS support came late, starting with Android version 3.1. You can check the penetration of Android versions; when I checked in late December 2012, version 3.1 and newer versions only accounted for about 35% of the total Android market, making HLS an incomplete solution. It's also an imperfect solution, with crashing, seeking, and aspect ratio issues on some platforms, as you can read about in "Jeroen Wijering Talks HLS, DASH, and the JW Player 6." As Wijering points out, the most comprehensive solution is likely to build your own app.
MICROSOFT WINDOWS PHONE
Though Microsoft's existing share is currently negligible in the mobile/tablet space, it has high hopes for Windows 8 and RT and its Windows Phone platform. Like Apple, Microsoft offers a limited number of phones and documents their capabilities nicely. Note that while Windows RT will support Flash (and AIR by mid-2013), Windows phones do not currently support Flash. Support for Windows Phone is not on Adobe's Flash technology roadmap.
As you can see in Table 1, the only adaptive technology supported by the Windows Phone platform is Smooth Streaming. As noted in the Supported Media Codecs for Windows Phone document referenced previously, not all Windows phones support dynamic resolution changes. For these phones, all streams in the adaptive group must share the same resolution.
The best source for recommended Smooth Streaming encoding parameters is the set of encoding presets contained in Microsoft Expression Encoder 4. For anyone interested, I recorded the configuration parameters recommended for 1080p source video in a Google Documents spreadsheet that you can access, though space considerations prevented us from reproducing it here. As you'll see, the preset uses multiple resolutions, which wouldn't work for some versions of Windows Phone.
Again, OTT devices are easier than mobile because they all live on at least relatively high-speed connections and can all decode virtually any H.264 stream you throw their way. You have links to the playback and adaptive streaming specs, so here I'll just point out any highlights therefrom.
Though Roku supports multiple adaptive specs, its guide makes it clear that HLS is the preferred technique. The guide also identifies the Wowza Media Server as a "very popular, budget minded choice in the HLS field," with a useful guide to getting up and running with the Roku Streaming Player.
Apple TV is discussed previously in the iOS section. Note that according to the Boxee support boards, HLS only works within an application, not in the browser. Interestingly, as you can see in Table 4, GoogleTV adapted its stream recommendations from Apple TN2224, though it ignored the lowest quality grouping and recommended the High profile for all streams.
Finally, for Smooth Streaming to the Xbox, see the earlier discussion about Microsoft Windows Phone. Note that I checked Expression Encoder, and there were no Xbox presets.
With this as background, let's start making some decisions, beginning with the number of streams.
HOW MANY STREAMS?
As shown in Table 2, Apple recommends 10 streams for 1080p-source content, including the audio-only stream. However, before adopting that recommendation, let's examine how much it would cost to distribute Apple's highest-quality stream. Specifically, at 8,564Kbps for 1080p video, an hour of video would consume around 4GB. According to Dan Rayburn's latest blog on the subject, CDN pricing for customers buying from $100,000 to more than $1 million/year in bandwidth ranged from a low of 1 cent per GB to a high of 12 cents.
At these prices, it would cost between 4 cents and 48 cents to stream an hour of video at Apple's highest recommended rate. However, I've seen legacy bandwidth pricing for more modest-sized commitments as high as $1.10/GB, which would boost the per-hour transfer cost of this 1080p configuration to $4.40.
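The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration: the function names are mine, and it assumes decimal units (1Kbps = 1,000 bits per second, 1GB = 10^9 bytes). Because it doesn't round the hourly transfer up to 4GB, the results come out slightly lower than the figures quoted above.

```python
def gb_per_hour(kbps: float) -> float:
    """GB transferred in one hour of streaming at the given data rate.

    Assumes decimal units: 1 Kbps = 1,000 bits/s and 1 GB = 10**9 bytes.
    """
    bits = kbps * 1000 * 3600  # bits delivered in 3,600 seconds
    return bits / 8 / 1e9      # bits -> bytes -> GB


def cost_per_hour(kbps: float, price_per_gb: float) -> float:
    """Transfer cost in dollars for one viewer-hour at the given CDN price."""
    return gb_per_hour(kbps) * price_per_gb


top_rate_kbps = 8564  # Apple's highest recommended rate, per the text

print(f"{gb_per_hour(top_rate_kbps):.2f} GB per hour")
for price in (0.01, 0.12, 1.10):  # low, high, and legacy pricing from the text
    print(f"${price:.2f}/GB -> ${cost_per_hour(top_rate_kbps, price):.2f} per hour")
```

Running the same functions on a candidate top-end rate, such as 3,000Kbps, shows quickly what each viewer-hour would cost under your own bandwidth pricing.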
When configuring your highest-quality stream, choose the highest data rate that you can afford, given your monetization strategy and cost structure. Since your top-quality stream has to look very good, you'll have to adjust video resolution accordingly. For example, if you can only afford 3Mbps at the top end, encode at 720p, not 1080p.
At the other end of the spectrum, identify the lowest video data rate that you'd like to support. For Apple, that's 200Kbps, though I've had clients who produced video as low as 110Kbps. Then identify the resolution/frame rate combination that delivers optimal quality at that video data rate. Apple's 416x234 at 10-12 frames per second is a reasonable starting point.
Now you've got your high- and low-end streams. Next you need to choose the number of streams that accomplishes two goals. The first is to provide at least one stream for every window size the video will be played in within a browser. For example, YouTube plays 16:9 videos at two window sizes, 640x360 and 854x480, plus full screen. If you upload a 720p or larger video, YouTube will create videos at both of these resolutions, because both encoding and video playback are most efficient when the video is displayed at its native resolution. So if you display video on your website in a 640x360 window, you want at least one stream at that resolution.
You also want a sufficient number of streams to serve as reasonable stepping stones between your highest- and lowest-quality streams. For YouTube, this meant four 16:9 streams between their mobile stream configured at 176x141 and their 1080p stream: 426x240, 640x360, 854x480, and 720p. Though I don't know the specs of ESPN's mobile or OTT streams, for computer viewing, there were three streams between the low of 480x272 and high of 720p: 576x324, 640x360, and 768x432.
More streams are not necessarily better; more streams means that the streams are closer together, minimizing the quality difference while increasing the frequency of stream switching, which can disrupt viewing. The ideal scenario is when the viewer quickly identifies the optimal stream and continues to watch that through the end of the video.
Other Configuration Options
Once you know how many streams you'll produce, you need to configure them. At the low end of the spectrum, I prefer to drop the frame rate rather than resolution; you can read all about why in "Configuring Low Data Rate Adaptive Streams." In terms of choosing the data rate for each stream, the differences should start out fairly small -- such as the 200Kbps between Apple's first three streams with video -- and continue to increase at higher bitrates, such as the 2Mbps separating the top four streams.
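One way to sanity-check the spacing of a proposed set of data rates is to list the gap between each adjacent pair and confirm that the gaps never shrink as the rates climb. The Python sketch below does that. The ladder in it is a made-up example for illustration, not Apple's Table 2 values, and the function names are my own.

```python
def rate_gaps(ladder_kbps):
    """Differences between adjacent data rates, lowest stream first."""
    rates = sorted(ladder_kbps)
    return [high - low for low, high in zip(rates, rates[1:])]


def gaps_widen_with_rate(ladder_kbps) -> bool:
    """True if each gap is at least as large as the gap below it."""
    gaps = rate_gaps(ladder_kbps)
    return all(later >= earlier for earlier, later in zip(gaps, gaps[1:]))


# Hypothetical video data rates in Kbps, for illustration only.
example_ladder = [200, 400, 600, 1200, 2000, 3000]

print(rate_gaps(example_ladder))             # gaps between neighboring streams
print(gaps_widen_with_rate(example_ladder))  # whether the spacing grows with rate
```

A ladder that fails the check isn't necessarily wrong, but a small gap near the top is a hint that two streams may be close enough to merge.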
Probably the biggest configuration issue relates to the H.264 profile applied to each file. For example, if you follow Apple's recommendations, you'll use the Baseline profile for the first four streams with video and the Main profile for the next four, in all cases to maintain device compatibility. If you're producing for the Android platform, the safest approach would be to use the Baseline profile for all streams. However, all OTT platforms and computers can play streams encoded using the High profile. Should you produce separate groups of files for each, increasing encoding and storage costs?
I evaluated this issue in my article, "H.264 in a Mobile World: Adios to the Main and High Profiles?," which essentially documented the research I performed on behalf of a consulting client. Specifically, I produced three test cases comparing the quality of video encoded at the Baseline, Main, and High profiles using otherwise identical parameters. In only one of those test cases, where the video was encoded at 640x360@240Kbps, was the difference visible. At more reasonable settings, such as 720p@800Kbps and 640x480@468Kbps, the files were virtually indistinguishable.
The client looked at the difference in quality between all the files and reasonably concluded that the 640x360@240Kbps file would seldom be viewed for long by a computer user connecting via broadband. He decided to produce only one group of files, using the Baseline profile where necessary to maintain compatibility with targeted mobile devices. I suggest you perform the same analysis with representative footage and draw your own conclusions.
Pay particular attention to the quality difference between the Main and High profile streams. If you choose the High profile for OTT and computers, rather than Main, you'll need to create equivalent files using the Main profile for iOS/Android compatibility.
I would assume that most Android devices have hardware playback capabilities similar to those of Apple devices of the same form factor and approximate release date, so I wouldn't create a set of Baseline-only streams for Android. Rather, the schema shown in Table 2 is probably safe for Android. It's probably also the optimal schema for efficiently encoding a single adaptive group for all computer, mobile, and OTT targets.
Wrapping Things Up
Now that you've made all the hard decisions, it's time to touch on the mechanical aspects of encoding for adaptive streaming. First, the keyframe interval for all files needs to be identical for stream switching to occur seamlessly. Most producers use an interval of either 2 or 3 seconds and disable the insertion of keyframes at scene changes.
Second, encode using either constant bitrate (CBR) encoding or constrained variable bitrate (VBR) encoding, with a maximum data rate of between 1.25 and 1.5 times the target. These techniques will minimize stream switching that occurs because of changes in the video data rate rather than changing bandwidth or CPU conditions.
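Many encoding tools ask for the keyframe interval in frames rather than seconds and for the VBR ceiling as an absolute data rate, so both values have to be derived from the guidelines above. The following Python sketch shows the conversion. The function names and the sample frame rates and targets are assumptions for illustration, and the way you enter the results will depend on your encoder.

```python
def keyframe_interval_frames(fps: float, seconds: float) -> int:
    """Keyframe interval in frames for a given frame rate and interval in seconds."""
    return round(fps * seconds)


def constrained_vbr_max_kbps(target_kbps: float, factor: float = 1.5) -> float:
    """Maximum data rate for constrained VBR, as a multiple of the target.

    The factor is limited to the 1.25x-1.5x range suggested in the text.
    """
    if not 1.25 <= factor <= 1.5:
        raise ValueError("factor should be between 1.25 and 1.5")
    return target_kbps * factor


# Use the same interval in seconds for every file in the adaptive group.
print(keyframe_interval_frames(fps=30, seconds=2))     # interval for 30 fps video
print(keyframe_interval_frames(fps=29.97, seconds=2))  # interval for 29.97 fps video

# Ceiling for a hypothetical stream with an 800Kbps target.
print(constrained_vbr_max_kbps(800, factor=1.25))
```

If some streams in the group use a reduced frame rate, compute the interval from each stream's own frame rate so that keyframes still land at the same points in time across all files.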
Finally, regarding audio, recognize that it's safest to use the same audio parameters for all files in an adaptive group, which minimizes the risk of popping or similar artifacts during stream switches. This is why Apple recommends 44.1 kHz audio at 64Kbps for all streams in Table 2.
On the other hand, if you're producing premium content where audio quality is a significant component of the overall experience, you may find this approach too restrictive. To minimize potential issues, use the same frequency for all streams and switch the number of channels, data rate, or both. For example, consider using 44.1 kHz mono audio at 32Kbps for your lowest stream, 44.1 kHz mono at 64Kbps for mid-quality streams, and 44.1 kHz stereo at 128Kbps for your highest quality streams. Then test before going live to ensure that audio artifacts don't occur when switching streams.
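If you do vary the audio parameters, a quick check before encoding can confirm that every stream still uses the same sampling frequency. The Python sketch below encodes the three example configurations just described. The class, the function name, and the tier labels are hypothetical, and the check is no substitute for listening tests during actual stream switches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioConfig:
    sample_rate_hz: int
    channels: int       # 1 = mono, 2 = stereo
    bitrate_kbps: int


def shares_sample_rate(configs) -> bool:
    """True if every audio configuration uses the same sampling frequency."""
    return len({c.sample_rate_hz for c in configs}) <= 1


# The example audio settings from the text, keyed by illustrative tier names.
audio_ladder = {
    "lowest": AudioConfig(sample_rate_hz=44100, channels=1, bitrate_kbps=32),
    "mid": AudioConfig(sample_rate_hz=44100, channels=1, bitrate_kbps=64),
    "highest": AudioConfig(sample_rate_hz=44100, channels=2, bitrate_kbps=128),
}

print(shares_sample_rate(audio_ladder.values()))
```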
This article appears in the 2013 Streaming Media Industry Sourcebook.