March 16, 2020
By Jan Ozer Contributing Editor

Buyers' Guide to Low-Latency Solutions

Low latency was one of the key concepts of 2019, and it's no less important in 2020. In this buyers' guide, I'll discuss how to determine what kind of latency you need as well as identify the available technologies and detail factors to consider when choosing among them.

As with all buyers' guides, the list of products and companies here is meant to be representative, not exhaustive, so if you're a potential buyer, use it as a starting point and perform your own research. If you're a supplier who wasn't mentioned, feel free to add your product in a comment at the end of this article.

Before getting started, it's useful to remember that the lower the latency of your live stream, the less resilient the stream is to bandwidth interruptions. For example, using default settings, an HTTP Live Streaming (HLS) stream will play through 15-plus seconds of interrupted bandwidth, and if bandwidth is restored at that point, the viewer may never know there was a problem. In contrast, a low-latency stream will stop playing almost immediately after an interruption. In this regard, the benefit of low latency always needs to be balanced against the risk of playback stoppages. If you don't absolutely need low latency, it may not be worth sacrificing resiliency to get it.

Do I Need Low Latency?

From a technology-selection perspective, there are three levels of latency. The first is "doesn't matter," which is a one-to-many presentation with little interaction—think a church service or a city council meeting or even a remote concert. For applications like these, drop your segment size to 2–4 seconds, and you can reduce latency to between about 6 and 12 seconds with very little risk, no development cost, and minimal testing cost.
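The arithmetic behind that range can be sketched roughly as follows. This is a back-of-the-envelope estimate, not a measurement: it assumes the player buffers about three complete segments before starting playback (a common default, but not universal), and it ignores encoding, CDN, and network delays. The function name and the buffer count are illustrative assumptions.

```python
def estimated_latency_seconds(segment_seconds: float, buffered_segments: int = 3) -> float:
    """Rough latency floor for segment-based HLS/DASH delivery.

    Assumption: the player waits for `buffered_segments` complete segments
    before playback starts. Encoding and network delays are ignored.
    """
    if segment_seconds <= 0 or buffered_segments < 1:
        raise ValueError("segment_seconds must be positive and buffered_segments >= 1")
    return segment_seconds * buffered_segments


for seconds in (2, 4, 6):
    latency = estimated_latency_seconds(seconds)
    print(f"{seconds}-second segments: roughly {latency:.0f} seconds of latency")
```

Under these assumptions, 2- and 4-second segments land at about 6 and 12 seconds, which lines up with the range cited above. Your player's actual buffer settings will move these numbers.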

The second level is "spoiler time," or the proverbial enthusiast who is watching TV next door and starts shouting (and tweeting) about a goal 30 seconds before you see it. Most broadcast channels average about 5–6 seconds of latency; if you're a streaming service competing with broadcast streams, you'll need a true low-latency technology to get your latency in the same range.

The third level of latency is "real-time," as required by interactive applications like those for gambling, auctions, and gaming, in which even 2 seconds is too long. If your application is one of these or similar, reducing segment sizes won't cut it, and you definitely need a low-latency technology.

What Technologies Are Available?

Most low-latency solutions use one of three technologies: WebRTC (Real-Time Communications), HTTP Adaptive Streaming, or WebSockets. As the name suggests, WebRTC is a protocol for delivering live streams to each viewer, either peer to peer or server to peer. It was formulated for browser-to-browser communications and is supported by all major desktop browsers as well as on Android, iOS, Chrome OS, Firefox OS, Tizen 3.0, and BlackBerry 10, so WebRTC-based low-latency solutions should run without downloads on any of these platforms.

WebRTC is typically the engine for an integrated package that includes the encoder, player, and delivery infrastructure. Examples of WebRTC-based solutions are Real-Time from Phenix, Limelight Realtime Streaming, and Millicast from CoSMo Software and Influxis (Figure 1). You can also access WebRTC technology to build your own solution in tools like the Wowza Streaming Engine, CoSMo Software, and Red5 Pro Server. Latency claims for technologies in this class include a global guarantee for delivery to all viewers in under 500 milliseconds (Phenix), delivery in under 500 milliseconds (Red5 Pro), and under 1 second (Limelight Networks). Time to First Frame (TTFF) is also an important metric; Phenix, for example, reports TTFF of under 500 milliseconds for 71% of viewers and under 816 milliseconds for 90% of viewers. If you need sub-2-second latency, WebRTC is an option you should consider.


Figure 1. System overview of the Millicast WebRTC-based system for large-scale live streaming with sub-second latency

There are two major standards for HTTP Adaptive Streaming, HLS and DASH, and there's a unifying container format, Common Media Application Format (CMAF). Once Apple announced its Low-Latency HLS solution, it instantly displaced several grassroots efforts and became the technology of choice for delivering low-latency streams via HLS. Although the spec is still a work in progress, it's already supported by technology providers like Wowza and WMSPanel with its Nimble Streamer product.

There is a DVB standard for low-latency DASH, with a specification due to come from the DASH Industry Forum in early 2020. Pursuant to these specifications, encoder and player developers and content delivery networks have been working for several years to ensure interoperability so that all DASH/CMAF low-latency applications should hit the ground running. As an example, Harmonic and Akamai together demonstrated low-latency CMAF as far back as the 2017 NAB and IBC shows, displaying live OTT delivery with a latency under 5 seconds. Since then, Harmonic has integrated low-latency DASH functionality into its appliance-based products (Packager XOS and Electra XOS) and SaaS solutions (VOS Cluster and VOS360). Many other encoder and player vendors have done the same.

All HLS/DASH/CMAF-based low-latency systems work the same basic way, as shown in Figure 2. That is, rather than waiting until a complete segment is encoded, which typically takes between 6 and 10 seconds (top of Figure 2), the encoder creates much shorter chunks that are transferred to the CDN as soon as they are complete (bottom of Figure 2).


Figure 2. HLS/CMAF/DASH low latency (From a Harmonic white paper titled "DASH CMAF LLC to Play Pivotal Role in Enabling Low Latency Video Streaming")

As an example, if your encoder was producing 6-second segments, you'd have at least 6 seconds of latency. If your encoder pushed out chunks every 200 milliseconds, however, and the player was configured to start playback immediately, latency should be much, much less. One key benefit of this scheme is backward compatibility; since segments are still being created, players that are not compatible with the low-latency scheme should still be able to play the segments, albeit with much longer latency. These segments are also instantly available for video-on-demand (VOD) presentations from the live stream.
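To make the segment-versus-chunk comparison concrete, here is a minimal sketch of the timing involved. It only models how long the encoder must work before the first media can leave for the CDN. Network transfer, CDN behavior, and player buffering are deliberately ignored, and the function names are illustrative rather than part of any HLS, DASH, or CMAF tooling.

```python
from typing import Optional


def first_media_ready_ms(segment_ms: int, chunk_ms: Optional[int] = None) -> int:
    """Milliseconds of encoding needed before the first media can be sent.

    Without chunking, the encoder must finish a whole segment. With chunking,
    it only has to finish the first chunk.
    """
    if chunk_ms is None:
        return segment_ms
    if chunk_ms <= 0 or segment_ms % chunk_ms != 0:
        raise ValueError("chunk_ms must be positive and divide segment_ms evenly")
    return chunk_ms


def chunks_per_segment(segment_ms: int, chunk_ms: int) -> int:
    """Number of chunks that together still form one complete segment."""
    if chunk_ms <= 0 or segment_ms % chunk_ms != 0:
        raise ValueError("chunk_ms must be positive and divide segment_ms evenly")
    return segment_ms // chunk_ms


SEGMENT_MS = 6000  # 6-second segments, as in the example above
CHUNK_MS = 200     # 200-millisecond chunks, as in the example above

print("Whole segments: first media ready after", first_media_ready_ms(SEGMENT_MS), "ms")
print("Chunked: first media ready after", first_media_ready_ms(SEGMENT_MS, CHUNK_MS), "ms")
print("Chunks per segment:", chunks_per_segment(SEGMENT_MS, CHUNK_MS))
```

The chunk count matters for backward compatibility: the 30 chunks in this example still add up to one ordinary 6-second segment that a legacy player can request in the usual way.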

WebSockets is a real-time protocol that creates and maintains a persistent connection between a server and client that either party can use to transmit data. This connection can carry both video and other communications, which is convenient if your application needs interactivity. Like WebRTC implementations, WebSockets-based solutions are typically offered as a service that includes a player and CDN, and you can use any encoder that can transport streams to the server via Real-Time Messaging Protocol (RTMP) or WebRTC. Examples include nanocosmos' nanoStream Cloud and the Wowza Streaming Cloud With Ultra Low Latency. Wowza claims sub-3-second latency for its solution, while nanocosmos claims around 1 second, glass to glass.

There is a fourth category of products best called "other" that use different technologies to provide low latency. This category includes THEO Technologies' High Efficiency Streaming Protocol (HESP), a proprietary HTTP adaptive streaming protocol that, according to the company, delivers sub-100-millisecond latency while reducing bandwidth by about 10% as compared to other low-latency technologies. HESP uses standard encoders and CDNs but requires custom support in the packager and player (see Figure 3). You can read more about HESP in a white paper available for download at go2sm.com/theohesp.


Figure 3. THEO Technologies' High Efficiency Streaming Protocol (HESP) is a proprietary protocol compatible with most CDNs.

Build or Buy?

If you implement the technology yourself, be sure to answer all of the following questions before choosing a technology. If you're selecting a service provider, use the same questions to compare the different options.

Are you choosing a standard or a partner?—I outlined the different technologies for achieving low latency earlier in the article, but implementations vary among service providers. When choosing a service provider, it's more important to determine if it can meet your technological and business goals than which technology it implements.

Can it scale, and at what cost?—One of the advantages of HTTP-based technologies is that they scale at standard pricing using most CDNs. In contrast, most WebRTC-based and WebSocket-based technologies require a dedicated delivery infrastructure implemented by the service provider. For this reason, non-HTTP-based services can be more expensive to deliver than HLS/DASH/CMAF. When comparing service providers, ascertain the soup-to-nuts cost of the event, including ingress, transcoding, bandwidth, VOD file creation, one-time or per-event configurations, and the like.
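One way to keep a soup-to-nuts comparison honest is to record every line item for each quote and compare totals rather than headline bandwidth prices. The sketch below does that with the cost categories named above. The provider names and all dollar amounts are hypothetical placeholders, not real pricing, and the class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class EventQuote:
    """One provider's quote for a single event; every amount is in US dollars."""
    provider: str
    ingress: float
    transcoding: float
    bandwidth: float
    vod_file_creation: float
    configuration: float  # one-time or per-event setup fees

    def total(self) -> float:
        """Sum of every line item, so no fee category is left out."""
        return (self.ingress + self.transcoding + self.bandwidth
                + self.vod_file_creation + self.configuration)


# Hypothetical placeholder numbers, for illustration only.
quotes = [
    EventQuote("Provider A", ingress=50, transcoding=200, bandwidth=900,
               vod_file_creation=40, configuration=0),
    EventQuote("Provider B", ingress=0, transcoding=150, bandwidth=1400,
               vod_file_creation=0, configuration=250),
]

for quote in sorted(quotes, key=EventQuote.total):
    print(f"{quote.provider}: ${quote.total():,.2f} all-in")
```

A fuller comparison would probably also divide each total by expected viewer hours, since pricing models often differ in how they scale with audience size.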

Are there implementation-related issues?—Ask the following questions related to implementing and using the technology:

What's the latency achievable at a scale relevant to your audience size and geographic distribution?
Are there any quality limitations? Some technologies may be limited to certain maximum resolutions or H.264 profiles.
Will the packets pass through a firewall? HTTP-based systems use HTTP protocols, which are firewall-friendly. Others use User Datagram Protocol (UDP), which may not be. If it's UDP, are there any back channels to deliver to blocked viewers?
What platforms are supported? Presumably computers and mobile devices, but what about set-top boxes, dongles, OTT devices, and smart TVs?
Does it scale? Can the system scale to meet your target viewer numbers? Is the CDN infrastructure private, and if so, can it deliver to all relevant viewers in all relevant markets? What are the anticipated costs of scaling?
What about the player? Can you use your own player, or do you have to use the system's player? If it's your own, what changes are required, and how much will that cost?
What's needed for mobile playback? Will the streams play in a browser, or is an app required? If there's an app required (or desirable), are software development kits (SDKs) available?
Which encoders can the system use? Most should use any encoder that can support RTMP connections into the cloud transcoder, but check to see if other protocols are needed.
Can the content be reused for VOD, or will re-encoding be required? This isn't a huge deal since it costs about $20 per hour of video to transcode to a reasonable encoding ladder, but it can be expensive for frequent broadcasts.
What are the redundancy options, and what are the costs? For mission-critical broadcasts, you'll want to know how to duplicate the encoding/broadcast workflow should any system component go down during the event. Is this redundancy supported, and what is the cost?
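To put the VOD re-encoding question from the list above into numbers, the following sketch multiplies out the roughly $20-per-hour transcoding figure for a few broadcast schedules. The rate is the approximate figure cited above. The schedules and the function name are illustrative assumptions, and actual pricing will vary by vendor and encoding ladder.

```python
TRANSCODE_COST_PER_HOUR = 20.0  # approximate figure cited in the list above


def yearly_reencoding_cost(event_hours: float, events_per_year: int,
                           rate_per_hour: float = TRANSCODE_COST_PER_HOUR) -> float:
    """Estimated yearly cost of re-encoding live events for VOD."""
    if event_hours < 0 or events_per_year < 0 or rate_per_hour < 0:
        raise ValueError("inputs must be non-negative")
    return event_hours * events_per_year * rate_per_hour


# Hypothetical schedules: (label, hours per event, events per year)
schedules = [
    ("One 2-hour event", 2, 1),
    ("Weekly 2-hour broadcast", 2, 52),
    ("Daily 3-hour broadcast", 3, 365),
]

for label, hours, count in schedules:
    print(f"{label}: ${yearly_reencoding_cost(hours, count):,.0f} per year")
```

At that rate a single event costs little to re-encode, while a daily schedule reaches five figures per year, which is why reuse of live segments for VOD matters more for frequent broadcasters.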

What features are available, and at what scale?—There will be a wide variety of feature offerings from the different vendors, which may or may not include the following:

Adaptive bitrate (ABR) streaming—How many streams, and are there any relevant bitrate or resolution limitations?
DVR features—Can the viewers stop and restart playback without losing any content?
Video synchronization—Can the system synchronize all viewers to the same point in the stream? Streams can drift over locations and devices, and without this capability, users on some connections may have an advantage for auctions or gambling.
Content protection—If you're a premium content producer, you may need true DRM. Others can get by with authentication or other similar techniques.
Captions—Captions are legally required for some broadcasts, but are generally beneficial for all.
Advertising or other monetization—Does the technology/service provider support this?

In general, if you're a streaming shop seeking to meet or beat broadcast times in the 5- to 6-second range, a standards-based HTTP technology is likely your best bet, since it will come closest to supporting the same feature set you're currently using, like content protection, captions, and monetization. If you have an application that requires much lower latency, you'll probably need a WebRTC-based or WebSockets-based solution or a proprietary HTTP technology. In either case, asking the previously listed questions should help you identify the technology and/or service provider that best meets your needs.
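As a rough summary of this guidance, the sketch below maps a target latency to the technology classes discussed in this guide. The cutoffs (2 and 6 seconds) are approximations drawn from the figures quoted earlier rather than hard limits, and the function name is illustrative. Treat the output as a starting shortlist, not a recommendation.

```python
from typing import List

REAL_TIME_CUTOFF_SECONDS = 2.0   # interactive uses: even 2 seconds is too long
BROADCAST_CUTOFF_SECONDS = 6.0   # broadcast averages about 5-6 seconds


def candidate_technologies(target_latency_seconds: float) -> List[str]:
    """Return the technology classes worth evaluating for a latency target."""
    if target_latency_seconds <= 0:
        raise ValueError("target_latency_seconds must be positive")
    if target_latency_seconds <= REAL_TIME_CUTOFF_SECONDS:
        return ["WebRTC", "WebSockets", "proprietary HTTP (for example, HESP)"]
    if target_latency_seconds <= BROADCAST_CUTOFF_SECONDS:
        return ["Low-Latency HLS", "low-latency DASH/CMAF"]
    return ["standard HLS/DASH with 2-4 second segments"]


for target in (0.5, 5, 10):
    options = ", ".join(candidate_technologies(target))
    print(f"Target of {target} seconds: consider {options}")
```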