Latency Sucks! So Which Companies Are Creating a Solution?
Speelmans says the modifications are in line with industry expectations. “The streams are actually still compliant with the Pantos spec in all key points,” he says. “A standard Apple player can still play all the streams which are produced. It will however not be able to benefit from the changes.”
While he didn’t go into details, primarily because THEOPlayer’s customer Periscope is still undergoing testing, Speelmans did say that both manifests (the file needed to act as a traffic cop between various quality versions of a single video stream) and segments are still present.
“There are still segments, which could potentially come from different servers and origins,” says Speelmans. “The information on segments is still listed in the manifest, as defined in the Pantos spec.”
What’s going on under the hood? For one thing, the length of segments (aka the target duration) has been lowered, which inherently yields a lower overall lag time.
“In the past, it was already possible to reduce the target duration to 1 second,” says Speelmans. “This resulted in a latency of about 4–6 seconds, depending on the delay on the encoder side.”
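To make the target duration concrete, here is a minimal sketch of what a live media playlist with 1-second segments might look like. The tags are standard HLS playlist tags; the helper name `build_media_playlist` and the segment file names are hypothetical and not taken from THEOPlayer or Periscope.

```python
def build_media_playlist(first_sequence, segment_names, target_duration=1):
    """Return the text of a live HLS media playlist (no ENDLIST tag)."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        # Upper bound on segment length, in whole seconds.
        f"#EXT-X-TARGETDURATION:{target_duration}",
        # Sequence number of the first segment still listed.
        f"#EXT-X-MEDIA-SEQUENCE:{first_sequence}",
    ]
    for name in segment_names:
        # Each segment is announced with its duration, then its URI.
        lines.append(f"#EXTINF:{target_duration:.3f},")
        lines.append(name)
    return "\n".join(lines) + "\n"


# Hypothetical segment names, for illustration only.
playlist = build_media_playlist(120, ["seg120.ts", "seg121.ts", "seg122.ts"])
print(playlist)
```

A live playlist like this is re-fetched by the player as new segments are appended, so a shorter target duration also means more frequent manifest requests.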
The best practice from Apple for the Pantos spec recommends segments well above 1 second, suggesting 5–10 second segments are better. Yet THEOPlayer and others appear to be able to deliver two to five times the number of segments without taxing the HTTP server, thereby achieving lower latency. So what would happen if segment size moved below 1 second?
“We see a lot of customers having keyframes set at 1-second intervals,” says Speelmans. “As such, for them the impact on scalability and bandwidth usage is actually minimal. On top of that, in our tests, segment sizes of [1 second] are hitting a quite good spot here.”
Speelmans went on to say that further reduction, down to the 500 ms or 750 ms range, could actually “result in a worse bandwidth usage.” He hinted that for the L-HLS version, THEOPlayer “could actually increase segment sizes again as the impact on this is minimal for players capable of handling this protocol.”
Erstein agrees, to a point.
“Making a keyframe frequency low (0.5–1 second) can help to reduce the startup time,” he says, but adds that a “native HLS player on iOS needs to receive three chunks to start playing anyway.”
“The server buffers [live] content in a form of chunks, and therefore, the viewer inherently lags the broadcast by at least 6–10 seconds, but generally 20–30 seconds,” says Erstein. “That’s how HLS and DASH technologies work, so they are not suitable for low latency by initial design, and the designers never intended these technologies to serve real-time needs.”
Akamai’s Michels says that there’s a place for 1-second segment sizes, but that it requires more than just the HTTP servers to be robust.
“We’ve actually demonstrated 3-to-4-second latency using HLS,” says Michels, “but this requires going down to 1-second segments, which again dictates the need for incredibly robust ingest and mid-tier capabilities. Otherwise quality starts to suffer.”
A representative from another company in the space, Influxis, says that its trials with lower latency were somewhat stymied by even the types of devices it delivered to.
“Our current production system generates an HLS latency of 3–5 seconds depending on a few factors, such as location and computing power,” says Matthew Wall, director of technology at Influxis. “Android devices can be even quicker (2.5 seconds) or slower (7 seconds) depending on the model and CPU.”
Other Approaches to Lowering Latency
What about the inherent limitations and unnecessary overhead required for HLS segments? Could those be replaced? After all, HLS relies on a decades-old transport technology, M2TS, which itself relies on the decades-old ATM-based packetization technique. The combination requires almost 15 percent additional packet size overhead just to properly deliver and reassemble M2TS streams into primary audio and video streams.
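As a rough illustration of where transport stream overhead comes from, the sketch below counts the 188-byte packets needed to carry one elementary-stream payload. The 188-byte packet size and 4-byte packet header are fixed by MPEG-2 TS; the 14-byte PES header (one with a presentation timestamp only) and the sample payload sizes are assumptions, and program tables and clock references are ignored. It is not an attempt to reproduce the 15 percent figure above.

```python
TS_PACKET_BYTES = 188   # fixed MPEG-2 transport stream packet size
TS_HEADER_BYTES = 4     # fixed header on every packet
TS_PAYLOAD_BYTES = TS_PACKET_BYTES - TS_HEADER_BYTES  # 184


def ts_overhead(payload_bytes, pes_header_bytes=14):
    """Return (packets, bytes_sent, overhead_ratio) for one PES packet.

    pes_header_bytes=14 is an assumed PES header carrying only a PTS.
    The last TS packet is padded to a full 188 bytes.
    """
    to_carry = payload_bytes + pes_header_bytes
    packets = -(-to_carry // TS_PAYLOAD_BYTES)  # ceiling division
    bytes_sent = packets * TS_PACKET_BYTES
    overhead_ratio = (bytes_sent - payload_bytes) / payload_bytes
    return packets, bytes_sent, overhead_ratio


# Assumed sizes: a small audio frame and a larger video frame.
for size in (300, 2000):
    packets, sent, ratio = ts_overhead(size)
    print(f"{size} B payload -> {packets} packets, {sent} B, {ratio:.1%} overhead")
```

Under these assumptions the small payload pays far more overhead than the large one, because the padding in the final packet is spread over fewer useful bytes.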
Speelmans says that’s a possibility, using some of the agreed-upon approaches to packetizing first proposed by Adobe and Microsoft years ago and subsequently added into the MPEG-DASH standard. One of those is called fragmented MP4, which forgoes M2TS in favor of splitting the Base Media File Format, an ISO standard, into tens of thousands of segments.
“Fragmented MP4 has been mentioned in our discussions,” says Speelmans, “but it was not something which is going to be used in the short term. There might, however, be a use in the future as well for raw audio streams, as well as other formats.”
Forgoing the multiplexing limitations of M2TS allows audio chunks and video chunks to be bound together at the time of delivery to a viewer, limiting the number of possible permutations required for alternate audio tracks and/or camera angles from a live stream.
Influxis’ Wall says that its approach will attempt to deliver the same latency for both HLS and DASH players.
“We are now developing a system which uses new proprietary server technology and will be able to deliver 1–1.5 seconds of latency on all devices with HLS,” says Wall, “and we’re also working on the same delay for DASH.”
Erstein posits a different approach, one that forgoes the idea of only using smaller HLS segment sizes in favor of an alternative approach that combines smaller segments and a bit of HTTP enhancement.
“Let’s make the chunks very small,” says Erstein, extrapolating out the issue he brought up about the inherent limitations of HLS and DASH. “A player needs at least three chunks to start playing? OK, let’s give three 1-second duration chunks. Then we should expect 3 seconds latency, correct?”
Not exactly, says Erstein, noting that Unreal Streaming Technologies took that approach with HLS 4 years ago.
“Yes, you can achieve 3 seconds latency with HLS via Unreal Media Server, but it’s not stable,” says Erstein. “The latency can grow. [The] iOS player decides to buffer sometimes.”
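The arithmetic behind Erstein's question can be sketched in a few lines. This is a simplified model, assuming latency is just the number of chunks the player must hold multiplied by the chunk duration, plus any encoder delay; the function name and the encoder-delay values are illustrative, and the model deliberately ignores the extra buffering that Erstein says the iOS player sometimes adds.

```python
def latency_floor(segment_seconds, segments_required=3, encoder_delay=0.0):
    """Best-case latency in seconds: buffered segments plus encoder delay."""
    return segments_required * segment_seconds + encoder_delay


# Three chunks at various durations, with no encoder delay.
for seg in (1, 2, 10):
    print(f"{seg}-second chunks -> at least {latency_floor(seg):.0f} s behind live")

# Assumed encoder delays of 1 to 3 seconds on top of 1-second chunks.
print(latency_floor(1, encoder_delay=1.0), latency_floor(1, encoder_delay=3.0))
```

With 1-second chunks and an assumed 1 to 3 seconds of encoder delay, the model lands in the same 4–6 second range Speelmans cites. It is a floor, not a prediction: any additional player buffering pushes the real figure higher.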
On a roll, Erstein ran the extrapolated technical use case out to its logical conclusion. “Let’s make very small chunks, say, 100 ms,” he says. “So consider a DASH server sending chunks of 100 ms length. However, DASH and HLS fetch every chunk with a separate HTTP request, so you would need to send an HTTP GET request for every 100 ms chunk. In other words, you would have 10 HTTP requests per second!”
As Erstein points out, just creating these HTTP requests would take another 30–100 ms. With so many HTTP requests, this approach would essentially flood the network and thereby increase lag time—not because of segment length or a required number of segments, but because an ever-growing HTTP request time means the server has to work much harder handling these HTTP requests.
“This is why the recommended chunk duration for HLS or DASH is 8–10 seconds,” says Erstein, “so that the player doesn’t need to issue an HTTP request more frequently than once in 8–10 seconds. So there you are, inherently 8–10 seconds behind the real time.”
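Erstein's numbers can be checked with a short calculation. The sketch below assumes one HTTP GET per chunk and treats the 30–100 ms request cost he mentions as a fixed per-request overhead; the function name is hypothetical.

```python
def request_load(chunk_ms, request_overhead_ms):
    """Return (requests_per_second, overhead relative to chunk duration)."""
    requests_per_second = 1000 / chunk_ms
    overhead_ratio = request_overhead_ms / chunk_ms
    return requests_per_second, overhead_ratio


# 100 ms chunks versus 8-second chunks, at the high end of the request cost.
for chunk_ms in (100, 8000):
    rps, ratio = request_load(chunk_ms, request_overhead_ms=100)
    print(f"{chunk_ms} ms chunks: {rps:g} requests/s per viewer, "
          f"request cost is {ratio:.1%} of chunk duration")
```

At 100 ms chunks, a 100 ms request cost equals the entire chunk duration, while at 8 seconds it is a little over 1 percent, which is the tradeoff the recommended chunk length reflects.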
So what is the solution?
Erstein’s approach is to aggregate HTTP requests in groups rather than as a single HTTP request for every chunk.
The cleanest way to do this, and a way that’s supported by Wowza and other streaming media engines, is to use a persistent socket connection between the player and the server. It’s an HTTP request that persists for an extended duration.
Not unlike the Session Initiation Protocol (SIP) used by VoIP phones or the session approach used by legacy streaming protocols, this single-connection approach, using a persistent socket connection protocol called WebSocket, still delivers HTTP content.
From the packaging and segmentation standpoint, nothing changes, but HTTP gets out of its own way and allows these smaller segments—whether they’re the 100 ms chunks that Unreal Streaming Technologies espouses, or larger chunks in the 1–2 second ranges—to flow continuously from the server to the player over that single WebSocket connection.
“The player issues a connect request only once in the beginning. Now the connection is established, and the server starts sending chunks,” says Erstein. “The player doesn’t need to connect to the server anymore.”
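The connect-once, push-many pattern Erstein describes can be sketched with nothing but the Python standard library. To stay self-contained, this example uses a plain TCP socket with length-prefixed frames as a stand-in for a WebSocket connection; it is not the Unreal Media Server or Wowza implementation, and the chunk contents and function names are placeholders.

```python
import asyncio
import struct


async def push_chunks(writer, chunks):
    """Server side: after the single connect, push every chunk unprompted."""
    for chunk in chunks:
        # 4-byte big-endian length prefix, then the chunk bytes.
        writer.write(struct.pack("!I", len(chunk)) + chunk)
        await writer.drain()
    writer.close()
    await writer.wait_closed()


async def receive_chunks(host, port):
    """Player side: connect once, then read frames until the server closes."""
    reader, writer = await asyncio.open_connection(host, port)
    received = []
    try:
        while True:
            (length,) = struct.unpack("!I", await reader.readexactly(4))
            received.append(await reader.readexactly(length))
    except asyncio.IncompleteReadError:
        pass  # connection closed by the server: end of stream
    writer.close()
    await writer.wait_closed()
    return received


async def demo():
    chunks = [f"chunk-{i}".encode() for i in range(5)]  # placeholder media

    async def on_connect(reader, writer):
        await push_chunks(writer, chunks)

    # Port 0 asks the OS for any free local port.
    server = await asyncio.start_server(on_connect, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        return await receive_chunks("127.0.0.1", port)


received = asyncio.run(demo())
print(f"received {len(received)} chunks over one connection")
```

The player never issues a second request. Chunk boundaries are carried by the framing rather than by separate HTTP GETs, which is the role WebSocket message framing plays in the approach described above.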
Wowza’s Knowlton explains the appeal of the WebSocket approach. “RTMP usage for delivery is likely to decline at an accelerating pace over the next 5 years,” he says. “WebRTC is well-positioned to pick up market share as support for it increases throughout the ecosystem. The Wowza Streaming Engine customer adoption of our scalable WebRTC functionality has been very high, rivaling the rapid adoption of HLS when we first introduced that in 2009.”
In addition to traditional media streaming protocols, Wowza Streaming Engine also has built-in WebSocket and HTTP Provider capabilities, so it’s quite possible to maintain the use of traditional RTMP and long-segment-length “classic” HLS while also experimenting with these newer approaches to persistent HTTP connectivity all within a single server environment.
On the network and content delivery front, WebRTC is still under consideration by CDNs.
“While Akamai hasn’t made any formal announcements regarding WebRTC, we do see it as important to addressing ultra-low latency requirements—those situations that require sub-1-second latency,” says Akamai’s Michels. “We feel the more traditional broadcasters and OTT services will continue to leverage HLS and DASH for delivery, with WebRTC being used to address the more specialized use cases.”
In conclusion, it appears that in 2017 there are two things we can guarantee: consistency and change. WebRTC and WebSockets clearly are building blocks for the next generation of streaming delivery, especially if low latencies are required, but both RTMP and “classic” HLS will continue to be a factor for their specific use cases.
[This article appears in the January/February 2017 issue of Streaming Media Magazine as "Latency Sucks!"]