Beyond TCP: Meet the Next Generation of Transport Protocols
SRT is open source, available on GitHub for developers to contribute to under a Lesser General Public License (version 2), and the underlying code has already been forked once. Development, measured in lines of contributed code, has dropped from a mid-2016 high of 35,000 lines to around 1,000 lines per month, but the protocol shows promise beyond just streaming.
“SRT is content-agnostic and operates at the network packet level,” explains Maag, “reconstructing the real-time behavior of the data stream at the receiving end, surviving network packet loss, out-of-order packets, jitter, and fluctuating roundtrip time while maintaining low latency.”
In other words, there could be a number of uses for SRT’s robust transport protocol, including delivery of metadata alongside traditional streams. This could also be achieved by multiplexing video and non-video streams together.
SRT has the ability to multiplex control and multiple data substreams into a single established stream.
“This makes it a lot more IT-friendly in the real world,” says Maag, “because data flow is predictable and can be easily controlled and monitored at the network infrastructure level.”
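To make the multiplexing idea concrete, here is a minimal Python sketch of how packets from several substreams can be tagged, carried over one stream, and separated again at the far end. The header layout (a 1-byte substream ID plus a 2-byte length) and the function names are assumptions made for illustration; this is not SRT's actual wire format.

```python
import struct

# Hypothetical header: 1-byte substream ID + 2-byte big-endian payload length.
HEADER = struct.Struct("!BH")

def mux(packets):
    """Interleave (substream_id, payload) pairs into one byte stream."""
    out = bytearray()
    for substream_id, payload in packets:
        out += HEADER.pack(substream_id, len(payload))
        out += payload
    return bytes(out)

def demux(stream):
    """Split the combined byte stream back into per-substream payload lists."""
    substreams = {}
    offset = 0
    while offset < len(stream):
        substream_id, length = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        payload = stream[offset:offset + length]
        substreams.setdefault(substream_id, []).append(payload)
        offset += length
    return substreams

# Substream 0 = control, 1 = video, 2 = metadata (all labels are illustrative).
wire = mux([
    (0, b"ctrl:keepalive"),
    (1, b"video-frame-1"),
    (2, b"caption:hello"),
    (1, b"video-frame-2"),
])
channels = demux(wire)
```

Because everything travels as one flow, a firewall or monitoring tool sees a single predictable stream rather than a handful of loosely related connections.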
While the idea of a UDP-based delivery is appealing, even without the need for multicast delivery, there are other approaches to video that hold promise.
For instance, the team at Nanocosmos has modified HLS in a bid to lower the overall latency. The Nanocosmos H5Live protocol sits at the heart of this approach, and the company says it offers a few major benefits over plain old HLS, chiefly live low-latency playback in any browser, made possible by transmuxing from RTMP to HTML5, including to low-latency HLS for iOS.
H5Live is not so much a new protocol as a new segmentation technology. The end goal is to keep the stream formats compatible with existing HTML5 browsers and to enable plug-in-free playback, with granular control over how video is transmitted from the H5Live server to the player in the end user’s browser.
“We like to say that H5Live fills the gap that Flash has left,” says Lietz, “for ultra-low-latency live playback on any HTML5 browser including Safari on iOS.”
H5Live is based on three primary modules: a live stream reader, a live stream segmenter and multiplexer, and a live stream delivery module. “The live stream reader connects to existing live streams, with H.264 Video and AAC Audio supported based on RTMP,” says Lietz, adding that H5Live can also connect to WebRTC sources based on UDP.
The modular segmenter/multiplexer differs from existing HLS or DASH segmenters, since it focuses on frame-based delivery independent of H.264 group of pictures (GOP) length or file chunks. According to Lietz, the benefit is that live streams based on fragmented MP4 (fMP4) can be assembled to be fully compatible with standard HLS.
In addition, the segments can be repurposed to use Media Source Extensions and WebSockets on newer versions of Chrome and Firefox, or even sent as MP4-like streams to “dumb” devices like set-top boxes that don’t inherently allow HLS or HTML5 players.
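A toy comparison shows why frame-based delivery matters for latency. The Python sketch below is not H5Live's implementation; the function names, the 2-second GOP, and the simplified wait model (a segment cannot ship until its last frame exists) are all assumptions for illustration.

```python
def gop_segments(frames):
    """Group frames into segments that each start at a keyframe ('I')."""
    segments, current = [], []
    for frame in frames:
        if frame == "I" and current:
            segments.append(current)
            current = []
        current.append(frame)
    if current:
        segments.append(current)
    return segments

def frame_segments(frames):
    """Emit every frame as its own deliverable unit, regardless of GOP length."""
    return [[frame] for frame in frames]

def worst_case_wait(segments, fps):
    """Seconds the first frame of the longest segment waits for its last frame."""
    return (max(len(segment) for segment in segments) - 1) / fps

# Two 2-second GOPs at 30 frames per second: one keyframe, then 59 P-frames.
frames = (["I"] + ["P"] * 59) * 2
gop_wait = worst_case_wait(gop_segments(frames), fps=30)
frame_wait = worst_case_wait(frame_segments(frames), fps=30)
```

In this simplified model, a GOP-aligned segmenter holds the first frame for nearly 2 seconds before the segment can leave the server, while a frame-based segmenter adds no packaging wait at all.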
A final approach, taken by id3as, is to use multiple TCP sessions.
“Where others are looking to make UDP more like TCP,” says Dom Robinson, cofounder of id3as (and contributing editor to Streaming Media), “we’ve taken the opposite approach. TCP is one of the best protocols out there, but it lacks some of the benefits of UDP, so we’ve brought those bits over to a TCP approach.”
Robinson and partner Adrian Roe have devised a scheme that bonds multiple TCP sessions together, rather than relying on the link aggregation traditionally used by Unix and other internet-centric systems.
In some ways, this bonding approach is reminiscent of the way that telecom bonding works. If you remember old-school 64Kbps ISDN lines (also known as BRI lines), each of these B channels had its own 8 to 10Kbps of channeling and handshake bits. But when they were bonded together to create a 384Kbps or even a multi-megabit connection as a primary line (PRI), there was no need for each B channel to negotiate its own handshake. So the PRI handled connectivity, freeing up those extra 8–10Kbps on each of a dozen or more lines.
The id3as team calls its approach GRIT, which offers a more flexible bonding solution. Any of the TCP sessions, or “channels,” can handle the handshake and pass necessary information such as license keys. That leaves all the other TCP sessions free to focus on delivering their portions of the video in a massively parallel fashion, while still looking like standard TCP traffic, without needing to constantly renegotiate in a way that tears down a session every time a packet isn’t properly delivered.
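The general idea of spreading one stream across several ordered channels can be sketched in a few lines of Python. This is an in-memory illustration only, with no real sockets, and it is not the GRIT implementation: the round-robin striping, the sequence numbers, and the function names are all assumptions.

```python
import heapq

def stripe(data, channel_count, chunk_size):
    """Split data into numbered chunks and deal them round-robin across channels."""
    channels = [[] for _ in range(channel_count)]
    for seq, start in enumerate(range(0, len(data), chunk_size)):
        chunk = data[start:start + chunk_size]
        channels[seq % channel_count].append((seq, chunk))
    return channels

def reassemble(channels):
    """Merge chunks from all channels back into the original byte order."""
    # Each channel is already in sequence order, as TCP guarantees within a
    # session, so a k-way merge on the sequence number restores the stream.
    merged = heapq.merge(*channels, key=lambda chunk: chunk[0])
    return b"".join(payload for _, payload in merged)

payload = bytes(range(256)) * 4          # 1,024 bytes standing in for video
channels = stripe(payload, channel_count=4, chunk_size=100)
restored = reassemble(channels)
```

Since each session delivers its own chunks in order, the receiver only has to interleave the channels by sequence number; a stall on one session delays that session's chunks without forcing the others to be torn down.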
While the id3as concept isn’t new, having been tried in the field for the last 3 years, it has seen considerable interest of late from those who don’t want to move away from TCP to UDP.
While this article deals primarily with transmission and the world of transport protocols, that’s only the middle of the story, which begins with coding, packaging, and multiplexing.
To create a stable and scalable communication system, both encoder and player need to be aware of potential latency pitfalls, as well as the points at which buffering can be used to avoid them.
We tend to think of buffering as a bad thing, but with live streaming often used in unstable environments like mobile networks, there’s a need for “relief valves” at various points in the process. This need lends itself to buffer control to avoid frame dropping. Buffer control also has the added benefit of keeping latency down in all network situations, even those pesky unstable mobile networks.
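One way a player might act as a relief valve without dropping frames is to nudge playback speed so the buffer drifts back toward a target depth. The Python sketch below assumes that approach; the function name, the 0.5-second target, the gain, and the 10 percent speed limit are hypothetical values, not taken from any product mentioned here.

```python
def playback_rate(buffered_seconds, target_seconds=0.5, gain=0.2, max_adjust=0.1):
    """Return a playback speed multiplier that steers the buffer toward target."""
    # Positive error means too much media is buffered (latency is growing),
    # so play slightly faster; negative error means the buffer is draining.
    error = buffered_seconds - target_seconds
    adjust = max(-max_adjust, min(max_adjust, gain * error))
    return 1.0 + adjust

on_target = playback_rate(0.5)   # buffer at target: normal speed
too_full = playback_rate(2.0)    # latency building up: speed up, capped
starving = playback_rate(0.0)    # buffer empty: slow down, capped
```

Small speed changes like these are usually far less noticeable to viewers than a stall or a skipped frame, which is why the adjustment is capped.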
In addition, there may be a need for short-term adaptive bandwidth control to “smooth out” wide variations in bandwidth at any given point in the delivery process.
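A simple way to smooth out short-term swings is an exponentially weighted moving average of recent throughput measurements. The Python sketch below is one common technique, assumed here for illustration; the function name, the sample values, and the smoothing factor of 0.3 are not drawn from any of the products discussed.

```python
def smooth_bandwidth(samples_kbps, alpha=0.3):
    """Return an exponentially weighted moving average of throughput samples."""
    estimate = None
    smoothed = []
    for sample in samples_kbps:
        if estimate is None:
            estimate = float(sample)      # seed with the first measurement
        else:
            # Blend the new sample with the running estimate.
            estimate = alpha * sample + (1 - alpha) * estimate
        smoothed.append(estimate)
    return smoothed

# A mobile connection with one sharp dip and one sharp spike (values in Kbps).
raw = [3000, 3000, 500, 3000, 6000, 3000]
smoothed = smooth_bandwidth(raw)
```

A lower alpha reacts more slowly but rides out brief dips and spikes; a higher alpha tracks the network more closely at the cost of more frequent bitrate changes.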
The downside is that these technologies may ultimately have a greater impact on the user experience than just the underlying transport protocol.
As such, it is key that the industry solve the transmission problem holistically, with interoperability in mind, rather than just solving for one middling-sized part of the equation.
[This article appears in the July/August 2017 issue of Streaming Media Magazine as "How's Your Transmission?"]