Beyond TCP: Meet the Next Generation of Transport Protocols
From plain old transmission control protocol (TCP) to newly conceived protocols, the variety of methods for delivering video across the internet is a key area of interest for the entire streaming media industry. After all, what good is the best-quality capture and compression if the delivery method can't keep pace?
Back in January, in an article called “Latency Sucks!” that dealt with lowering the overall delivery time of interactive or streaming video, we touched on a few newer transmission protocol derivatives: web real-time communication (WebRTC), reliable user datagram protocol (UDP), and plain old real-time transport protocol (RTP). Lower latency typically equates with lower-quality compression, based on the assumption that the longer the time given to processors to compress a video image, the better the quality. In this article, we will take a deeper look at the underlying protocols and discuss which ones make sense for particular applications. While the underpinnings of over-the-top (OTT) live linear delivery rely on a tried-and-true protocol, TCP, there are actually numerous technologies available to the streaming media professional.
With an eye on lowering latency and increasing reliability, while playing nicely with the neighbors (neighboring packets and networking gear), let’s now explore alternate transmission protocols that expand on or replace TCP as the reigning streaming transmission champ.
As a quick recap from January, here’s what we know about WebRTC. Promoted by Google and other major players in the open source and open standards community, WebRTC was designed for peer-to-peer communication, primarily in small groups.
WebRTC is designed to use UDP and RTP by default, although it can be set to use a TCP-based fallback if anomalies are detected. Not surprisingly, the video codecs for WebRTC are VP8 and VP9, which Google has continued to advance while also working on the Alliance for Open Media’s AV1 codec. AVC, better known as H.264, can also be used, although the audio companion to AVC cannot: Audio is usually an open-source codec called Opus, not the more-widely used AAC.
WebRTC is able to achieve very low latency, but doesn’t normally operate well in a typical streaming environment, meaning one that is based on real-time messaging protocol (RTMP), HLS, or AVC with AAC.
Companies such as Nanocosmos GmbH, a Berlin-based company that focuses on solutions from the media server to the end user, have put forth concepts to create a scalable WebRTC live scenario. But the company acknowledges that, on the delivery side, it is missing CDN and vendor support, including any support by Apple.
Many different approaches have attempted to “repair” TCP, including some that attempt to circumvent its inherent problems by narrowing the windowing size (the period of time in which, if a TCP packet has not been received by the end-user device, that device can request that the packet be retransmitted). If the TCP window is too short, though, trouble ensues as packet transmissions get backed up or collisions between packets increase.
Using WebSockets is one approach that attempts to create a persistent state of transmission (essentially a tunnel) as a way to eliminate the buildup of TCP packet transmission errors inherent to very short TCP windowing times. Still other solutions work by lowering the segment size of an HTTP-delivered video (think HLS or MPEG-DASH) to a level that allows for faster startup times and lower overall latency.
“HLS and DASH both suffer from the restriction to require file-based segments which are pulled over HTTP requests,” says Oliver Lietz, CEO of Nanocosmos. “Due to the nature of HTTP and internet connections, the segment size cannot be reduced below 2 seconds easily without sacrificing performance and stability.”
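To see why segment duration dominates latency in segmented HTTP delivery, here is a rough back-of-the-envelope sketch in Python. It assumes that a player buffers a fixed number of complete segments before starting playback (three is a commonly cited default, but players vary) and that encoding, packaging, and network time can be lumped into a single constant. The function name and its parameters are illustrative only and are not part of any HLS or DASH tooling.

```python
def estimated_latency_seconds(segment_seconds, buffered_segments=3, overhead_seconds=1.0):
    """Rough floor on end-to-end latency for segmented HTTP delivery.

    Assumptions (not taken from any spec): the player waits for
    `buffered_segments` complete segments before playing, and
    `overhead_seconds` lumps together encode, packaging, and network time.
    """
    if segment_seconds <= 0 or buffered_segments < 1:
        raise ValueError("segment_seconds must be > 0 and buffered_segments >= 1")
    return segment_seconds * buffered_segments + overhead_seconds


for seconds in (10, 6, 2):
    latency = estimated_latency_seconds(seconds)
    print(f"{seconds}s segments -> roughly {latency:.0f}s behind live")
```

Under these assumptions, even 2-second segments leave a viewer several seconds behind live, which is why shrinking segments alone only goes so far.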
These segments for HLS have traditionally been MPEG-2 transport stream (M2TS or just TS), which itself is based on a decades-old asynchronous transfer mode (ATM) protocol designed for sending video signals across satellite. Besides the time it takes to segment video into 2- to 10-second segments, the actual TS packaging has a relatively high number of header bits as well as interleaved audio.
These header bits were useful for reassembling content sent over a satellite in a direct 1:1 link from an earth station to a satellite to a receiving satellite dish, but they are unnecessary in a TCP environment where the network transmission protocol handles delivery sequencing.
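The fixed cost of TS packaging can be estimated with a few lines of Python. The sketch below relies on two facts about MPEG-2 TS: packets are 188 bytes long and carry a header of at least 4 bytes. Everything else is a simplification: it ignores adaptation fields, PES headers, and the program tables that a real stream also carries, so the result is a floor rather than a measurement. The function names are hypothetical.

```python
TS_PACKET_BYTES = 188  # fixed MPEG-2 TS packet size
TS_HEADER_BYTES = 4    # minimum header; adaptation fields would add more


def ts_packets_needed(payload_bytes):
    """Number of TS packets required to carry the payload."""
    payload_per_packet = TS_PACKET_BYTES - TS_HEADER_BYTES
    return -(-payload_bytes // payload_per_packet)  # ceiling division


def ts_overhead_fraction(payload_bytes):
    """Share of transmitted bytes that is header or padding, not payload."""
    total_bytes = ts_packets_needed(payload_bytes) * TS_PACKET_BYTES
    return (total_bytes - payload_bytes) / total_bytes


print(f"{ts_overhead_fraction(1_000_000):.2%} minimum overhead for 1 MB of media")
```

The per-packet header alone costs a little over 2 percent; the additional layers left out of this sketch push the real figure higher.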
Advances in the packaging of segments were first addressed in late 2011, when Adobe and Microsoft made the joint case for fragmented MP4 files. These would allow delivery of multiple permutations of video streams (e.g., camera angles) and audio streams (e.g., alternate language or commentary tracks) without requiring the interleaving that slowed down HLS because of its reliance on the M2TS packaging approach.
MPEG-DASH adopted the fragmented MP4 (fMP4) approach, as did Apple in the most recent version of its iOS mobile platform. One result of this move to fMP4 is an ability to avoid altogether the restrictions of file-based segments of HLS and DASH.
For instance, Nanocosmos uses frame-based segments from an MP4 live stream, essentially allowing fMP4 to act as the “segment” to achieve ultra-low latency, with a fallback to HLS low latency for standard HLS players.
The sibling of TCP is called UDP, and it isn’t necessarily designed to play well with others.
As a very simple, low-level internet protocol, at least when compared to TCP, the UDP approach forgoes a specific handshake between sender and receiver. This helps with speed of delivery, but there is no guarantee of delivery as packets are not confirmed by the receiver.
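The lack of a handshake is easy to see in code. The following minimal sketch uses Python's standard `socket` module to send one datagram across the loopback interface; the function name, payload, and one-second timeout are arbitrary choices for illustration.

```python
import socket


def udp_roundtrip(payload, timeout_seconds=1.0):
    """Send one datagram over loopback; return what arrived, or None if lost."""
    # Receiver: bind to an ephemeral port. There is no listen() or accept().
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    receiver.settimeout(timeout_seconds)
    address = receiver.getsockname()

    # Sender: no connect() handshake; the datagram is addressed and sent.
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sender.sendto(payload, address)
        data, _ = receiver.recvfrom(2048)
        return data
    except socket.timeout:
        # Nothing tells the sender a datagram went missing; the receiver
        # only notices because it chose to stop waiting.
        return None
    finally:
        sender.close()
        receiver.close()


print(udp_roundtrip(b"frame-0001"))
```

On loopback the datagram will almost always arrive, but the sender never learns whether it did. Any acknowledgment or retransmission has to be built by the application.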
“The market was thirsty for an open source, freely available, low-latency, UDP-like approach for streaming over the internet,” says Peter Maag, chief marketing officer of Haivision, which has jointly developed a protocol called secure reliable transport (SRT) with Wowza to blend the strengths of both UDP and TCP.
“SRT blends the resilience of TCP/IP transmission with the performance of UDP,” says Maag, “and adds in security, network-health monitoring, and simplified firewall traversal.”
The market approach for SRT is not from media server to end user, but from ingest point to media server. In this way, SRT is positioning itself to be a limited-scalability replacement for RTMP, a goal of WebRTC proponents as well.
“SRT is currently ideal for contribution and distribution of performance streams,” says Maag, adding that, along with other open-source efforts, there is a goal “to extend SRT to address broad-scale OTT delivery challenges.”
Lietz thinks that RTMP is still the best approach for ingest, given the wide number of RTMP-enabled encoders on the marketplace. He also feels that UDP “potentially reduces complexity and allows creating low-latency applications.” But he adds a caveat: “For reliable live-streaming applications, several application layers need to be added on top of UDP.”
One approach is to add a transport stream on top of UDP, in much the same way that the M2TS format multiplexed MPEG-2 into segments decades ago.
“Additionally, forward error correction (FEC) needs to be added,” says Lietz, acknowledging that these additions take UDP close to the threshold of TCP in terms of latency added to the mix.
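As an illustration of what FEC means in practice, the sketch below shows the simplest form: one XOR parity packet per group of equal-length packets, which lets a receiver rebuild a single lost packet without asking for a retransmission. This is a teaching example with hypothetical function names, not the scheme used by any product or protocol named in this article.

```python
def xor_parity(packets):
    """Return one parity packet: the byte-wise XOR of equal-length packets."""
    if not packets:
        raise ValueError("need at least one packet")
    length = len(packets[0])
    if any(len(packet) != length for packet in packets):
        raise ValueError("packets must all be the same length")
    parity = bytearray(length)
    for packet in packets:
        for index, byte in enumerate(packet):
            parity[index] ^= byte
    return bytes(parity)


def recover_missing(received, parity):
    """Rebuild the single missing packet (marked None) in a group."""
    missing = [i for i, packet in enumerate(received) if packet is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can repair exactly one loss per group")
    present = [packet for packet in received if packet is not None]
    # XOR of the survivors and the parity packet yields the lost packet.
    return xor_parity(present + [parity])


group = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_parity(group)
damaged = [b"AAAA", None, b"CCCC", b"DDDD"]
print(recover_missing(damaged, parity))
```

The trade-off is visible even here: every group costs one extra packet of bandwidth, and the receiver must wait for the rest of the group before it can repair a loss, which adds delay of its own.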
Even more worrying, while UDP is sometimes used in multicast applications and can carry standard codecs like H.264 and AAC, there is no generally available application standard for UDP in terms of browser support.
Haivision’s Maag says that SRT addresses those issues. In addition, since SRT has been around for quite some time, he says it has found traction for ingest among certain niches.
“SRT can be used by any developer needing low-latency video streaming for hardware or software,” says Maag, noting that SRT launched in 2013 and “is being used today by hundreds of top broadcasters and enterprises alike for performance streaming applications.”