The Algorithm Series: QUIC Ways to Stream
In the beginning, there was IP. And IP begat both a beauty (TCP) and a beast (UDP). And all was good.
In the first age of IP (internet protocol), both TCP (transmission control protocol) and UDP (user datagram protocol) found favor with IP architects everywhere, and they were declared effective transport protocols for carrying traffic over IP. But soon, TCP was given the "good" moniker, and it began to outshine UDP. Love and attention were showered on TCP for its ability to coexist with multiple types of traffic, and it received the World Wide Web crown. Meanwhile, the brawny, pushy, and much more efficient UDP lurked in the shadows, underpinning a great deal of the internet's other traffic, while all of the widely praised web traffic traveled over TCP via HTTP or HTTPS.
And this first age of World Wide Web traffic (the second age of IP) proceeded in harmony, in which pictures and text behaved nicely alongside basic HTML frames and coding. At some point near the end of the first age of World Wide Web traffic, however, streaming was born.
Streaming architects began searching for a more consistent and effective way to deliver video on the web—beyond progressive download, which is the video equivalent of a still image—to be able to stream single (unicast) on-demand video streams at scale. Prophets in the video delivery wilderness declared that multicast streams could solve all the at-scale issues of unicast streaming if only we'd all welcome UDP back into the fold as an equal partner to TCP on internet routers everywhere.
Yet the lure of HTTP was greater than the lure of truly efficient at-scale video streaming, so the second age of World Wide Web traffic contained even more TCP to the exclusion of UDP. Rather than streaming fixing the internet, the internet hamstrung streaming by chopping "streams" into HTTP small-file segments for at-scale delivery with significant latencies. The approach was effective: Streaming grew in popularity, and hundreds of millions of consumers were able to view on-demand content thanks to the combination of HTTP and TCP known as HTTP/2.
Yet these same consumers were frustrated with the inability to watch sports the same way as their local neighbors with cable, satellite, or even over-the-air antennas could view it. Consumers who had cut the cord and watched live sports via an OTT stream were often the last to know—sometimes by seconds, other times by minutes—about a key goal or point scored.
If only there were a way to push live sports to an OTT box or mobile device with almost zero latency. That's when the long-forgotten UDP suddenly began to feel the love. That's also when the third age of World Wide Web traffic, made possible in equal parts by the two IP offspring (TCP and UDP), took shape. And it was known as HTTP/3. And it was good.
What Is QUIC?
OK, so my brief history of streaming and web traffic might sound corny, but it's necessary to understand the topic of this installment of The Algorithm Series. I'm going to delve into HTTP/3 and the QUIC (Quick UDP Internet Connections) protocol that underlies it, showing both the benefits and the math behind the magic for what may be one of the biggest revolutions in streaming since the late 1990s.
QUIC originated almost a decade ago as part of a Chromium project at Google, begun in 2012 and led by Jim Roskind. The benefits of using UDP to lower latency were laid out in a proposal known as either HTTP/2 Semantics Using the QUIC Transport Protocol or Hypertext Transfer Protocol (HTTP) Over QUIC.
Roskind, in a 2013 paper, describes a unique approach to the very problem that streaming video faces on the web: latency. "We wish to reduce latency throughout the internet, providing a more responsive set of user interactions," he says. "Over time, bandwidth throughout the world will grow, but round trip times, governed by the speed of light, will not diminish. We need a protocol to move requests, responses, and interactions through the internet with less latency along with fewer time-consuming retransmits, and we believe that current approaches are holding us all back."
Fast-forward to 2018, and Google's QUIC (GQUIC) had received enough positive feedback in its quest to lower overall web latency that the Internet Engineering Task Force (IETF) took over maintaining and updating QUIC functionality. The goal within the IETF, which also helps maintain the HTTP standard, was to combine a newer version of QUIC with an emerging version of HTTP. Early this year, an IETF working group did just that, releasing a draft specification that merges QUIC into the third generation of HTTP, now collectively referred to as HTTP/3.
"I'd like to suggest that—after coordination with the HTTP [working group]—we rename our … HTTP document to 'HTTP/3,'" wrote Mark Nottingham, chair of the IETF HTTP and QUIC working groups, back in late 2018. "Doing so clearly identifies it as another binding of HTTP semantics to the wire protocol, just as HTTP/2 did."
As I mentioned earlier, TCP has long served as the transport protocol riding atop IP for web traffic. In fact, because almost all HTTP traffic has been carried over TCP, the two terms are often misused interchangeably.
One thing TCP doesn't do very well, though—at least at the sub-second latencies needed for time-sensitive data such as audio and video streaming or interactive audio- or videoconferencing—is address errors quickly enough to request that time-sensitive data be retransmitted. If you think about early webpages, in which text would render first and inline images (GIFs or JPGs) would build slowly, the added second or two for errors to be corrected in loading those inline images was acceptable to consumers. For live, low-latency video delivery, though, any delay is unacceptable and highly inefficient. In fact, for scenarios like live sports, many TCP requests to retransmit certain packets end up being a waste of time, as the video frame those packets belong to has long since been shown—sometimes with glitches where the errors occurred—before the retransmitted packets arrive and are promptly discarded.
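To make that arithmetic concrete, here's a minimal sketch (my own illustration, with made-up but plausible numbers) of why a retransmission is often wasted for live video: the repaired packet only helps if loss detection plus one round trip fits inside the frame's playout deadline.

```python
# Toy illustration: a retransmission request is wasted when the repaired
# packet would arrive after the video frame's playout deadline has passed.

def retransmit_useful(loss_detect_ms, rtt_ms, deadline_ms):
    """The repaired packet helps only if detecting the loss plus one
    round trip fits inside the frame's remaining playout deadline."""
    return loss_detect_ms + rtt_ms <= deadline_ms

# A 30 fps live stream leaves roughly 33 ms per frame; with a 50 ms RTT,
# the retransmitted data arrives long after the frame was (glitchily) shown.
print(retransmit_useful(loss_detect_ms=10, rtt_ms=50, deadline_ms=33))   # False
# A buffered on-demand player with a 250 ms cushion can still use it.
print(retransmit_useful(loss_detect_ms=10, rtt_ms=50, deadline_ms=250))  # True
```

The numbers are hypothetical, but the relationship is the point: the deeper the playout buffer, the more useful TCP-style retransmission becomes, and live sports has almost no buffer to spare.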
The idea behind QUIC is to build congestion control onto the brawniness of UDP in a way that understands the parallel nature of HTTP/2's delivery system (which itself uses multiplexing to deliver more content more quickly). Since TCP's loss-recovery mechanisms can't even "see" the multiplexing of HTTP/2, the potential problem is more than just a late packet: a single lost TCP packet holds up every HTTP/2 stream multiplexed behind it. In fact, HTTP/2 and TCP combined run the risk of stalling out in a web browser, causing a ripple effect of other packets not being delivered to the browser.
By contrast, since QUIC inherently provides native multiplexing, a lost packet has no impact on other data streams, which means stalls don't ripple across an entire session the way they do in TCP. In fact, since UDP doesn't really have the concept of sessions, the terminology now centers on connections—and the need for a connection identifier, or CID—rather than sessions.
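The difference can be sketched with a toy model (my own illustration, not real protocol code). The functions below deliver a sequence of packets belonging to two streams, "a" and "b", after packet number 1 is lost: under a single ordered TCP-style byte stream, nothing behind the hole is delivered, while QUIC-style per-stream ordering stalls only the stream that actually lost data.

```python
# Toy model of head-of-line blocking: one lost packet under a single ordered
# stream (TCP-style) vs. independently ordered streams (QUIC-style).

def deliverable_tcp(packets, lost):
    """All streams share one sequence; nothing after a hole is delivered."""
    delivered = []
    for seq, (stream, chunk) in enumerate(packets):
        if seq in lost:
            break  # the hole stalls every stream queued behind it
        delivered.append((stream, chunk))
    return delivered

def deliverable_quic(packets, lost):
    """Each stream is ordered independently; a hole stalls only its own stream."""
    delivered, stalled = [], set()
    for seq, (stream, chunk) in enumerate(packets):
        if seq in lost:
            stalled.add(stream)  # only this stream waits for retransmission
        elif stream not in stalled:
            delivered.append((stream, chunk))
    return delivered

packets = [("a", 0), ("b", 0), ("a", 1), ("b", 1), ("a", 2), ("b", 2)]
lost = {1}  # packet #1 (stream "b", chunk 0) is dropped in transit

print(deliverable_tcp(packets, lost))   # only ("a", 0) gets through
print(deliverable_quic(packets, lost))  # all of stream "a" is unaffected
```

In the TCP-style case a single loss freezes both streams; in the QUIC-style case, stream "a" plays on untouched while stream "b" waits for its retransmission.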
For those of you who are technically minded, here's a good, succinct description, courtesy of Energy Sciences Network: "Think of QUIC as being similar to TCP+TLS+HTTP/2 implemented on UDP."
Where Is QUIC Implemented?
One of the downsides of newer protocols is that they have to be implemented at scale before they become a de facto standard. For instance, hypertext markup (the "HT" in "HTTP") was around long before it became associated with web browsers (see the writings of Vannevar Bush and Robert Heinlein as examples), but it wasn't until NCSA Mosaic appeared on the scene in 1993 that hypertext became accessible to the average user.
The same is true for QUIC. Since Google's Roskind created QUIC, and Google also develops the Chrome web browser and Chromium OS, it's not surprising that GQUIC support was baked into Google web browsers several years ago and that the IETF's QUIC support has been added in recent years.
Beyond Chrome and Chromium, though, QUIC is seeing a surge in adoption across many major web browsers. At least on desktops and laptops, all major browsers can support QUIC. Some support it outright, such as Chrome and Microsoft's Edge browser. Others, like Apple's Safari and Mozilla's Firefox, require the most recent version of the browser and must have HTTP/3 actively switched on—through either a command-line argument or developer settings—for it to function (see Figure 1).
Figure 1. QUIC is seeing a surge in adoption across most major web browsers, as shown by this compatibility chart from caniuse.com/http3.
On the mobile device front, though, there's still very little support for HTTP/3 and QUIC outside of Google's Chrome for Android. In set-top boxes, OTT devices, and even specialized apps on mobile OSs, this limited availability may slow overall adoption.
My co-hosts and I on the SMAF monthly webcast discussed the topic in July 2021 with guests from two of the major OTT players. While not speaking in official capacities, both guests noted that the steps toward HTTP/3 support for OTT playback require not just technical decisions about which devices to support via HTTP/3—given that a number of smart TVs and OTT devices lack either the processing power or the firmware updates to handle it—but also a thorough look at whether adding HTTP/3 would further fragment the OTT market.
Despite concerns about fragmentation, QUIC offers a chance to add HTTP/3 and modify QUIC at the application layer in a way that TCP never could. "Because TCP is implemented at the lowest levels of machinery (operating systems, routing firmware)," Energy Sciences Network notes, "making changes to TCP is next to impossible given the amount of upgrades that would need to occur. Since QUIC is built on top of UDP, it suffers from no such limitations and can be integrated into end host applications."
This ability to put QUIC, or even HTTP/3, into an application on a mobile device holds promise as a way for streaming providers to enable the benefits of QUIC with a single toggle switch in an application. This is similar to the way consumers today can choose to toggle on or off the use of cellular data for video-based apps on many Apple iOS or Google Android devices.
QUIC's Potential Shortcomings
Two major questions, one dealing with latency and encryption and the other with basic traffic monitoring, remain around the conversion of UDP into a session- or connection-based solution.
"To minimize latency at startup, and expedite data responses to a first contact," writes Roskind in his paper, "the very first packets sent over QUIC will often include session negotiation information as well as one or more requests. QUIC is designed to speculatively assume the client has acceptable cryptographic credentials needed for at least a preliminary encryption of a request."
What happens, though, if that speculation proves incorrect? "If the server declines to accept the credentials," writes Roskind, "additional round-trip negotiations, comparable to TCP session establishment, or SSL hello negotiations, may ensue." If you're wondering if that fallback to TCP session establishment would add quite a bit more latency, the answer is absolutely yes.
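The round-trip arithmetic behind that trade-off is easy to sketch. The figures below are illustrative assumptions (an 80 ms round-trip time; 1 RTT for the TCP handshake, 2 more for a full TLS 1.2 handshake, 1 for the request itself), not measurements:

```python
# Back-of-the-envelope time to first byte, counting only handshake and
# request round trips. RTT and round-trip counts are assumed for illustration.

def ttfb_ms(rtt_ms, round_trips):
    return rtt_ms * round_trips

rtt = 80  # assumed round-trip time in milliseconds

# TCP handshake (1 RTT) + full TLS 1.2 handshake (2 RTTs) + request (1 RTT)
print("TCP + TLS fallback:", ttfb_ms(rtt, 4), "ms")
# QUIC first contact: combined crypto/transport handshake (1 RTT) + request (1 RTT)
print("QUIC 1-RTT:", ttfb_ms(rtt, 2), "ms")
# QUIC 0-RTT resumption: the request rides along with the very first packets
print("QUIC 0-RTT:", ttfb_ms(rtt, 1), "ms")
```

Under these assumptions, a declined credential that forces the TCP-plus-TLS path costs roughly four times the latency of a successful 0-RTT resumption before the first byte of a response arrives.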
On the at-scale front, the issue of traffic management emerges as a potential shortfall. In a 2020 SANS Institute white paper, Lee Decker notes that QUIC "lacks visibility via crucial information security tools such as Wireshark, Zeek, Suricata, and Snort." He goes on to explain that this lack of visibility is due in part to the use of the newer TLS version (1.3 instead of the standard 1.2) and in part to the simple fact that most security tools don't handle UDP traffic-sniffing very well.
"The defender is at a disadvantage as selective blocking of QUIC isn't always possible," writes Decker. "Moreover, some QUIC traffic may be legitimate, and so outright blocking of endpoints that use QUIC is likely to cause more issues than it solves."
So can we stream with QUIC? The short answer is yes and no. The longer answer depends on the streaming approach you're using.
Most traditional streaming protocols are HTTP-based, but that means HTTP versions 1.0, 1.1, and 2. As yet, none of the HTTP-based solutions—including Apple's HTTP Live Streaming (HLS) and MPEG's Dynamic Adaptive Streaming over HTTP (DASH)—has been updated to work with HTTP/3 and QUIC.
On the flip side, the oldest streaming protocol, Real-Time Transport Protocol (RTP)—on which some of the more modern interactive video solutions, such as WebRTC, are based—is inherently UDP-native. Add to that error-correcting solutions such as SRT, which were designed from the ground up on UDP.
In the middle of all of this is a protocol that's complementary to RTP: Real-Time Streaming Protocol (RTSP). While RTP and RTSP are directly related, RTSP was created with the ability to use both TCP and UDP for its IP video transport. Frequently, RTSP will combine TCP for session control and UDP for brute-force delivery of video packets. That combination has served RTSP-based audio and video streaming solutions well for more than 20 years.
Are We Close to QUIC Prime Time?
This brings up an interesting possibility about QUIC's ascension. Since QUIC's primary design stacks on top of UDP, but also has the option of falling back to TCP to establish the initial handshake, it appears to function in much the same way that RTSP does, at least from a streaming standpoint.
Does this mean that QUIC has the potential to replace RTSP as the underlying streaming protocol? Maybe. A more likely scenario is that QUIC and HTTP/3 will be modified to understand RTSP streams, perhaps to the point that RTSP streaming is converted to HTTP/3 streaming with no middleman transmuxing service or direct user intervention.
To do so, though, the IETF would need to solve another fundamental routing problem: packet segmentation or fragmentation. This isn't the kind of simple segmentation we think of in an HTTP-based streaming solution that packages segments of video. Rather, it's a fundamental behavior of internet routers, which have discretion to split a single packet into multiple smaller fragments so that each piece fits the maximum transmission unit (MTU) of the path it travels.
Roskind sums up the problem in his paper, reminding readers that UDP packets don't have the overhead of sizable headers that are used in TCP packets—a good thing because it increases the payload of audio or video data that can be carried in a single packet. But, he says that the lack of header information can actually lead to additional problems: "When a packet is fragmented at the IP level, the initial packet will continue to hold the UDP source and target port specifiers, as well as a CID, but all latter fragments will be devoid of such identifying information. The only means of reassembly will be the IP level 'identification' field, which is only 16 bits in length. As a result, if more than (roughly) the square root of 2^16 packets are in transit at the same time, the probability of a collision (and broken reassembly) will be large."
In other words, the potential exists—even if the time-sensitive audio or video packets all arrive in a timely manner—that the reordering of these fragmented packets might add enough latency to the reassembly effort to lose the time advantage, or, worse still, erroneous reassembly could occur.
"Packets that are mis-assembled will be detected as garbled by an authentication hash," writes Roskind. "As a result, reassembly errors cannot cause protocol errors that are any worse than discarding the packets that might be fragmented."
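Roskind's "square root of 2^16" figure is the classic birthday bound, and a quick calculation (my own sketch, not from his paper) shows how fast the collision probability climbs as the number of in-flight fragmented packets approaches 256:

```python
# Probability that at least two of n in-flight fragmented packets share the
# same 16-bit IP identification value (a birthday-problem calculation).

def collision_probability(n, id_space=2**16):
    # P(all n IDs distinct) = (id_space/id_space) * ... * ((id_space - n + 1)/id_space)
    p_unique = 1.0
    for k in range(n):
        p_unique *= (id_space - k) / id_space
    return 1.0 - p_unique

for n in (64, 256, 1024):
    print(f"{n:5d} packets in flight -> {collision_probability(n):.1%} chance of a collision")
```

At roughly 256 packets in flight—the square root of 2^16—the chance that two fragments share an identification value is already around 40%, and it approaches certainty not far beyond that, which is why Roskind flags fragmentation as such a hazard for UDP-based reassembly.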
There are some potential solutions to these problems in a just-published draft by the IETF QUIC Working Group. For instance, it says that if a data stream "cannot be completed successfully, QUIC allows the application to abruptly terminate (reset) that stream and communicate a reason."
In addition, more work is being done around caching HTTP/3 packet content and how that interacts with the PUSH_PROMISE HTTP framing that's part of the proposed HTTP/3 protocol.
I'll be keeping an eye on this topic as the year progresses. For now, though, it's worth considering what your HTTP/3 and QUIC strategies will be a year from now (mid- to late 2022), as HTTP/3 rolls out across mobile browsers over the next few months.
[Editor's note: This article first appeared in the July/August edition of Streaming Media magazine.]