The Algorithm Series: Live Event Scaling
After the pause in sports, concerts, festivals, and other in-person gatherings for most of 2020, there is plenty of pent-up demand for large-scale events as we head into 2021. How will the companies that deliver live-streamed events meet potentially unprecedented demand? This latest installment in our Algorithm Series delves into the math and workflow decisioning used to fine-tune live video event delivery at scale.
Many live-streaming solutions focus on unicast delivery, in which there's a single connection between the video streaming server and the end user's streaming player (a "client" in networking terminology).
Before the advent of HTTP streaming delivery, in which a stream is broken down—at the cost of an additional 15–30 seconds of latency (or delay) for a live stream—into a series of segments or chunks, most unicast delivery was based around real-time transport protocol (RTP) and derivatives like Adobe's Real-Time Messaging Protocol (RTMP). The upside of RTP-based live streams was that they could be delivered with very low latency (low enough, in fact, that they're still in use for some forms of web-based video delivery, such as WebRTC), but the downside was scalability. For every end user who needed to receive a live stream, a server had to establish a direct delivery connection (known as a "state") to be able to deliver it. That meant having very expensive servers and large-capacity bandwidth that were often only available at major internet peering-point data centers.
Peer-to-Peer (P2P) Delivery
Research in the late 1990s and early 2000s focused on scaling beyond unicast live streams, through the use of either HTTP delivery or peer-assisted delivery.
Peer to peer (P2P) uses connections between clients that were either coordinated with the server or completely independent of the server, but which needed far fewer servers once the initial content was seeded out to a set of peers. Interestingly, the server-independent P2P approach didn't spring from streaming, per se, but was tangentially attached to it: Decentralized P2P BitTorrent servers were used to coordinate unauthorized file-sharing of premium downloadable media content.
While P2P fell away in popularity with the subsequent crackdown on file-sharing sites, the efficiency and cost-effective nature of this peer-assisted delivery—with fewer servers and a limited requirement for massive bandwidth throughput from a central data center—caught the attention of videocentric network architects.
Yet P2P didn't solve all of the problems of scalability. One reason was that not all peers were created equal (or even behaved as if they were equal), due, in part, to differing network topologies. In "Using Topology Information for Quality-Aware Peer-to-Peer Multilayer Video Streaming," a 2014 paper published in Wiley's International Journal of Communication Systems, a team of authors at Ghent University's Department of Information Technology takes on the problem of leveraging network topologies to deal with the quality of the streaming experience. Here's an excerpt from the introduction:
In this paper, we introduce a model to include network information when streaming a (multilayered) video in P2P frameworks. An important metric for video stream providers is the content quality perceived by end users. The optimization studied here aims at maximizing the number of users receiving a high-quality video.
The paper addresses the optimization problem seen from the stream provider's viewpoint, having access to network topology information. An exact optimization approach is presented for benchmarking purposes and a heuristic approach to cope with realistic network sizes.
Given the fact that the paper's focus is on service providers such as CDNs and ISPs, the authors make the assumption that the service provider would install peering nodes "on well-chosen locations in the network and [lease] sufficient network capacity to interconnect these peering nodes" for optimal live-streaming delivery.
Peer selection in this approach is critical to maintaining quality throughput of live-streaming content, but the authors also suggest that four node types are necessary: injectors, which offer the initial video stream to the network; tracking nodes, which coordinate the peers as they begin the download process; peering nodes, which both receive content and seed it to other peers as well as the destination (end-user clients); and forwarding nodes, which are part of the key topology for the service provider's network.
One problem in early peer-assisted delivery was the greediness of nodes when a high-bandwidth peer became available. This approach allowed a few lucky peers to download content faster, but it meant that other peers had to hunt for lower-bandwidth peers. While it may not have seemed to be such a big deal for illegal file-sharing of premium content, including theatrical movies and high-end desktop applications, applying this greediness issue to live streaming meant that, potentially, not all peers would receive equally high-quality video streams.
The greediness problem, as stated succinctly in "Server Guaranteed Cap: An Incentive Mechanism for Maximizing Streaming Quality in Heterogeneous Overlays, a paper written by researchers from Sweden's KTH Royal Institute of Technology, was that P2P networks didn't take into account network topologies and, therefore, weren't nearly as efficient as they could have been. The paper was written a few years before the Ghent University paper and uses the terminology "selfish" instead of "greedy" when referring to peer data-sharing. Yet it presents a much more daunting conclusion: "We show that peers' selfish behavior leads to an equilibrium that is suboptimal in terms of social welfare, because selfish peers are interested in forming clusters and exchanging data among themselves."
The "social welfare" concept expects that all peer nodes will act to the benefit of all other peer nodes—which can be, at least, difficult when peers span across several networks, including wireless, Wi-Fi, or even mobile cellular networks—through a form of incentives such as server capacity to assure that peers share data equally among high- and low-contributing peers.
The algorithm is fairly simple for the measurement of aggregate utility across peers:
where, as the KTH Royal Institute of Technology paper notes, pc denotes the playout probability of contributors, and pnc denotes the playout probability of non-contributors.
As for the social welfare impact, selfish nodes across heterogeneous networks affect all kinds of P2P systems. "We argue that performance degradation due to peer clustering is not restricted to push systems only," the KTH Royal Institute of Technology paper says, "but is intrinsic to uncoordinated p2p dissemination algorithms."
To optimize social welfare, the paper recommends a server guaranteed cap (SGC) that consists of two types of data delivery: P2P dissemination (where fresh content is forwarded across the peers "according to some forwarding algorithm," which we will explore later in this article) and direct delivery.
Those types request directly to the server, from nodes that might have missed the content in a traditional P2P peer-assisted approach, requiring the server to calculate some level of threshold probability (Tp) and to retain a portion of overall delivery capacity (mt) to serve as a reserve (mr) for direct dissemination. When the numbers of contributors in the overlay and the mr are known, the server calculates the SGC and threshold value of playout probability as 1 minus mr over (1−α) · N. The complete formula is as follows:
The end result of the exercise, though, is to determine which peers qualify for direct delivery. This is determined by those that have layout probabilities that are equal to or less than the threshold. In mathematical terms, it would be those that have a pi ≤ Tp.
The Ghent University paper takes into account a concept that the KTH Royal Institute of Technology one does not: network topology. In fact, the Ghent University team notes, "To the best of our knowledge, none of the P2P (multilayer) video streaming networks use topology information to optimize the load in the network." In addition, "The strategy we propose in this paper maximizes the minimum received video quality at each destination by using topology information of the actual network to orchestrate peer selection and thereby forming a future proof and robust framework for distributing (live) video streams over the Internet." The authors propose an orchestration engine to account for changes in network topology between peers, but also acknowledge the potential for a single point of failure, something that all peer-assisted solutions try to avoid.
The Ghent University paper points out another issue with P2P: The potential lack of peers (e.g., those watching the same stream at the same time) means that P2P optimization can't occur without "large numbers of end users … who watch the same video feed." And while IP multicasting is certainty a theoretical option, the authors note that "IP multicast is not a feasible solution for a scalable video streaming service because of the sparse deployment on the Internet."
One modification of the traditional network topology—the forwarding nodes—is critical for the injector and tracking nodes to work properly. This is in keeping with the findings of the KTH Royal Institute of Technology paper, since forwarding forms the basis of peer-assisted delivery.
For greater efficiency and robustness, though, the Ghent University paper proposes the mathematical formula for a ring topology, which uses a specific R forwarding node: "By optimizing objective function R, peer nodes have the incentive to transport video layers to destinations that seem to have the highest bandwidth connection, which, in principle, is exactly the same as having the destinations selecting (the injector or peer) nodes to download from [those] that have the highest bandwidth connection." Yet rather than allowing a fully unthrottled delivery, especially between high-bandwidth nodes, the paper introduces a constraint for injector peer nodes "to prevent unnecessary transport of data between nodes" that might occur without the constraint.
Readers may remember that in the Algorithm Series article on content delivery, I cover a ring approach in which content storage and retrieval are balanced on a clockwise ring. Yet the type of ring topology for P2P is less akin to CDN decisioning and more akin to the token ring topology used for early local area networks (LANs) before the advent of Ethernet. The benefit of a P2P ring topology is that each available node on the ring can still be used as a root for the traditional P2P tree topology (see Figure 1).
Figure 1. A ring network connecting R forwarding nodes. Each ring node acts as a root for a tree topology. (Used with permission from the paper "Using Topology Information for Quality-Aware Peer-to-Peer Multilayer Video Streaming")
HTTP Live Streaming (HLS)
It would stand to reason that Apple's HTTP Live Streaming (HLS) wouldn't need HLS-specific algorithms, since it uses small files—known as chunks or segments—that are delivered from HTTP servers. But the more recent arrival of low-latency HLS has led to a specific algorithm that Roger Pantos notes in his HTTP Live Streaming 2nd Edition specification, which was released on April 30, 2020. Pantos has been integral to HLS design and adoption, having authored all of the draft specifications since the advent of HLS in 2009, which are often referred to in the industry as the "Pantos spec."
This newest edition is set to replace the original RFC 8216, with the intended standard now including low-latency HLS. As part of this rfc8216bis-07 version, Pantos includes an algorithm in Appendix C to cover low-latency delivery via a CDN. "Clients SHOULD support delivery of low-latency streams through CDNs and other HTTP caches," writes Pantos. "Correctly implementing PART-HOLD-BACK, the server-recommended playback delay from live, requires that the client first obtain a reasonably up-to-date version of the Media Playlist. There are various approaches that a client may take to obtain a recent version of a Media Playlist. [One] algorithm [for example] typically requires two Playlist requests to obtain a Playlist that is within one Part Target Duration of the current Playlist. ..."
Back to RTP
As I mentioned at the beginning of the article, RTP-based streaming was the norm for most of the first 2 decades of streaming, but the prohibitive delivery cost at scale—both in terms of bandwidth and specialized media server software—was a barrier to the industry growing to compete with traditional over-the-air broadcasters. As such, while CDNs solved the issue (for a significant cost), the use of RTP fell away with the advent of P2P and HTTP-based segmentation delivery.
Yet RTP is still alive and well in the form of WebRTC, which uses standard web browsers to deliver real-time audio and video communications—some in a one-to-one unicasting model, but others in a very low-latency bidirectional approach that allows for videoconferencing or video chat. Like with classic RTP, scaling up WebRTC in a one-to-many or many-to-many approach also requires a significant number of servers (called Selective Forwarding Units, or SFUs in WebRTC speak) to address scaling.
The use of cloud-based server platforms would seem an ideal way to dynamically scale SFUs up or down to route video between endpoints. But the authors of "Media Streams Allocation and Load Patterns for a WebRTC Cloud Architecture," a paper on SFU optimization—which accounts for differing optimization approaches according to topologies—write, "In order to satisfy resource needs, enough SFU units need to be provisioned for all sessions in a communications cloud platform" regardless of whether it's a cloud-based or local SFU pool.
Using real-world data from hundreds of streams concurrently being handled by the same SFU in a cloud-based infrastructure, the authors started with the premise that server load can easily equate to the number of streams managed by each SFU (see Figure 2). From this point, they used server load to benchmark maximum allowed streams per server for each previously defined cloud server category (e.g., an EC2 instance of a particular CPU, RAM, and SSD capacity) and then sampled from each server to verify benchmarks. They write, "We tackle the problem of how hundreds of thousands [of] streams impact the overall quality of a generic media stream processing application over time, with hundreds of streams concurrently being handled by the same SFU."
Figure 2. System overview of a cloud-based WebRTC infrastructure. (Used with permission from the paper "Media Streams Allocation and Load Patterns for a WebRTC Cloud Architecture")
Using an auto-regressive (or running-averages) model against real-world WebRTC traffic, the authors calculated the server load within a data center using a linear regression equation covering the lag observations (see Figure 3).
Figure 3. Data center total load lag plot, 1-month period (Used with permission from the paper "Media Streams Allocation and Load Patterns for a WebRTC Cloud Architecture")
Yi+1 = 0.9974 ∗ Yi−1 + 2.8282
In the equation, 0.9974 is the parameters slope, while 2.8282 is the intercept.
Based on the regression slope, the authors propose an algorithm to guarantee that each SFU is serving a minimum number of streams, which offers better load balancing (see Figure 4). They also caution that the tendency of algorithms to find a sweet spot and stick with it throughout subsequent processes can actually be counterproductive. "Even though we are able to predict the total load going to a defined data center," they write, "a resource allocation algorithm can still get into problems under the restrictions that once the first stream of a session is assigned to one server all other streams of that session will be assigned to the same server."
Figure 4. Proposed algorithm to guarantee that each SFU is serving a minimum number of streams, which offers better load balancing. (Used with permission from the paper "Media Streams Allocation and Load Patterns for a WebRTC Cloud Architecture")
There's still quite a bit of research going on in live-event scaling in each of the categories we've covered: traditional RTP, P2P delivery, HTTP low-latency delivery, and WebRTC. Each shares the common goal of reducing network traffic, whether by optimizing the server (RTP and WebRTC), by offloading the streams to generic servers (HTTP), or by shrinking the overall server footprint (P2P). But what works for one approach in one geographic region may not necessarily work in another. That could be due to a variety of reasons, a few of which I explore in this issue's Streams of Thought column (see page 12). Yet the key ingredient in all of this research is how to lower overall latency, and there's been significant progress on that front this past year for each delivery approach.
In the next issue, I'll delve into the final topic of the 2020 Algorithm Series: digital rights management. After all, if we can get premium live-event content delivered to a paying audience at latencies better than traditional over-the-air broadcasts, there's going to be more of a need to protect this content during its initial "airing" via OTT to a global audience than there's ever been before.
Will 2022 be the year UDP finally shows its streaming mettle? The Quick UDP Internet Connections (QUIC) protocol might make the difference. First, though, OTT platforms need to make technical decisions about HTTP/3 that could further fragment the market.
Multiple algorithms for encoding, transfer, and playback intersect in the end user's player. So how can you make the numbers work in your favor?
Delivering content at scale requires a number of precise, discrete, yet interconnected steps, starting with an acute awareness of server loads.
The Bottleneck Bandwidth and Roundtrip Time (BBR) algorithm is now being deployed to 80% of the CDN's customers, resulting in performance improvements of up to almost 19%.