Provisioning for Live Streaming Scale in Real Time at Al Jazeera and YouTube
How do large-volume streamers anticipate, provision for, and capacity-plan for real-time traffic bursts during tentpole and breaking news events? Al Jazeera Senior Streaming Media Architect Dilip Bharadwaj and YouTube Head of Live OTT Engineering Sean McCarthy describe real-world challenges, along with preparation and remediation strategies, in this conversation with SVTA Subject Matter Expert Bhavesh Upadhyaya at Streaming Media Connect 2026.
Monitoring Audience Growth as News Breaks
Upadhyaya kicks off the conversation by referencing the virtual impossibility of predicting scale and demand when streaming news live as it breaks. “How do you make a determination in real time of how much you need to scale and how much your potential audience may grow?” he asks. “Do you have any tools in place?”
"For this, we use Dataminr," Bharadwaj replies. "Let's suppose, as of now, there is no critical event. All of a sudden, something happens somewhere on the globe, and we can see the number of the subscribers has increased five or ten times. On our end, we have on-prem encoders that are capable of handling this traffic."
But the challenges are different on the CDN side. As a news broadcaster, he continues, "I have encountered a couple of scenarios, like for the Ukraine war, where the traffic has increased all of a sudden. In that case, we use Conviva and Grafana as analytics tools to see how the subscriber [number] grows and how it can be multiplied by four or five or 10. And in that case, we secure the advance bandwidth to make sure that our stream and our channel is capable of scaling up without affecting the end user delivery. So we take all these precautions in advance to ensure that the stream will be seamless in case of any critical event or any important vital news that we are going to receive from the ground."
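The headroom arithmetic behind securing bandwidth in advance can be sketched roughly as follows. This is a minimal illustration, not Al Jazeera's method: the function name, the 5 Mbps average bitrate, and the 20% safety margin are all assumptions chosen for the example, and only the 4x, 5x, and 10x multipliers come from the discussion above.

```python
def required_gbps(
    concurrent_viewers: int,
    avg_bitrate_mbps: float,
    multiplier: float,
    safety_margin: float = 0.2,  # assumed extra headroom, not a figure from the panel
) -> float:
    """Estimate the egress bandwidth (Gbps) to reserve for a traffic surge."""
    peak_viewers = concurrent_viewers * multiplier
    peak_mbps = peak_viewers * avg_bitrate_mbps
    return peak_mbps * (1 + safety_margin) / 1000  # Mbps -> Gbps


if __name__ == "__main__":
    # Hypothetical baseline: 40,000 concurrent viewers at an assumed 5 Mbps each.
    for multiplier in (4, 5, 10):
        gbps = required_gbps(40_000, 5.0, multiplier)
        print(f"{multiplier}x surge: reserve about {gbps:,.0f} Gbps")
```

Under these assumed inputs, a 4x surge works out to roughly 960 Gbps, which shows how quickly a modest baseline turns into a large reservation.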
Taking a Multi-CDN Approach to Capacity Planning
Bharadwaj then directs a question to McCarthy concerning times when those same news-related spikes cause Al Jazeera to send larger-than-expected streams to their YouTube channel.
“Let's suppose we are streaming to YouTube, and all of a sudden we have noticed that the number of viewers watching has increased from 40,000 to 100,000,” he begins. “In that case, we don't have visibility on the YouTube side [as for] how they are scaling up our services for the seamless delivery for the YouTube users. How do you manage this scaling up on the YouTube side? We published the streams, the main and backup RTMP to YouTube ingest. But after that, we don't have any visibility on how the scaling is happening. And our monitoring parameters only see the channel on YouTube, and we can calculate the number of subscribers who are watching our channel concurrently, but how do you manage the scaling?”
McCarthy replies that Bharadwaj's scenario and question mostly concern "CDN capacity-scaling. We are in a similar situation as most CDN operators and infrastructure providers, which I would suggest is not the fun position to be in, but that's capacity planning, always having headroom, and having to manage that headroom and prepare for these bursty traffic patterns. There's an art and a science to it. To be frank, there's no real secret sauce to scaling the platform other than having more capacity. Now, there is a secret sauce to getting the most density out of a particular box: serve the highest number of bits as fast as possible from a particular server. We have optimizations down to the kernel level to facilitate that, but that's more of a CDN problem. We operate around CDNs, so it becomes our problem."
Drilling down further on the delivery side, McCarthy continues, "When it comes to managing multiple CDNs--which is not something that we typically do at YouTube, [given that] it's all kind of our own monolithic platform--but putting on my Paramount hat from years in the past and just seeing how others in the industry do this, I think that the power of API-driven CDN infrastructure management is not leveraged intelligently enough. So the way that we designed a multi-CDN system in the past to account for exactly this was to have one orchestration layer, and one API that can orchestrate your Fastly, your Akamai, your CloudFront, your CacheFly, whatever it might be, or however many CDNs you have in your arsenal and have feedback loops. By default, we always had two CDNs. Not everyone agrees with that. It doesn't work for everyone's business model, but assuming it does, you always have two CDNs just for reliability and you only introduce a third when it makes sense when you're either trying to mitigate risk or you just need to. And you don't always know that."
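One way to picture the single orchestration layer McCarthy describes is a thin abstraction that treats every CDN the same way. The sketch below is hypothetical throughout: `CdnAdapter`, `FakeCdn`, and `Orchestrator` are illustrative names, no real vendor API is called, and the even traffic split is an assumption rather than anything stated in the discussion.

```python
from dataclasses import dataclass, field
from typing import Protocol


class CdnAdapter(Protocol):
    """Hypothetical per-vendor adapter; a real one would wrap that vendor's API."""

    name: str

    def set_weight(self, weight: float) -> None: ...


@dataclass
class FakeCdn:
    """In-memory stand-in used here instead of a real vendor client."""

    name: str
    weight: float = 0.0

    def set_weight(self, weight: float) -> None:
        self.weight = weight


@dataclass
class Orchestrator:
    """Single control point that spreads traffic across the active CDNs."""

    active: list[CdnAdapter] = field(default_factory=list)
    standby: list[CdnAdapter] = field(default_factory=list)

    def rebalance(self) -> None:
        # Assumed policy: split traffic evenly across active CDNs.
        if not self.active:
            raise RuntimeError("no active CDN to carry traffic")
        share = 1.0 / len(self.active)
        for cdn in self.active:
            cdn.set_weight(share)
        for cdn in self.standby:
            cdn.set_weight(0.0)

    def activate(self, name: str) -> None:
        """Move a standby CDN into rotation and rebalance."""
        for cdn in self.standby:
            if cdn.name == name:
                self.standby.remove(cdn)
                self.active.append(cdn)
                self.rebalance()
                return
        raise KeyError(f"no standby CDN named {name!r}")


if __name__ == "__main__":
    orch = Orchestrator(
        active=[FakeCdn("cdn-a"), FakeCdn("cdn-b")],  # the default pair
        standby=[FakeCdn("cdn-c")],                   # introduced only when needed
    )
    orch.rebalance()
    orch.activate("cdn-c")
    print([(c.name, round(c.weight, 3)) for c in orch.active])
```

Keeping the vendor-specific calls behind one adapter interface is what would let a feedback loop add or remove a CDN without knowing which provider it is talking to.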
Going on to address the role observability plays in this strategy, McCarthy says, "Having your Conviva data or your client-side metrics or some indicator--even total bandwidth across all CDNs--and measuring it against what your reserved capacity is (if you have it), should trigger an API call after a certain threshold is met to your orchestration layer to introduce a new CDN to the mix. Now, there is risk in doing that, but we would mitigate that risk by doing pre-warming of the cache, making sure all of the POPs were online and everything was propagated before introducing that, or slowly staggering it out to end users through either DNS load balancing, manifest manipulation, or content steering."
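The trigger and the staggered rollout McCarthy outlines might look something like the sketch below. It is an illustration under stated assumptions, not a description of any production system: the function names, the 80% threshold, and the equal-step ramp are hypothetical, and the actual API call to the orchestration layer is left out.

```python
def should_add_cdn(total_gbps: float, reserved_gbps: float, threshold: float = 0.8) -> bool:
    """Return True once measured traffic reaches the chosen fraction of reserved capacity.

    The 0.8 default is an assumed threshold for illustration.
    """
    if reserved_gbps <= 0:
        raise ValueError("reserved_gbps must be positive")
    return total_gbps / reserved_gbps >= threshold


def ramp_schedule(target_share: float, steps: int) -> list[float]:
    """Traffic shares for the new CDN, rising in equal steps up to target_share.

    Each step would be applied through DNS load balancing, manifest
    manipulation, or content steering after the cache is pre-warmed.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return [round(target_share * i / steps, 4) for i in range(1, steps + 1)]


if __name__ == "__main__":
    # Hypothetical reading: 850 Gbps across all CDNs against 1,000 Gbps reserved.
    if should_add_cdn(total_gbps=850, reserved_gbps=1000):
        for share in ramp_schedule(target_share=0.3, steps=3):
            print(f"shift {share:.0%} of sessions to the new CDN")
```

Ramping in steps rather than switching all at once gives the new CDN's caches time to fill and leaves room to back out if client-side metrics degrade.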
He concludes, "There are a lot of tools in the kit I think that go under-leveraged for a multi-CDN strategy to solve this bursty traffic pattern problem, but at the end of the day it's infrastructure providers" that are best equipped to address it.