October 29, 2025
By Chris McCarthy, Principal Architect and GM of Media and Entertainment, New Relic

No Second Chances: Why Streaming Providers Should Embrace a Unified Observability Approach to Avoid Costly Site Failures

During marquee live events like the Super Bowl, the Grammys, and the Olympics, audiences expect crystal-clear, real-time viewing without interruption. Millions of people watch simultaneously, and advertisers pay record-breaking sums to get their products in front of viewers. Behind the scenes, that creates major pressure for streaming providers. Even a brief outage can turn into an irreversible mistake that makes global headlines.

In the media and entertainment industry, high-impact outages cost an average of $2 million per hour, according to New Relic’s 2025 Observability Forecast for Media and Entertainment. Unlike on-demand viewers, who might simply reload a movie or television program, live viewers are likely to seek out another provider. In this environment, delivering flawless live experiences isn’t optional; it’s critical to business survival.

Last year, a Netflix livestream outage that lasted six hours during a popular boxing match made front-page news. Such failures show how quickly technical issues can fracture customer trust, especially among live audiences, who can be unforgiving. As a result, providers can’t rely on patchwork fixes or dashboard-based troubleshooting. Success or failure depends on how well the tech stack is prepared for peak moments.

Four Steps for Building a Resilient Tech Stack

Livestream providers that maintain loyal customer bases are the ones that invest in resilience, observability, and redundancy long before the coin toss or opening act. Here are four steps media and entertainment businesses should take to deliver the best possible viewing experience:

Conduct Complete Load Testing: To minimize the risk of outages, livestream providers should conduct rigorous load testing well in advance of Super Bowl-sized events. These tests should go beyond video streaming to cover the full user experience, from signup and payment to account modification flows. Load tests should also run while an observability platform monitors end-to-end system performance, using the same alert profiles and configurations that will run "for real" during the event. This approach gives teams the detailed insights they need to evaluate performance and strengthen resilience before game day.
Take a Unified Observability Approach: Observability gives IT teams the ability to understand the internal state of a complex software system by examining the data it produces from the outside. It allows engineering teams to ask any question they can think of about their system's behavior and get the answers they need to resolve issues fast. The most impactful approach for media and entertainment companies is a unified observability approach that breaks down silos between video delivery, ad insertion, and OTT applications. This gives stakeholders visibility into network performance across smart TVs, mobile apps, and browsers, where failures are typically felt first. The real advantage, however, is that unified observability helps teams move beyond knowing that an issue exists to understanding why it’s happening. In a non-unified setup, fragmented tools may flag that videos take 10 seconds to start but not reveal that a configuration change or upstream service dependency caused the delay. Unified visibility connects those dots, enabling faster, more confident resolutions.
Enable Real-Time Telemetry: Continuous data collection through real-time telemetry is also essential for detecting issues at their root rather than merely responding to surface-level alerts. While nearly every tool claims to offer “real-time” insights, the real impact comes when that data is unified. Once telemetry from across systems is brought together under a single observability platform, its value increases substantially, because machine learning can then perform anomaly detection and correlation across all data sources. This unified, real-time visibility helps teams identify emerging issues sooner, surface recommended fixes, and shorten mean time to resolution.
Consider a Multi-CDN Strategy: Providers should rethink their content delivery network (CDN) strategy. A CDN is a distributed system of servers positioned to accelerate and stabilize the delivery of video and other online content. For live streaming, CDNs help minimize buffering by serving content from the server closest to each viewer. However, relying on a single CDN provider has limitations, especially in the face of the traffic surges that major live events inevitably bring. Organizations should assume their primary and even secondary CDNs will fail at some point, and they should proactively and continuously test failover between them. This approach safeguards both performance and viewer experience when it matters most.
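
The failover logic described in the multi-CDN step can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's API: it assumes a hypothetical CdnHealth record for each CDN, ranked by preference, with an error rate and a reachability flag fed by recent playback sessions or synthetic probes. The function name choose_cdn and the 2% error threshold are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class CdnHealth:
    """Recent health snapshot for one CDN (hypothetical fields)."""
    name: str
    error_rate: float  # fraction of failed segment requests, 0.0-1.0
    reachable: bool    # did the latest synthetic probe succeed?


def choose_cdn(ranked: list[CdnHealth], max_error_rate: float = 0.02) -> str:
    """Return the highest-priority CDN that is reachable and under the error threshold.

    `ranked` is ordered by preference (primary first). If every reachable CDN
    is degraded, fall back to the one with the lowest error rate, on the
    assumption that a degraded stream beats no stream at all.
    """
    for cdn in ranked:
        if cdn.reachable and cdn.error_rate <= max_error_rate:
            return cdn.name
    reachable = [c for c in ranked if c.reachable]
    if not reachable:
        raise RuntimeError("no reachable CDN")
    return min(reachable, key=lambda c: c.error_rate).name


if __name__ == "__main__":
    # Hypothetical snapshot during a traffic surge.
    snapshot = [
        CdnHealth("primary", error_rate=0.09, reachable=True),    # overloaded
        CdnHealth("secondary", error_rate=0.0, reachable=False),  # probe failed
        CdnHealth("tertiary", error_rate=0.01, reachable=True),
    ]
    print(choose_cdn(snapshot))  # prints: tertiary
```

Feeding the same function deliberately degraded snapshots during load tests is one way to exercise failover paths continuously, rather than assuming on game day that a never-tested secondary CDN will hold up.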

When the World Is Watching, Preparation Is Everything

The future of resilient streaming lies in providers’ ability to correlate issues across the delivery chain automatically. For example, a backend Amazon Web Services configuration change that suddenly disrupts live playback should be flagged and linked to the playback failure instantly, not discovered hours later. Observability is key to building reliable architectures, and it is the foundation for assisted remediation with human-in-the-loop approval: a process that combines the speed of automated systems with the judgment of a human expert.
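
As a rough illustration of that kind of correlation, the Python sketch below flags a spike in a per-minute playback metric (for example, rebuffering ratio) using a simple rolling z-score, then lists configuration changes recorded shortly before the spike. The function names find_anomaly and suspect_changes, the (minute, description) event format, the 10-minute lookback, and the z-score threshold of 3 are all hypothetical choices for illustration; observability platforms generally use more robust anomaly detection than a plain z-score.

```python
from statistics import mean, stdev
from typing import Optional


def find_anomaly(series: list[float], window: int = 10, z_threshold: float = 3.0) -> Optional[int]:
    """Return the index of the first point more than `z_threshold` standard
    deviations above the mean of the preceding `window` points (window >= 2),
    or None if no point qualifies."""
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # A perfectly flat baseline has zero spread; use a tiny value so any rise is flagged.
        sigma = sigma or 1e-9
        if (series[i] - mu) / sigma > z_threshold:
            return i
    return None


def suspect_changes(anomaly_minute: int, changes: list[tuple[int, str]], lookback: int = 10) -> list[str]:
    """Return descriptions of config changes made within `lookback` minutes
    before (or at) the anomaly, most recent first.
    Each change is a hypothetical (minute, description) pair."""
    hits = [(t, desc) for t, desc in changes if anomaly_minute - lookback <= t <= anomaly_minute]
    return [desc for t, desc in sorted(hits, reverse=True)]


if __name__ == "__main__":
    # Hypothetical per-minute rebuffering ratio (%); list index = minute.
    rebuffering = [0.5, 0.6, 0.5, 0.4, 0.5, 0.6, 0.5, 0.5, 0.4, 0.6, 0.5, 4.8, 5.1]
    changes = [(0, "player feature flag toggled"), (9, "backend cache TTL lowered")]

    minute = find_anomaly(rebuffering)
    if minute is not None:
        # prints: anomaly at minute 11: ['backend cache TTL lowered']
        print(f"anomaly at minute {minute}: {suspect_changes(minute, changes)}")
```

In practice the change events would come from deployment and configuration audit logs, and a human would review the suspect list before any rollback, in line with the human-in-the-loop approach described above.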

As automation accelerates across the industry, nearly a third of media and entertainment organizations say AI adoption is already shaping their observability strategy, according to the New Relic report. Providers that embrace this shift will resolve incidents faster, create more time for innovation, and deliver smoother experiences when the world is watching.

Live events can be unforgiving; there’s no replay button for lost trust. When it comes to live streaming, the difference between success and failure depends largely on technical readiness. Providers who invest year-round in observability, redundancy, and proactive resilience are the ones viewers will remember for the right reasons.

[Editor's note: This is a contributed article from New Relic. Streaming Media accepts vendor bylines based solely on their value to our readers.]
