How Hulu Delivers Reliable Streams at Scale
Learn more about streaming at scale at Streaming Media East 2022.
Read the complete transcript of this clip:
Nick Brookings: What does this really mean, "delivering reliable live streams at scale"? That delivery word immediately makes you think of the CDN. And of course, that'll be a focus here, but there's a lot more, especially leading up to that, to make this work. And first, to level-set, you need to be able to measure what we're even talking about here. So there are the metrics for availability or reliability, like VPF, playback failures, startup failures, too long of a startup time, rebuffering, latency. But also things like average bit rate, because if you're delivering reliable streams, but at a level that's under the level of quality or functionality that your users expect, they're still not going to be happy. So all of this needs to come together, and scale is another really tricky part here because it's always way easier to do something for one user or a hundred users than it is for a hundred thousand or a million or hundreds of millions of users. And as that scale increases, reliability and performance drop, and you've got to constantly be pivoting in your system and ensuring you're looking at the right things.
It's also easy to get lost in those numbers. Expectations are always rising, and a level of buffering that might have been acceptable, say, five years ago is often not acceptable now. So you have to be really careful about managing too much to the averages of your metrics.
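To make the caution about averages concrete, here is a minimal sketch in Python. The session data is invented for illustration, and the nearest-rank 95th percentile is just one common convention, not something described in the talk. It shows a fleet whose mean rebuffering ratio looks healthy while one viewer in ten has a poor experience.

```python
import statistics


def rebuffer_summary(ratios):
    """Return (mean, 95th percentile) of per-session rebuffering ratios.

    The percentile uses the nearest-rank method: the smallest value such
    that at least 95% of sessions are at or below it.
    """
    ordered = sorted(ratios)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return statistics.mean(ordered), ordered[rank - 1]


# Hypothetical data: 90 sessions never rebuffer, 10 spend 20% of the time stalled.
sessions = [0.0] * 90 + [0.2] * 10
mean_ratio, p95_ratio = rebuffer_summary(sessions)
print(f"mean={mean_ratio:.3f} p95={p95_ratio:.3f}")
```

The mean comes out at about 2%, which could pass for acceptable, while the 95th percentile sits at 20%.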
But getting upstream a little bit, the video pipelines are very interconnected, and especially with live video, any link in the chain can cause just about any conceivable issue downstream. So you really need to be thinking very broadly about your media systems from end to end in order to have that reliable delivery at the last stage.
So this goes--at least in my world--all the way back to ingest: Where are you getting your live signals? Do you have the kind of resiliency that you need? Do you have hot switchover paths, ideally? Are you using protocol acceleration? Are you doing everything you can? Because if you don't get one of those segments coming in, there's nothing you can do to synthesize it, or keep the stream going on the other side, other than just increase your latency and have a larger buffer, which again is going to be an impact of some sort to your users. Because if they're watching live streaming, they're probably watching sports or they're watching news and it's something timely for them.
And then there's encoding, and that can either happen ahead of ingest or after ingest or both depending on the setup. But what I'll focus on is the last stage of encoding that typically happens where you create your adaptive bitrate set.
And that's something to think carefully about: What kind of ownership do you want over that infrastructure? Bringing encoding and management of ABR in-house adds a lot of complexity, but gives you much tighter control over your ABR stack and video features. And then it also allows you to control more tightly what codecs, packaging, and encryption you're using, which are really, really important to that delivery stage. You want to have a balance of enough bit rates. Do you have an ideal bit rate for every user? If they've got one megabit of throughput, do you have, say, an 800K that they can reliably stream? But then having a balance of not too many bit rates or variations and avoiding that fragmentation.
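The one-megabit example can be sketched as a simple ladder lookup in Python. This is an illustration only: the ladder values, the `pick_rendition` name, and the 0.85 safety margin are assumptions, not Hulu's actual configuration or player logic.

```python
def pick_rendition(ladder_kbps, throughput_kbps, safety=0.85):
    """Pick the highest rendition that fits within a fraction of throughput.

    Falls back to the lowest rendition when nothing fits, so playback can
    at least be attempted.
    """
    budget = throughput_kbps * safety
    candidates = [kbps for kbps in ladder_kbps if kbps <= budget]
    return max(candidates) if candidates else min(ladder_kbps)


# Hypothetical ladder; a viewer with roughly 1 Mbps lands on the 800 kbps rung.
ladder = [400, 800, 1600, 3200, 6000]
print(pick_rendition(ladder, 1000))  # 800
```

If the 800 kbps rung were missing, the same viewer would drop all the way to 400 kbps, which is the gap the question about an ideal bit rate for every user is probing.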
So, that kind of gets us to that delivery stage where the more fragmentation you have, the more difficult it is to then do that final delivery stage reliably. Let's say you add a new codec to the mix, and that makes a lot of sense in isolation because everyone consuming this new fancy codec--maybe AV1--is perhaps using 30% less bandwidth or throughput, and they're getting a better experience. The problem is you've now fragmented and you're processing everything an extra time. You've doubled or tripled or quadrupled your storage requirements on your origin, but you've also fragmented the cache forward into the CDN so that they're less likely to find a cache hit. There's going to be more hits back to your origin. And so it's a very interconnected ecosystem. And again, I encourage people to think very far upstream, even if what you're really trying to do is optimize that delivery bit.
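The fragmentation effect is multiplicative, which a short Python sketch can show. The key layout, codec names, and variant counts below are hypothetical; the point is only that each new dimension multiplies the number of distinct objects the origin stores and the CDN has to cache for the same moment of video.

```python
from itertools import product


def cache_keys(segment, bitrates_kbps, codecs, packagings, encryptions):
    """Enumerate every distinct object that exists for one segment of video."""
    return [
        f"{segment}/{codec}/{pkg}/{enc}/{kbps}k"
        for codec, pkg, enc, kbps in product(
            codecs, packagings, encryptions, bitrates_kbps
        )
    ]


# Hypothetical setup: 6 bitrates, 2 packaging formats, 2 encryption schemes.
bitrates = [400, 800, 1600, 3200, 6000, 9000]
before = cache_keys("seg1001", bitrates, ["h264"], ["hls", "dash"], ["cenc", "cbcs"])
after = cache_keys("seg1001", bitrates, ["h264", "av1"], ["hls", "dash"], ["cenc", "cbcs"])
print(len(before), len(after))  # 24 48
```

With the same audience spread over twice as many objects, each object is requested less often, so it is less likely to already be in a given edge cache.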
Live is great for cacheability because you've got a more concentrated number of users at the live edge and they're more likely to get a cache hit. They're all watching at the same point in time, more or less. But it's also a lot less forgiving because if you miss a segment, there's no making up for it.
Of course, work with your CDNs to tune your cache. Charlie will probably be the right one to talk more about that. But delivery is also not the end of it because you'll have different kinds of signaling from your player or a client, whether that's playlists or manifests or DRM keys and metadata, and you have to think very carefully about that. In particular, how dynamic do you really need that to be? Dynamism is really great for adding features and functionality, but it's very difficult to scale. How often is that data changing? Do you have the opportunity to pre-compute and serve something static rather than always doing database lookups or having something hit a complex service?
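One way to picture the pre-compute option is a live playlist that is rendered once when a segment is published and then served as static text. The Python sketch below is an assumption-laden illustration: the `PrecomputedPlaylist` class, the three-segment window, and the segment names are invented, and only the playlist tags are standard HLS.

```python
class PrecomputedPlaylist:
    """Render a live HLS media playlist once per new segment, not per request."""

    def __init__(self, target_duration, window=3):
        self.target_duration = target_duration
        self.window = window      # segments kept in the sliding window
        self.segments = []        # (sequence, uri, duration_seconds)
        self.next_sequence = 0
        self.body = ""            # static text handed to every request

    def publish(self, uri, duration):
        """Called when a new segment lands; the only place rendering happens."""
        self.segments.append((self.next_sequence, uri, duration))
        self.next_sequence += 1
        self.segments = self.segments[-self.window:]
        lines = [
            "#EXTM3U",
            "#EXT-X-VERSION:3",
            f"#EXT-X-TARGETDURATION:{self.target_duration}",
            f"#EXT-X-MEDIA-SEQUENCE:{self.segments[0][0]}",
        ]
        for _, seg_uri, seg_duration in self.segments:
            lines.append(f"#EXTINF:{seg_duration:.3f},")
            lines.append(seg_uri)
        self.body = "\n".join(lines) + "\n"

    def serve(self):
        """Request path: no lookups, just return the precomputed text."""
        return self.body


playlist = PrecomputedPlaylist(target_duration=6)
for i in range(4):
    playlist.publish(f"seg{i}.ts", 6.0)
print(playlist.serve())
```

The cost of rendering then scales with how often the data changes rather than with how many viewers ask for it, and the static body is something a CDN can cache for a short time.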
And then the last bit that I'll mention here, and really the last terminal link that I think of in the whole chain of making this work is the player. The player, as hopefully everyone knows in adaptive bit rate streaming, is incredibly important, because the whole concept of ABR is we've shifted the intelligence of which segment, which bit rate to play at any given time to the player, because that player is in the best possible position to make those decisions based on that particular user's device and network characteristics and so on. So ensuring that you've got really reliable playback, you've got good ABR decisioning, that you've got the right telemetry coming from your player, and then perhaps--if you're a multi-CDN shop, something that I believe in--the player can be a critical part of completing your multi-CDN strategy.
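As a rough illustration of the player's role in a multi-CDN setup, the Python sketch below tries a list of CDN hosts in order and falls through to the next on failure. Everything here is hypothetical: the host names, the `fetch_with_failover` helper, and the injected `fetch` callable stand in for whatever a real player uses, and production players typically also weigh throughput and error telemetry rather than simple ordering.

```python
def fetch_with_failover(path, cdn_hosts, fetch):
    """Try each CDN host in order; return (host, payload) from the first success.

    `fetch` is any callable that takes a URL and either returns the payload
    or raises an exception.
    """
    errors = []
    for host in cdn_hosts:
        try:
            return host, fetch(f"https://{host}{path}")
        except Exception as exc:  # record the failure and try the next CDN
            errors.append((host, exc))
    raise RuntimeError(f"all CDNs failed for {path}: {errors}")


# Stand-in for a network call: the first (hypothetical) CDN is down.
def fake_fetch(url):
    if "cdn-a.example.com" in url:
        raise ConnectionError("timeout")
    return b"segment-bytes"


host, data = fetch_with_failover(
    "/live/seg1001.ts", ["cdn-a.example.com", "cdn-b.example.com"], fake_fetch
)
print(host)  # cdn-b.example.com
```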