How to Build a Resilient Live Streaming Architecture
Read the complete transcript of this clip:
Kiran Patel: The traditional way of making a lot of these live streaming workflows more resilient and more reliable is to duplicate those steps out: to have multiple pieces of infrastructure doing every step along that chain, or doing groups of those steps on that chain, and then having the failover be switching from one set of components over to another set of components. And I think what the cloud gives you is an interesting way of looking at it: not necessarily having--say, going from a light blue leg to a dark blue to a darker blue--failover be nothing but switching between entire legs, but thinking about failure within individual steps.
They were talking yesterday about microservices, and it's not quite that granular, but if you look at the individual steps along that workflow and then think of failover between each individual step along that workflow, you can build something that's a lot more resilient and reliable than a traditional workflow where you just have an A and a B leg and your failover is essentially switching between one or the other. So that's the core message that we'll go on to...
We have some examples I want to go through as well, but your resilience is basically built out of having redundancy and then failover between that redundancy. The simplest form of that is just to have a duplicate of what you're building, and then switch manually between the A and the B.
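The contrast the speaker draws--failing over a whole A/B leg versus failing over each step independently--can be sketched in a few lines. This is a hypothetical illustration; the class and function names (`Step`, `Leg`, `leg_failover`, `per_step_failover`) are invented for the example and don't come from any real streaming SDK.

```python
# Sketch: whole-leg A/B failover vs. per-step failover in a live chain.
from dataclasses import dataclass

@dataclass
class Step:
    name: str
    healthy: bool = True

@dataclass
class Leg:
    steps: list  # e.g. encode -> package -> originate

    def healthy(self) -> bool:
        return all(s.healthy for s in self.steps)

def leg_failover(a: Leg, b: Leg) -> Leg:
    """Traditional A/B: one failed step abandons the entire leg."""
    return a if a.healthy() else b

def per_step_failover(a: Leg, b: Leg) -> list:
    """Cloud-style: fail over each step of the chain independently."""
    return [sa if sa.healthy else sb for sa, sb in zip(a.steps, b.steps)]

a = Leg([Step("encode"), Step("package", healthy=False), Step("originate")])
b = Leg([Step("encode"), Step("package"), Step("originate")])

# Whole-leg failover discards A's two still-healthy steps.
assert leg_failover(a, b) is b
# Per-step failover keeps A's healthy steps and borrows only B's packager.
chain = per_step_failover(a, b)
assert chain[0] is a.steps[0] and chain[1] is b.steps[1] and chain[2] is a.steps[2]
```

The point of the per-step version is that a single packager failure no longer forces you to abandon a healthy encoder and origin along with it.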
I think what the cloud can enable is a better form of that resilience: a cloud-native architecture that utilizes some of the autoscaling that's possible, some of the self-healing that's possible with autoscaling groups when you're architecting that way, and then tries to make as much of that failover automatic as well. The reason this slide is labeled "Survive the chaos" is that Netflix made famous--within a lot of their architecture, which they've open-sourced--the concept of chaos engineering, which is the idea that you have a steady state, which from a live streaming point of view should be that your audiences continue to be able to watch your live output.
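The self-healing idea behind autoscaling groups can be reduced to a small supervisor loop: check health, replace anything dead, no operator in the path. This is a toy sketch under that assumption; `Component` and `Supervisor` are illustrative names, and a real deployment would lean on a cloud provider's autoscaling and health-check primitives rather than hand-rolled code like this.

```python
# Sketch: a supervisor that automatically replaces failed components,
# mimicking an autoscaling group's health-check-and-replace behavior.

class Component:
    def __init__(self, name):
        self.name = name
        self.alive = True

class Supervisor:
    def __init__(self, desired):
        # One instance per named role -- like a desired capacity of 1 each.
        self.pool = {name: Component(name) for name in desired}
        self.replacements = 0

    def heal(self):
        """Replace any dead component; no manual failover required."""
        for name, comp in self.pool.items():
            if not comp.alive:
                self.pool[name] = Component(name)
                self.replacements += 1

sup = Supervisor(["encoder", "packager", "origin"])
sup.pool["packager"].alive = False   # simulate an instance failure
sup.heal()
assert all(c.alive for c in sup.pool.values())
assert sup.replacements == 1
```

The failover here is automatic in exactly the sense the talk describes: the supervisor restores capacity without anyone switching legs by hand.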
And regardless of any disruptions within that architecture or workflow--if components are failing or if network connections are failing--as long as your audiences can carry on watching that live stream, you've developed an architecture which is essentially chaos monkey-proof. You can be killing components within there and your audience will still watch the live stream.
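A chaos-monkey-style check of that claim can be stated as a tiny simulation: randomly kill instances and assert that the steady state--the audience can still watch--holds throughout. This is an illustrative toy, not Netflix's actual tooling; the model assumes two redundant instances per step and healing that restores a killed instance's twin.

```python
# Toy chaos test: kill random instances in a redundant chain and verify
# the steady state ("audience keeps watching") is never violated.
import random

def stream_is_up(pairs):
    """The stream is watchable if every step has at least one live instance."""
    return all(any(pair) for pair in pairs)

random.seed(42)
# Two redundant instances (True = alive) for each of three chain steps.
pairs = [[True, True] for _ in range(3)]

for _ in range(10):
    step = random.randrange(3)
    inst = random.randrange(2)
    pairs[step][inst] = False      # chaos: kill one instance
    pairs[step][1 - inst] = True   # self-healing restores its twin
    assert stream_is_up(pairs)     # steady state: audience unaffected
```

Because healing always restores the surviving instance's partner before the next kill, every step keeps at least one live instance, which is the "chaos monkey-proof" property the talk describes.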
So that should be the goal: not having to notice something has failed, or making sure your audience doesn't notice something has failed even if you do. Then you can concentrate on fixing it, knowing that you haven't actually impacted your end user, and on getting back up to that full level of resilience or redundancy with no visible impact to your end user. Then you're hitting your targets and ideally being up 100% of the time if you're looking at a 24/7 streaming scenario.
AWS Solutions Marketing Manager Kiran Patel walks streamers through the logic of how much redundancy to build into their live streams in this clip from Streaming Media West 2019.