
SME '19: Red5 Pro’s Chris Allen Talks WebRTC and Ultra-Low-Latency Streaming at Scale

Learn more about WebRTC at Streaming Media's next event.

Read the complete transcript of this interview:

Tim Siglin: Welcome to Streaming Media East 2019. I'm Tim Siglin, contributing editor with Streaming Media Magazine and founding executive director of the not-for-profit Help Me! Stream. This is the very first live stream we're doing from the show, and today I've got with me Chris Allen, CEO and technical co-founder at Red5 Pro. Tell me a little bit about what you do at Red5 Pro.

Chris Allen: We enable developers to build live-streaming experiences that, you know, do one-to-many at really, really low latency, under 500 milliseconds, at huge scale.

Tim Siglin: Nice. And define huge scale, because, you know, in the RTMP days, huge scale was, well, in the early days it was like a couple hundred, and you had to figure it out. What for you guys is large scale--

Chris Allen: So, millions of concurrent viewers on the same stream.

Tim Siglin: And what's the, sort of, secret sauce with Red5 to do that? Because obviously we've got a number of plays where you've got P2P, you've got peer-assisted, you've got large CDNs that allow scale. How do you all, sort of, get to the millions of viewers simultaneously?

Chris Allen: We leverage cloud networks, or CDNs that have this new edge compute capability, to spin up and spin down virtual instances to add to a cluster. The edges in this architecture allow subscribers to connect to them, and then the video is delivered over WebRTC. So we're leveraging UDP and really fast throughput, which is basically what WebRTC is built for: video chat. We're leveraging that in kind of a server architecture, which allows us to scale this out.
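
To make that concrete, here is a minimal sketch of the subscriber side he describes: a viewer's browser makes a WebRTC peer connection to an edge node and plays the incoming stream. The /signal endpoint, its JSON shape, and the simplified non-trickle ICE handling are assumptions for illustration only; Red5 Pro's actual SDK and signaling protocol differ.

    // Illustrative WebRTC playback: the viewer connects to an edge node and
    // receives the live stream over UDP-based transport. The `/signal`
    // endpoint and its JSON shape are hypothetical placeholders.
    async function subscribe(edgeUrl: string, videoEl: HTMLVideoElement): Promise<RTCPeerConnection> {
      const pc = new RTCPeerConnection();

      // Receive-only: the edge node is the sending "peer."
      pc.addTransceiver('video', { direction: 'recvonly' });
      pc.addTransceiver('audio', { direction: 'recvonly' });

      // Attach incoming tracks to the <video> element as they arrive.
      pc.ontrack = (event) => {
        videoEl.srcObject = event.streams[0];
      };

      // Standard offer/answer exchange, here as a single HTTP round trip
      // (ICE candidate trickling is omitted for brevity).
      await pc.setLocalDescription(await pc.createOffer());
      const response = await fetch(`${edgeUrl}/signal`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ sdp: pc.localDescription }),
      });
      const { sdp } = await response.json();
      await pc.setRemoteDescription(new RTCSessionDescription(sdp));
      return pc;
    }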

Tim Siglin: Now I'm curious from a tactical standpoint. I've heard a number of people talk about spinning up instances when they do live to meet scale, and one of the things I've heard consistently is that you can't necessarily keep cold instances; you sort of have to keep the instances warm so they rapidly come up. Is that something you all find as well? Or is there acceptable lag time in the spin-up of the instances?

Chris Allen: Yeah, it's a great question, and it's not necessarily one that's easy to solve. Um, basically, what we do: we have this thing called the Stream Manager, which is kind of like the brains of the operation, monitoring all the other nodes and the traffic and how much load is being put on them. You can set thresholds in the Stream Manager to, like, say when you hit half the capacity on a server node, it's going to spin up a new one. So you actually have excess capacity. That's the usual way that we'll deal with it. We also have a scheduling API, so if you know you're going to have a celebrity live event type thing, it'll spin up the instances to be available at that particular time.
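
A rough sketch of that threshold rule, purely to illustrate the idea; EdgeNode and provisionEdgeNode are hypothetical stand-ins, not the Stream Manager's actual API.

    // Illustrative threshold-based scaling check, not Red5 Pro's actual
    // Stream Manager API. `EdgeNode` and `provisionEdgeNode` are hypothetical.
    interface EdgeNode {
      id: string;
      currentConnections: number;
      maxConnections: number;
    }

    declare function provisionEdgeNode(region: string): Promise<EdgeNode>;

    const SCALE_UP_THRESHOLD = 0.5; // add a node once half the capacity is used

    async function checkCluster(nodes: EdgeNode[], region: string): Promise<void> {
      const load = nodes.reduce((sum, n) => sum + n.currentConnections, 0);
      const capacity = nodes.reduce((sum, n) => sum + n.maxConnections, 0);

      // Keep excess capacity warm so traffic spikes land on ready instances.
      if (capacity === 0 || load / capacity >= SCALE_UP_THRESHOLD) {
        const node = await provisionEdgeNode(region);
        console.log(`Provisioned edge ${node.id} ahead of demand`);
      }
    }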

Tim Siglin: Okay. So to a certain extent it's almost like what we would do with business rules on network switches and routers back in the old days, where you'd say when a certain capacity is reached, make sure we have additional capacity.

Chris Allen: That's exactly right. And it's just using virtual instances to do that.

Tim Siglin: Okay. And do you find any limitations with the reach of some of these cloud-based solutions that you sit on, in terms of countries or geographic regions?

Chris Allen: Honestly, well, I mean, this is a really interesting question too. No, because they do have pretty large reach. I mean, Amazon and Google Cloud, they pretty much span the world. That said, even where they don't, we've done tests where we, like, have an AWS instance of our server sitting in Australia, and we're in Boston, doing a round-trip test with it, and it's still getting latencies of under 500 milliseconds.

Tim Siglin: Hmm, interesting. And then how do you do, sort of, quality of experience measurement from that standpoint at scale? Because WebRTC, as you said, was geared more towards video chat, bidirectional, um, and it seems like the models of real-time user measurement are geared more towards HLS chunking or segmentation.

Chris Allen: Sure, where you can put a cache in and buffer the stuff. In our case we don't have a buffer; we're delivering the packets as soon as they arrive. So there's certainly a trade-off, you know, when you're in really bad network conditions like this conference, for example.

Tim Siglin: Right. Isn't it ironic? At a streaming conference, you get network intermittency issues.

Chris Allen: That's right, you're going to have some, you know, issues. But that said, we've got the ability to do ABR, so we have adaptive bitrate. You can generate multiple streams, and then a client can request a different bitrate, and this is actually built into the WebRTC protocol. Usually it's just telling the other browser, like, hey, lower your bandwidth so that I can keep up. In this case our edge server is going to deliver the closest thing to what you're asking for.
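
A small illustration of that selection step: given the renditions the broadcaster generated and the bitrate a client says it can sustain, pick the closest one at or below it. The ladder and the bitrate numbers here are invented for the example.

    // Illustrative rendition selection: serve the highest variant the viewer's
    // estimated bandwidth can sustain, else fall back to the lowest one.
    interface Rendition {
      name: string;
      kbps: number;
    }

    const ladder: Rendition[] = [
      { name: '1080p', kbps: 4500 },
      { name: '720p', kbps: 2500 },
      { name: '360p', kbps: 800 },
    ];

    function pickRendition(estimatedKbps: number, renditions: Rendition[]): Rendition {
      const sorted = [...renditions].sort((a, b) => b.kbps - a.kbps);
      return sorted.find((r) => r.kbps <= estimatedKbps) ?? sorted[sorted.length - 1];
    }

    // A viewer on a congested conference network estimating roughly 1 Mbps:
    console.log(pickRendition(1000, ladder).name); // "360p"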

Tim Siglin: Interesting. So is that a negotiation between the end user and the edge server?

Chris Allen: That's right. In our architecture, the edge server is really the other peer. So if you think about it in peer-to-peer WebRTC terms, it's making a peer connection to that edge server, and then they're negotiating back and forth what it should do and delivering the stream to that client.

Tim Siglin: So, I'm not going to name a company name, but there was a solution a couple years ago that used RTMFP, the flow protocol, and if I remember correctly, the way that would work is it would essentially pierce the firewall of, say, a corporate environment with a single stream, and then if a second unicast stream came in, it would say, "Oh, there are multiple requests within this firewall, let's turn on peer-assisted or peer-to-peer on the network itself." How do you all handle models where there are multiple requests from within, say, a single corporate entity or behind a firewall?

Chris Allen: Okay, so that's a complex question, and requires a complex answer.

Tim Siglin: You're a technical guy, so I'm asking you complex questions.

Chris Allen: Actually, all of this is built into the WebRTC specs. It's called ICE negotiation, and there are sub-protocols, STUN and TURN, which actually do the NAT punch-through to be able to get the UDP traffic through. In the case that it can't get through the firewall, then it uses TURN, which is this relay, or tunneling: it'll tunnel through another server, so it's kind of a client/server model.
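
On the client side, that fallback chain is just configuration: list a STUN server for the NAT punch-through and a TURN server as the relay of last resort, and the browser's ICE negotiation prefers a direct UDP path, falling back to the relay when it can't get through. The hostnames and credentials below are placeholders.

    // Standard WebRTC ICE configuration: STUN for NAT punch-through, TURN as
    // the relay fallback when UDP can't get through the firewall.
    // Hostnames and credentials are placeholders.
    const peer = new RTCPeerConnection({
      iceServers: [
        { urls: 'stun:stun.example.com:3478' },
        {
          urls: 'turn:turn.example.com:3478?transport=tcp',
          username: 'viewer',
          credential: 'secret',
        },
      ],
    });

    peer.oniceconnectionstatechange = () => {
      // "connected" means a candidate pair was found, either direct or relayed.
      console.log('ICE state:', peer.iceConnectionState);
    };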

Tim Siglin: That's a good, brief explanation of STUN and TURN, and I hadn't really thought about it before, tunneling versus... But then, once it's in through the firewall, if you want to replicate out to multiple people within that firewall, or is it just basic...

Chris Allen: We just do straight-up edge to the client, and we're not doing, like, what Peer5 or Streamroot are doing. That's often a common confusion with our product versus those guys. They're actually using peers to help distribute the stream. They're solving a different problem, which is how do you take the load off of the server infrastructure?

Tim Siglin: Which is origin?

Chris Allen: In our case, we're solving the latency problem. With second peer connections, you're actually going to lose on latency by doing that. And they're delivering chunks over the data channel. I mean, this stuff can get very deep, and very...

Tim Siglin: Yeah, unless you turn on network multicasting and then the network administrator is going to shut you down anyway.

Chris Allen: They hate that.

Tim Siglin: Awesome. What sessions are you doing at the show?

Chris Allen: So, I have a session tomorrow at 10:30. I'm talking about real-time latency at scale and, you know, the kind of new use cases that are emerging through that.

Tim Siglin: Chris, thank you very much for your time.

Chris Allen: Yeah, and thank you, Tim.

Tim Siglin: And we'll be right back.
