The State of WebRTC and Streaming Media 2018
WebRTC has been with us for the past 6 years, and it has certainly grown in adoption and popularity during that time. But how and where, exactly, does WebRTC fit into the world of media streaming? And while we’re at it, what should we expect of it in 2018?
The focus of WebRTC and the developers using it is usually on handling remote video chat meetings, either one-on-one or group calls. That is also the primary focus of web browsers in their support of WebRTC. So how did WebRTC become so interesting in the media streaming space in the past year?
There are several contributing factors here:
- Flash is dying, and this is being accelerated by modern browsers limiting its use and availability, along with Adobe’s plan to reach Flash’s end-of-life by the end of 2020.
- MPEG-DASH and HLS implementations usually come with latency limitations. More often than not, a delay of 10 seconds or more is apparent with these streaming technologies.
- Live and interactive are becoming more and more important. We see it with Facebook Live, Microsoft Mixer, and other streaming services.
- Broadcasters are looking to share and mix their streams directly in the browser with no need to install anything.
- Growing video consumption is straining streaming servers and increasing the network costs of broadcasters.
- The introduction of H.264 as a mandatory-to-implement codec in WebRTC and its availability across all modern browsers makes WebRTC easier to use for existing streaming services.
There are a few places where you can find WebRTC in media streaming services these days, and they use WebRTC quite differently from one another. To understand how these came to be, it is important to understand how WebRTC works in the context of media streaming, as shown in Figure 1.
Figure 1. A simple schematic showing WebRTC in a media streaming environment.
The browser will use a signaling channel toward the application itself. The application makes the decision on how and where to connect the browser using WebRTC. For each use case, the application and its behavior will be very different.
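Because WebRTC deliberately leaves signaling to the application, that routing decision can be sketched as plain message handling. Everything below (the message shape, the roles, and the destination names) is a hypothetical illustration, not a standard API:

```javascript
// Hypothetical signaling sketch: the application, not WebRTC, decides where
// each browser connects. Message shape, roles, and destinations are made up.
function makeSignal(type, payload) {
  return { type, payload, ts: Date.now() };
}

// Route a signal based on who sent it: a broadcaster's offer goes to an
// ingest gateway, a viewer's offer goes to an edge server, and ICE
// candidates follow the session that was already chosen.
function routeSignal(signal, senderRole) {
  if (signal.type === "offer") {
    return senderRole === "broadcaster" ? "ingest-gateway" : "edge-server";
  }
  if (signal.type === "ice-candidate") {
    return "same-session";
  }
  return "drop";
}
```

The point of the sketch is that the same browser-side WebRTC stack ends up connected to very different infrastructure depending on what the application decides here.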
WebRTC’s real-time audio and video can be used in front of a CDN or a media server, for both sending and receiving media. This will be used for low-latency streaming use cases.
Last but not least, WebRTC’s data channel is used to create ad-hoc peer-to-peer (P2P) CDN connections directly between browsers. This can reduce buffering and costs for broadcasters.
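As a rough sketch of that data-channel CDN idea, the per-segment decision might look like the following. Peer bookkeeping and the actual data-channel transfer are omitted, and all names here are illustrative assumptions:

```javascript
// Hypothetical peer-assisted CDN routing: for each video segment, fetch
// from a nearby peer over a data channel when one has it cached, and fall
// back to the CDN otherwise. Only the routing decision is shown.
function pickSource(segmentId, peers, cdnUrl) {
  // Prefer the peer reporting the lowest round-trip time for this segment.
  const candidates = peers
    .filter((p) => p.segments.has(segmentId))
    .sort((a, b) => a.rttMs - b.rttMs);
  return candidates.length > 0
    ? { kind: "peer", id: candidates[0].id }
    : { kind: "cdn", url: cdnUrl };
}
```

Every segment served by a peer is one the CDN does not have to deliver, which is where the cost and buffering savings come from.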
With that in mind, let’s see how this fits into five different media streaming scenarios.
1. Broadcasting Without Installations
The simplest way and place for WebRTC to work in media streaming is by enabling broadcasters to share their streams without an application installed.
WebRTC is the only mechanism available today to allow a browser to share its camera and microphone. Flash is no longer an option. Plugins are losing popularity and are hard to maintain. People are left with either installing a dedicated application or using WebRTC inside the browser.
In the past, this approach required transcoding between VP8 (the video codec available in WebRTC) and H.264. Today, this is no longer necessary, since browsers support H.264 encoding in WebRTC.
Figure 2. Connecting to a CDN to deliver a live stream via HLS using WebRTC
Figure 2 illustrates how this evolved architecturally. Connecting to a CDN with a live media stream required Real-Time Messaging Protocol (RTMP) support, which translated into using an additional gateway component. Kurento, an open source WebRTC media server, was widely used for that, and recently, Wowza and Red5 Pro started offering similar capabilities of connecting WebRTC to RTMP (and both are also offering low-latency viewing). Nanocosmos took a slightly different approach, allowing broadcasters to connect via WebRTC, but streaming on the other end using HLS that is optimized for low latency.
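Avoiding a VP8-to-H.264 transcode at the gateway often comes down to which codec the browser ends up sending, which is negotiated in the SDP. In browsers that support it, `RTCRtpTransceiver.setCodecPreferences()` does this properly; the string-level version below is just a sketch of the idea, operating on a minimal SDP fragment:

```javascript
// Hedged sketch: prefer H.264 by reordering payload types on the m=video
// line of an SDP offer, so a WebRTC-to-RTMP gateway can pass the stream
// through without transcoding.
function preferH264(sdp) {
  const lines = sdp.split("\r\n");
  // Collect the payload types that a=rtpmap lines map to H264.
  const h264Pts = lines
    .filter((l) => /^a=rtpmap:\d+ H264\//.test(l))
    .map((l) => l.match(/^a=rtpmap:(\d+)/)[1]);
  return lines
    .map((l) => {
      if (!l.startsWith("m=video")) return l;
      const parts = l.split(" ");
      const header = parts.slice(0, 3); // "m=video", port, transport profile
      const pts = parts.slice(3); // payload types, in preference order
      return header
        .concat(pts.filter((pt) => h264Pts.includes(pt)))
        .concat(pts.filter((pt) => !h264Pts.includes(pt)))
        .join(" ");
    })
    .join("\r\n");
}
```

Given an m=video line listing VP8’s payload type before H.264’s, the function moves the H.264 payload type to the front, which signals the preferred codec to the remote end.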
2. Live Interviews
This takes the single-broadcaster approach to the next level.
As video becomes more interactive, so do the ways of creating the video itself. Certain events, such as webinars and news-related video streams, need the ability to interview people live and broadcast the interview as a single unit to viewers. But the broadcaster and their hosts might not be located in the same room, so they need to be on a live video chat that is then mixed and streamed toward the viewers.
Figure 3. A typical setup for a multi-person video interview that is mixed and streamed to viewers
The media server in Figure 3 now isn’t only a media gateway. It receives real-time media streams from all the active participants (the broadcaster and the hosts), manages the conference call between them, and also mixes all inputs to create a single stream that can be ingested by a video CDN—most likely in RTMP format.
YouNow was probably one of the first to offer such a user experience. Recently, both Facebook Live and Instagram added support for bringing a guest into a live stream.
In the enterprise, it is quite common to host webinars that have multiple hosts joining from different locations. In recent years, webinars have been shifting from voice toward video. This has brought with it the need to enable these types of interview scenarios. At the same time, it has increased the requirement for lower latencies for viewers, which brings us to the next use case.
3. Low-Latency Live Streaming
In a way, the first two use cases can be seen as a prelude to this one—achieving low-latency live streaming in a post-Flash world. Since WebRTC was designed and implemented from the ground up for real-time communications, it is quite capable of offering low-latency live streaming.
In the past year or two, we’ve seen companies starting to employ WebRTC for such use cases. The vendors solving this technical problem are tackling it from two different directions:
- Taking the video conferencing approach and increasing it to support larger groups via a selective forwarding unit (SFU), a type of media router.
- Starting from the existing RTMP streaming technologies and enhancing them to support WebRTC as well.
Both directions are valid, and have different technical benefits and challenges. At the end of the day, both need to tackle the issue of scale: How do you take a single stream and broadcast it to a large number of viewers? Even if you assume a single media server can broadcast to thousands of viewers, what happens when the stream in question is requested by even more viewers?
Figure 4. A single broadcaster’s stream is cascaded from one media server to another before reaching its destination.
The solution is cascading, which is quite similar to how CDNs work today. Figure 4 shows such an architecture, where a single broadcaster’s stream gets cascaded from one media server to another before reaching its destination.
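The scale question above can be put into rough numbers. The fan-out figure used here is an assumed per-server capacity for illustration, not a benchmark:

```javascript
// Back-of-the-envelope sketch: if each media server can fan a stream out to
// `fanout` destinations (viewers or downstream servers), how many servers
// and relay hops does a cascade need to reach `viewers`?
function cascadePlan(viewers, fanout) {
  // Hops so that fanout^hops >= viewers (epsilon guards float rounding).
  const hops = Math.max(
    1,
    Math.ceil(Math.log(viewers) / Math.log(fanout) - 1e-9)
  );
  let servers = 0;
  // Tier i (1 = closest to viewers) needs ceil(viewers / fanout^i) servers.
  for (let i = 1; i <= hops; i++) {
    servers += Math.ceil(viewers / Math.pow(fanout, i));
  }
  return { servers, hops };
}
```

With an assumed fan-out of 1,000 per server, 100,000 viewers need only two hops: one origin feeding 100 edge servers. Each extra hop multiplies reach by the fan-out factor but also adds latency, which is the trade-off a cascaded architecture manages.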
This is what Beam did with an infrastructure of cascaded SFUs prior to being acquired by Microsoft and rebranded as Mixer. This is how all the gray zones of the internet that offer auctions, gambling, and porn are tackling the issue of live streaming at scale.
2018 will see more technology companies offering such solutions and products, as well as a CDN or two that will offer such commercial services.
There's a cost to being cutting-edge, and for low-latency live video streaming that involves learning WebRTC and accepting limited browser support.
A Stanford University research team has created an architecture called Salsify that might offer a better way to deliver video for real-time applications.