The State of WebRTC and Low-Latency Streaming 2019

Zlatkov notes that WebRTC accomplishes NAT traversal by using STUN and TURN servers.

“In order for WebRTC technologies to work, a request for your public-facing IP address is first made to a STUN server,” Zlatkov writes. He goes on to explain the difference between internal IP addresses (such as 10.0.0.xxx or 192.168.1.xxx) and the external, web-facing IP address assigned to a cable or DSL modem. Figure 2 describes the step-by-step process for obtaining an ICE candidate in detail.

Figure 2: The step-by-step process for obtaining an ICE candidate. (Image courtesy of SessionStack)
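The STUN request Zlatkov describes can be sketched at the byte level. The TypeScript below is a minimal sketch that assumes RFC 5389 message framing and IPv4 only. It encodes a Binding Request and decodes the XOR-MAPPED-ADDRESS attribute from a Binding Success Response, which is how the client learns its public-facing address. It leaves out UDP sockets, retransmission, and the MESSAGE-INTEGRITY and FINGERPRINT attributes that real ICE agents use. `buildBindingRequest` and `parseXorMappedAddress` are hypothetical helper names.

```typescript
import { randomBytes } from "crypto";

const MAGIC_COOKIE = 0x2112a442; // fixed value from RFC 5389
const BINDING_REQUEST = 0x0001;
const BINDING_SUCCESS = 0x0101;
const XOR_MAPPED_ADDRESS = 0x0020;

// Minimal Binding Request: a 20-byte header with no attributes.
function buildBindingRequest(txId: Uint8Array = randomBytes(12)): Uint8Array {
  if (txId.length !== 12) throw new Error("transaction ID must be 12 bytes");
  const msg = new Uint8Array(20);
  const view = new DataView(msg.buffer);
  view.setUint16(0, BINDING_REQUEST);
  view.setUint16(2, 0); // message length: nothing follows the header
  view.setUint32(4, MAGIC_COOKIE);
  msg.set(txId, 8); // bytes 8..19 hold the transaction ID
  return msg;
}

// Extract the public (server-reflexive) IPv4 address and port from a response.
// A real client would also check that the transaction ID matches its request.
function parseXorMappedAddress(msg: Uint8Array): { address: string; port: number } | null {
  if (msg.length < 20) return null;
  const view = new DataView(msg.buffer, msg.byteOffset, msg.byteLength);
  if (view.getUint16(0) !== BINDING_SUCCESS || view.getUint32(4) !== MAGIC_COOKIE) return null;
  const end = Math.min(msg.length, 20 + view.getUint16(2));
  let offset = 20;
  while (offset + 4 <= end) {
    const type = view.getUint16(offset);
    const length = view.getUint16(offset + 2);
    const value = offset + 4;
    if (type === XOR_MAPPED_ADDRESS && length >= 8 && value + length <= end &&
        view.getUint8(value + 1) === 0x01) { // family 0x01 = IPv4
      const port = view.getUint16(value + 2) ^ (MAGIC_COOKIE >>> 16);
      const addr = (view.getUint32(value + 4) ^ MAGIC_COOKIE) >>> 0;
      const address = [24, 16, 8, 0].map((shift) => (addr >>> shift) & 0xff).join(".");
      return { address, port };
    }
    offset = value + Math.ceil(length / 4) * 4; // attributes are padded to 4 bytes
  }
  return null;
}

const request = buildBindingRequest();
console.log(request.length); // 20
```

In a browser, none of this is written by hand. The ICE agent sends these requests itself when an `RTCPeerConnection` is configured with a `stun:` URL in `iceServers`, and the address it learns is offered as a server-reflexive (srflx) ICE candidate.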

Securing Private IP Addresses

Another issue with ICE is the potential exposure of private IP addresses when using a STUN or TURN server. The IETF working group for WebRTC has been working diligently on this area in late 2018 and early 2019.

“To maximize the probability of a direct peer-to-peer connection, client private IP addresses are included in [WebRTC ICE] candidate collection,” the IETF notes in an October 2018 draft document on ICE candidates. “However, disclosure of these addresses has privacy implications.”

To address these privacy implications and to thwart potential distributed denial of service (DDoS) attacks on specific computers, devices, or servers hidden behind a firewall, the IETF currently suggests obfuscating these private IP addresses.

“This document describes a way to share local IP addresses with other clients while preserving client privacy,” the IETF draft notes. “This is achieved by obfuscating IP addresses with dynamically generated Multicast DNS (mDNS) names.” The mDNS naming scheme is defined in RFC 6762, which spells out the benefits of an mDNS approach.

The primary benefits of multicast DNS, as noted in RFC 6762, “are that (i) they require little or no administration or configuration to set them up, (ii) they work when no infrastructure is present, and (iii) they work during infrastructure failures.”

While mDNS has been on the standards track since 2013, the much more recent draft document that combines ICE and mDNS is set to expire in late April 2019.

Before that happens, however, it appears that a standard approach will emerge for consistently obfuscating private IP addresses sent to STUN and TURN servers. What is still uncertain is whether these private IP addresses will still be sent “in the clear” or will be encrypted or obfuscated before they ever reach the STUN or TURN server. We will monitor the draft recommendation throughout 2019 to gauge how effective this obfuscatory approach is for critical applications in enterprise and government entities.
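The obfuscation described in the IETF draft can be sketched as a rewrite of host candidates before they are shared. This minimal TypeScript sketch rests on two assumptions. The first is the standard ICE candidate-line layout, in which the connection address is the fifth field. The second is a naming scheme of a random version 4 UUID followed by `.local`, the form browsers such as Chrome have used. `MdnsObfuscator` is a hypothetical name. Registering the name with an mDNS responder, and the remote side's multicast lookup, are only noted in comments.

```typescript
import { randomUUID } from "crypto";

// Hypothetical helper: one stable mDNS name per private address for the session.
class MdnsObfuscator {
  private names = new Map<string, string>();

  nameFor(ip: string): string {
    let name = this.names.get(ip);
    if (name === undefined) {
      name = `${randomUUID()}.local`;
      this.names.set(ip, name);
      // A real ICE agent would also register this name with its mDNS responder,
      // so peers on the same LAN can resolve it back to `ip` via RFC 6762 queries.
    }
    return name;
  }

  // Replace the connection address (5th field) of a host candidate line.
  // Non-host candidates are returned unchanged in this sketch.
  obfuscate(candidate: string): string {
    const parts = candidate.trim().split(/\s+/);
    const typIndex = parts.indexOf("typ");
    if (parts.length < 8 || typIndex !== 6 || parts[7] !== "host") return candidate;
    parts[4] = this.nameFor(parts[4]);
    return parts.join(" ");
  }
}

const ob = new MdnsObfuscator();
const line = "candidate:1 1 udp 2122260223 192.168.1.10 54321 typ host generation 0";
console.log(ob.obfuscate(line)); // the 192.168.1.10 field becomes "<uuid>.local"
```

Only a peer on the same local network can resolve such a name through multicast DNS. Anyone else receiving the candidate sees an opaque name instead of 192.168.1.10, while connectivity to distant peers still comes from the STUN-derived and TURN-relayed candidates.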

Bundling and Synchronizing for Better Delivery

The WebRTC project includes the concept of tracks. A track could be a single audio stream, a single video stream, or a stream of another type of data. Just as a multitrack audio recorder records and synchronizes several tracks, the WebRTC recommendation is to bundle tracks within a session so that each media track does not need its own set of ICE candidates.

At its most basic level, though, WebRTC does not require an application to be “bundle-aware” for peer-to-peer communications, so the proposed WebRTC 1.0 specification allows for gathering ICE candidates for each media track.

The specification notes, “If the remote endpoint is bundle-aware, all media tracks and data channels are bundled onto the same transport” but allows for exceptions.

First, “If the remote endpoint is not bundle-aware, negotiate all media tracks on separate transports,” which means gathering a separate set of ICE candidates for every media track (e.g., audio, video, or other data).

Second, the max-bundle policy gathers ICE candidates for only one track and, if the remote endpoint is not bundle-aware, negotiates only one media track. This allows the WebRTC user agent to sift through a smaller pool of candidates for any given track type.
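The bundle policies can be compared by counting transports. The TypeScript below is a simplified model, and an assumption rather than browser code. It reflects how the WebRTC 1.0 draft describes `RTCBundlePolicy`: with a bundle-aware remote, everything shares one transport. With a remote that is not bundle-aware, "max-compat" uses one transport per media track, "balanced" negotiates at most one audio and one video transport, and "max-bundle" negotiates only one media track. Data channels are left out. `negotiatedTransports` is a hypothetical name.

```typescript
type BundlePolicy = "balanced" | "max-compat" | "max-bundle";
type MediaKind = "audio" | "video";

// Simplified model: number of separate ICE transports negotiated for media tracks.
function negotiatedTransports(
  policy: BundlePolicy,
  tracks: MediaKind[],
  remoteBundleAware: boolean,
): number {
  if (tracks.length === 0) return 0;
  // Bundle-aware remote: every track shares one transport, whatever the policy.
  if (remoteBundleAware) return 1;
  if (policy === "max-compat") return tracks.length; // one transport per track
  if (policy === "balanced") return new Set(tracks).size; // one per media kind
  return 1; // "max-bundle": only one media track is negotiated
}

// In a browser the policy is fixed when the connection is created, e.g.
// new RTCPeerConnection({ bundlePolicy: "max-bundle", iceServers: [...] })
const tracks: MediaKind[] = ["audio", "video", "video"]; // e.g. camera plus screen share
for (const policy of ["max-compat", "balanced", "max-bundle"] as BundlePolicy[]) {
  console.log(policy, negotiatedTransports(policy, tracks, false));
}
// max-compat 3, balanced 2, max-bundle 1
```

Fewer transports mean fewer sets of ICE candidates to gather and check. That is the trade-off between compatibility with bundle-unaware endpoints and a smaller candidate pool.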

One important note is that bundling does not equate to synchronization. In fact, during a mid-2018 face-to-face meeting for the WebRTC working group in Stockholm, Peter Thatcher, a WebRTC developer at Google, presented a number of use cases that require synchronization, including multi-person video and audio chats such as a Google Hangout.

One of the most interesting use cases, though, centers on content delivery networks (CDNs) and how synchronization may help with one of WebRTC’s primary shortcomings: the difficulty of scaling up to hundreds of thousands of simultaneous users (or participants, in the case of two-way, real-time communication).

Thatcher notes that synchronization would allow WebRTC applications to send and receive data “in a way that does not compete aggressively with audio/video,” while also allowing a CDN to fine-tune delivery so that an “app can easily send large files without buffering too much and without having the throughput drop badly.”

The W3C has noted this need for synchronization as an outstanding issue. One proposal involves using Real-Time Text to timestamp audio and video tracks, although the open question with this approach is what role browsers would play in generating and reading this real-time text purely for synchronization purposes.

Scaling Using P2P?

Whenever there’s talk of scaling up non-HTTP delivery approaches, you can bet a three-dollar bill that there will be discussion of peer-to-peer (P2P) delivery.

Peer-assisted streaming has continued to advance over the last decade, although discussion of P2P by name has been fairly muted over the same period, given the significant implosion of P2P in the 2008–2009 timeframe.

Several companies have repackaged their peer-assisted delivery technologies for use with WebRTC, but almost every one of these requires some form of JavaScript code to piggyback onto the initial WebRTC session in order to scale.

These solutions solve the problem of one-to-many or few-to-many delivery, essentially taking a unicast and replicating it out to hundreds or thousands of viewers, but there’s still the question of many-to-many WebRTC sessions being able to scale.
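A tree model gives a rough sense of why one-to-many peer assistance scales and what it costs in latency. The TypeScript below is a hypothetical, simplified model, not any vendor's algorithm. The origin's unicast reaches `fanout` peers, and every peer re-uploads the stream to `fanout` more. Each relay hop therefore multiplies reach but adds its own delay. `relayDepth`, `addedRelayLatencyMs`, and the per-hop delay are assumptions for illustration.

```typescript
// Hypothetical tree model of peer-assisted one-to-many delivery.
// Returns how many relay hops are needed before `viewers` peers are served.
function relayDepth(viewers: number, fanout: number): number {
  if (viewers <= 0) return 0;
  if (!Number.isInteger(fanout) || fanout < 1) throw new Error("fanout must be a positive integer");
  let reached = 0; // peers served so far
  let layer = 1;   // peers at the current depth (depth 0 is the origin)
  let depth = 0;
  while (reached < viewers) {
    layer *= fanout;  // each peer in the previous layer feeds `fanout` new peers
    reached += layer;
    depth += 1;
  }
  return depth;
}

// Each hop adds a receive-and-forward delay; `hopMs` is an assumed constant.
function addedRelayLatencyMs(viewers: number, fanout: number, hopMs: number): number {
  return relayDepth(viewers, fanout) * hopMs;
}

console.log(relayDepth(1000, 10));                  // 3
console.log(addedRelayLatencyMs(250000, 8, 50));    // 300
```

Hop count grows only logarithmically with audience size, which is what makes peer assistance attractive for one-to-many delivery. The model ignores peer churn and upload limits. It also says nothing about many-to-many sessions, where every participant is also a source.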

While we can’t say that many-to-many delivery of low-latency streams or even WebRTC sessions will be possible with P2P delivery, one thing is certain: Within the 2019 calendar year, we expect to see significant research and development around this very question. Solving this issue is integral to the success of WebRTC for the overall unified communications industry, especially for the enterprise, but it is almost as important for general consumer video chat.

In addition to R&D around many-to-many WebRTC sessions, we also expect to see additional effort put into driving down latency at the codec level. After all, while H.264 was initially rolled out as a videoconferencing codec, its more recent compression success has come from longer group of pictures (GOP) structures, some as long as 2 seconds.

This longer GOP structure has been great for HTTP-based delivery in the form of fragmented MP4 options like MPEG Dynamic Adaptive Streaming over HTTP (DASH) and Apple HTTP Live Streaming (HLS), but it has been detrimental to achieving sub-second end-to-end latencies for highly time-critical use cases such as wagering or surveillance.

As such, the promise of WebRTC and other low-latency approaches will invariably drive R&D spending in 2019. Whether we’ll get to scale in 2019 is a bit of a crapshoot, but watch StreamingMedia.com for updates throughout the year on this very hot topic.
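The GOP-length point above can be made concrete with simple arithmetic. The TypeScript below is a simplified sketch with stated assumptions. Each segment starts on a keyframe and so spans a whole number of GOPs. The player holds three segments before playing, roughly in line with the HLS guidance that clients not start playback less than three target durations from the live edge. Encoding, packaging, CDN, and network time are ignored. The function names are hypothetical.

```typescript
// Frames in one group of pictures (GOP) at a given frame rate.
function framesPerGop(gopSeconds: number, fps: number): number {
  return Math.round(gopSeconds * fps);
}

// Simplified latency floor for segmented HTTP delivery (assumption):
// segment length = whole GOPs, and the player buffers `segmentsBuffered` segments.
function segmentBufferLatencySeconds(
  gopSeconds: number,
  gopsPerSegment: number,
  segmentsBuffered: number,
): number {
  const segmentSeconds = gopSeconds * gopsPerSegment;
  return segmentSeconds * segmentsBuffered;
}

const SUB_SECOND_TARGET = 1.0; // seconds, the end-to-end goal for time-critical uses

for (const gop of [2, 1, 0.5]) {
  const latency = segmentBufferLatencySeconds(gop, 1, 3);
  console.log(
    `${gop} s GOP: ${framesPerGop(gop, 30)} frames at 30 fps, ` +
      `~${latency} s buffered, meets target: ${latency < SUB_SECOND_TARGET}`,
  );
}
```

Under these assumptions a 2-second GOP (60 frames at 30 fps) implies about 6 seconds of buffer alone. Even a 0.5-second GOP still leaves 1.5 seconds, which is why sub-second targets push toward shorter GOPs or non-segmented transports such as WebRTC.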

[This article appears in the March 2019 issue of Streaming Media Magazine as "The State of WebRTC and Low-Latency Streaming."]
