The State of WebRTC and Low-Latency Streaming 2019

Latency has always been a problem for the streaming industry, just like it was a problem for the videoconferencing industry before that. How to lower latency for time-critical delivery, while allowing unidirectional streams to scale, is one of the five major challenges the industry faces in 2019.

One potential approach, which is finally being presented for standardization after 7 years, is called WebRTC (the RTC stands for real-time communications). This article explores the state of WebRTC in late 2018 and early 2019.

What Is WebRTC?

According to WebRTC.org, the project known as WebRTC (also referred to as rtcweb by the Internet Engineering Task Force, or IETF) continues to actively integrate standardized, native real-time voice and video communications into leading web browsers. Native RTC implementation means that WebRTC communications can occur within a browser without requiring a plugin architecture (see Figure 1). Standardized means that, in theory, WebRTC communications can occur between any of the browsers that the WebRTC.org site lists as compatible, be they native Android, Chrome, iOS, Firefox, or Opera browsers.

Figure 1: A simplified streaming media workflow using WebRTC

The reason we caveat this as “in theory” is that rtcweb is not actually a standard, but rather suggestions on how to implement the IETF draft recommendations against the open-source project known as WebRTC. Having said that, IETF has worked jointly with the World Wide Web Consortium (W3C) to maintain a consistent recommendation for the implementation of WebRTC and is also extending the original work in a number of specific areas.

The current status of WebRTC can be broken down across several components: applications, real-time audio, real-time video, security, ICE, and STUN or TURN. Before we dive into those acronyms, though, it’s worth noting that the 19th draft of the rtcweb overview was submitted in late 2017, around the time that the 2018 Sourcebook article on the State of WebRTC was penned.

While there are typically two draft overview submissions each year, and the 19th draft submission expired on May 16, 2018, the lack of a 20th draft of “Real Time Protocols for Browser-Based Applications” in 2018 doesn’t mean that rtcweb is dead. Far from it, as the 19th draft was submitted to the Internet Engineering Steering Group (IESG) within IETF, with a recommendation that it be published as a standard.

Given the collaboration between IETF and W3C, the subsequent committee work took a few months, but in late 2018, the W3C placed a document for standardizing WebRTC (dubbed WebRTC 1.0) on its website as W3C Candidate Recommendation 27 September 2018. When the standard is finally approved, the details will appear on the W3C site at www.w3.org/TR/webrtc.

Having cleared that up, let’s take a look at advances in several key areas of WebRTC and rtcweb development.

WebRTC Applications and Devices

There’s a clear distinction between WebRTC applications and devices. The distinction lies with JavaScript, a programming language supported by all modern browsers that is also leveraged to control WebRTC applications. Along with HTML5 and Cascading Style Sheets (CSS), JavaScript forms the core basis of web apps.

To maintain compliance with the WebRTC recommendations noted in Draft 19 of the IETF’s recommendation for rtcweb, a WebRTC application must implement JavaScript. In IETF terms, this would be a browser-based user agent (WebRTC UA), which “conforms to both the protocol specification and the JavaScript API” as part of its interoperability compliance.

For the device category, though, there’s no need to rely on JavaScript. In IETF terms, any non-browser implementation of WebRTC could be known as a WebRTC device.

According to IETF definitions, a WebRTC non-browser is something that conforms to the protocol specification, but it does not claim to implement the JavaScript API.

The line between devices and applications blurs a bit when it comes to what the IETF calls WebRTC native applications. These kinds of native applications are most commonly found on smartphones and tablets running operating systems such as Android or Apple’s iOS.

Both Android and iOS applications can take advantage of the JavaScript application programming interface (API), but they can also be written in lower-level programming languages. One such language is C++, which has an API that’s useful for lower-level hardware infrastructure programming.

WebRTC Audio and Video

The second status update centers on WebRTC audio and video.

On the video front, WebRTC video has an IETF request for comments (RFC) designation of rfc7742, while audio has a separate designation of rfc7874.

Video for WebRTC is encoded from the Y’CbCr 4:2:0 color space, which is the same native color space used for most streaming premium-content distribution, from movies and episodics to webcam footage.

Video is also separated between camera-captured content and screen-captured content. A quick glance at rfc7742 shows a few interesting notes about how to handle screen-captured content, leveraging years of learning how to handle both talking heads and computer graphics in traditional, hardware-based videoconferencing.

“Because screen-sourced video can change resolution (due to, e.g., window resizing and similar operations), WebRTC-video recipients MUST be prepared to handle midstream resolution changes in a way that preserves their utility,” states a notation in an IETF document. The same document, written in 2016, also challenges compression experts to solve the mismatch between video- and screen-capture color space for optimal encoding.

“Note that the default video-scan format (Y’CbCr 4:2:0) is known to be less than optimal for the representation of screen content produced by most systems in use at the time of this document’s writing, which generally use RGB with at least 24 bits per sample.”

The IETF document adds that, in the future, for more accurate screen-capture delivery, “[i]t may be advisable to use video codecs optimized for screen content for the representation of this type of content.”
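
To put the color-space mismatch in concrete terms, the arithmetic below compares the raw size of a single frame in the two representations. It is a minimal sketch under stated assumptions: 8 bits per channel (so RGB works out to 24 bits per pixel), planar storage, and hypothetical function names chosen only for illustration.

```python
def frame_bytes_rgb24(width: int, height: int) -> int:
    """Raw bytes for one RGB frame at 8 bits per channel (assumed)."""
    return width * height * 3


def frame_bytes_ycbcr420(width: int, height: int) -> int:
    """Raw bytes for one 8-bit Y'CbCr 4:2:0 frame (assumed planar layout).

    Luma is kept at full resolution, while each of the two chroma planes
    is halved in both dimensions (rounded up for odd sizes).
    """
    luma = width * height
    chroma_width = (width + 1) // 2
    chroma_height = (height + 1) // 2
    return luma + 2 * chroma_width * chroma_height


if __name__ == "__main__":
    width, height = 1280, 720
    rgb = frame_bytes_rgb24(width, height)
    ycbcr = frame_bytes_ycbcr420(width, height)
    print(f"RGB 24-bit:      {rgb} bytes per frame")
    print(f"Y'CbCr 4:2:0:    {ycbcr} bytes per frame")
    print(f"Ratio:           {ycbcr / rgb:.2f}")
```

Under these assumptions, the 4:2:0 frame carries half the raw data of the RGB frame, and all of the savings come from discarding chroma detail. That trade-off suits camera footage, but it is the likely reason fine colored text and sharp graphics in screen captures suffer.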

WebRTC Security

Since we’re 3 years on from 2016, it’s worth noting that not much has changed on either the audio or video fronts. In contrast, there’s been quite a bit more work around security.

In particular, there was quite a bit of movement in late 2018 around Interactive Connectivity Establishment (ICE) protocol candidates. Each candidate in this transparent listing may combine a particular network interface card (NIC) and its IP address, an available port (e.g., 8080 or 3265), and the transport protocol available on that particular NIC.
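
To make the idea of a candidate more tangible, here is a minimal sketch that parses the textual form a candidate usually takes when it is exchanged in a session description. The field order follows our reading of the ICE candidate attribute grammar. The class name, function name, and sample address are hypothetical, and optional trailing fields (such as related addresses) are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IceCandidate:
    """Illustrative container for the main fields of one ICE candidate."""
    foundation: str
    component_id: int
    transport: str
    priority: int
    address: str
    port: int
    candidate_type: str


def parse_candidate(line: str) -> IceCandidate:
    """Parse a line such as 'a=candidate:1 1 udp 2130706431 192.0.2.10 8080 typ host'."""
    text = line.strip()
    if text.startswith("a="):
        text = text[2:]
    if not text.startswith("candidate:"):
        raise ValueError("not a candidate attribute")
    fields = text[len("candidate:"):].split()
    # Expected order: foundation, component, transport, priority,
    # address, port, the literal "typ", and the candidate type.
    if len(fields) < 8 or fields[6] != "typ":
        raise ValueError("malformed candidate attribute")
    return IceCandidate(
        foundation=fields[0],
        component_id=int(fields[1]),
        transport=fields[2].lower(),
        priority=int(fields[3]),
        address=fields[4],
        port=int(fields[5]),
        candidate_type=fields[7],
    )


if __name__ == "__main__":
    sample = "a=candidate:1 1 udp 2130706431 192.0.2.10 8080 typ host"
    print(parse_candidate(sample))
```

The sample line describes a host candidate, meaning an address and port taken directly from a local interface, reachable over UDP.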

The candidate concept is similar to the idea of a voter having multiple choices in a local or general election. A major difference here, though, is that the WebRTC candidate choice is controlled by an algorithm that chooses based on particular criteria. One challenge with ICE connection candidates, though, is differentiating between multiple NICs on a single computer versus NICs on multiple computers.
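
One of those criteria is a numeric priority assigned to every candidate. The sketch below applies the priority formula and the recommended type preferences as we understand them from the ICE specification (rfc8445). The function name and the default local preference of 65535 are illustrative assumptions, and a real ICE agent also runs connectivity checks on candidate pairs rather than relying on priority alone.

```python
# Recommended type preferences, as we read them from the ICE specification:
# direct host addresses rank highest, relayed (TURN) addresses lowest.
TYPE_PREFERENCE = {
    "host": 126,   # address taken directly from a local interface
    "prflx": 110,  # peer-reflexive, learned during connectivity checks
    "srflx": 100,  # server-reflexive, learned from a STUN server
    "relay": 0,    # relayed through a TURN server
}


def candidate_priority(candidate_type: str,
                       local_preference: int = 65535,
                       component_id: int = 1) -> int:
    """priority = 2^24 * type pref + 2^8 * local pref + (256 - component ID)."""
    if not 0 <= local_preference <= 65535:
        raise ValueError("local preference must fit in 16 bits")
    if not 1 <= component_id <= 256:
        raise ValueError("component ID must be between 1 and 256")
    type_preference = TYPE_PREFERENCE[candidate_type]
    return (type_preference << 24) + (local_preference << 8) + (256 - component_id)


if __name__ == "__main__":
    ranked = sorted(TYPE_PREFERENCE, key=candidate_priority, reverse=True)
    for kind in ranked:
        print(f"{kind:6s} {candidate_priority(kind)}")
```

With these defaults, a host candidate outranks a server-reflexive one, which in turn outranks a relayed one. The ordering reflects a preference for the most direct path available.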

“Note that a single computer may have multiple network interfaces (wireless, wired, etc.),” writes Alexander Zlatkov in a SessionStack.com blog post, “so can be assigned multiple IP addresses, one for each interface.” Zlatkov’s blog post, “How JavaScript works: WebRTC and the mechanics of peer to peer networking,” is an excellent resource for understanding not just the JavaScript aspects of a WebRTC application, but also how STUN and TURN fit into the whole picture of initiating a WebRTC session.

STUN stands for Session Traversal Utilities for NAT and is designated by IETF as rfc5389 (rfc7675 covers a related STUN usage for consent freshness). TURN stands for Traversal Using Relays around NAT. The “NAT” stands for network address translation.

To better understand STUN and TURN, one first needs to understand the challenges that any protocol faces in traversing the firewall to the appropriate device or application. The challenge of delivering real-time communications through the firewall to the proper IP address is known as NAT traversal. NAT traversal has been a continuing issue for corporations from the earliest days of H.323 videoconferencing, when the H.263 codec was first introduced.
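
To show how little is involved in the first step of NAT traversal, the sketch below builds the 20-byte header of a STUN binding request and decodes the kind of XOR-mapped IPv4 address a STUN server sends back, which is how a client learns its public address and port. The byte layout reflects our reading of the base STUN specification. The function names are hypothetical, no network traffic is sent, and attributes, IPv6, and error handling beyond basic length checks are omitted.

```python
import ipaddress
import os
import struct
from typing import Optional, Tuple

MAGIC_COOKIE = 0x2112A442  # fixed value carried in every STUN message
BINDING_REQUEST = 0x0001   # message type for a binding request


def build_binding_request(transaction_id: Optional[bytes] = None) -> bytes:
    """Return a 20-byte STUN header: type, length, magic cookie, transaction ID."""
    if transaction_id is None:
        transaction_id = os.urandom(12)
    if len(transaction_id) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    # Message length is 0 because this sketch attaches no attributes.
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + transaction_id


def decode_xor_mapped_ipv4(value: bytes) -> Tuple[str, int]:
    """Decode the 8-byte value of an XOR-mapped address attribute (IPv4 only)."""
    if len(value) != 8:
        raise ValueError("expected 8 bytes for an IPv4 address value")
    _reserved, family, xor_port, xor_address = struct.unpack("!BBHI", value)
    if family != 0x01:
        raise ValueError("only the IPv4 family is handled in this sketch")
    # The port is XORed with the top 16 bits of the magic cookie,
    # and the address with the full 32-bit cookie.
    port = xor_port ^ (MAGIC_COOKIE >> 16)
    address = ipaddress.IPv4Address(xor_address ^ MAGIC_COOKIE)
    return str(address), port


if __name__ == "__main__":
    request = build_binding_request()
    print(len(request), request[:8].hex())
    print(decode_xor_mapped_ipv4(bytes.fromhex("0001a147e112a643")))
```

In practice the request would be sent over UDP to a STUN server, and the decoded address would become a server-reflexive ICE candidate. When no such direct path works, a TURN relay takes over.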