Glass-to-Glass Report: Comparing Low-Latency Streaming Providers
In this article, I examine low-latency services provided by streaming CDNs, withholding actual vendor names and drawing conclusions based on delivery method. I’ll start with a discussion of the basics of latency before moving on to the testing results and analysis.
At Streaming Media conferences, speakers and panelists often joke that we’re still talking about latency decades after the start of the video streaming revolution that’s been in full swing since the 1990s. Latency with streaming video, audio, and data is the measured interval between the moment a sender transmits content and the moment it arrives at the receiver. As a quick example, if I send a picture to a friend from my phone at 12:00:01 PM and my friend receives the picture at 12:00:03 PM, that’s a latency of 2 seconds. The longer the delay in transmission, the larger the latency. When we use the term “real-time streaming delivery,” we typically refer to latency that’s 500 ms or lower, often within the range of 100 ms–300 ms. For the purposes of this article, I’ll use the following definitions:
- Low-latency: Delays of up to 10–12 seconds
- Ultra-low latency: Delays of up to 3 seconds
- Sub-one second latency: Delays under 1 second
- Real-time latency: Delays under 500 ms
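To make these tiers concrete, here’s a minimal sketch that maps a measured delay onto the definitions above (the function name and thresholds simply restate the list; nothing here comes from any vendor):

```python
# Classify a measured glass-to-glass delay into the latency tiers
# defined above. Thresholds are in seconds.
def classify_latency(delay_s: float) -> str:
    if delay_s < 0.5:
        return "real-time"
    if delay_s < 1.0:
        return "sub-one second"
    if delay_s <= 3.0:
        return "ultra-low"
    if delay_s <= 12.0:
        return "low"
    return "standard"

print(classify_latency(0.3))  # real-time
print(classify_latency(2.0))  # ultra-low
```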
For the vast majority of media watched online, latency is inconsequential, as most media is archived and served on request, such as VOD content. Additionally, one viewer’s start time is asynchronous to another’s: I can start a video on YouTube, and my start time doesn’t dictate the start time for any other viewer. Buffering delays may vary from one vendor to another, but recorded content is perceived as having taken place in the past, so there’s typically little need to feel that you’re watching it in anything like “real time.”
With live content, however, perception of immediacy does matter. The urgency of that immediacy largely depends on any interactions or transactions taking place during the live event or session. If I’m deciding when to click a button to buy or sell stocks, do I want a real-time data feed with less than 1 second of latency, or do I want to look at delayed buy/sell data? Similarly, if you’re watching a sports event and are able to place bets on that event, you will need close-to-real-time latency and tight synchronization with other viewers. Auctions and any time-limited sales during a live event will have the same expectations of the lowest latency possible. Security and surveillance operations using video, audio, and data will also have functional and non-functional requirements for low latency.
Factors Affecting Latency
Several variables throughout any transmission process influence the delay of a real-time moment to a receiving end. With live video and audio streams, the phrase “glass-to-glass latency” is often used to describe the time delay between the moment light passes through a camera lens and the moment it appears on a viewer’s screen. The primary influencers of latency for live streams include the following:
- On-site transmission time: The time it takes for the video signal to be processed by cameras and video switchers is usually very low and measured by frame count, referencing the frame rate used by the production. For example, an SDI-to-HDMI converter will typically add one frame of latency in the video signal path.
- Video compression: Whether you use a software- or hardware-based encoder, the video signal needs a minimal buffer in order to efficiently compress the uncompressed video bitrate down to the lower bitrate needed for streaming. This delay can be substantial, depending on your specific encoder(s), and can range from 10 ms to several seconds. The keyframe interval—which defines the Group of Pictures (GoP) length—can add to latency as well.
- Streaming protocol to origin: The protocol used to push a live stream outbound from an encoder to a streaming CDN ingest can add latency. Typical protocols in use today include RTMP, SRT, RTSP, and WHIP (WebRTC-HTTP Ingestion Protocol). Depending on their implementation, these can affect end-to-end latency to varying degrees.
- Round Trip Time (RTT) to ingest: Regardless of the streaming protocol, the time it takes for a packet of audio/video/data to be sent to a provider’s point of presence (POP) or ingest server and return confirmation can greatly influence latency. If a live event is streamed in Seattle and the POP is in a location in Europe, the stream packets have to make many more hops to get to that origin server, thus adding more time to the latency equation.
- Adaptive transcoding: Most CDNs in the business of low latency necessarily want to handle the encoding ladder used to create multiple qualities (or bitrates) for live streaming to ensure that their player technology works predictably across a wide range of network conditions. Just as local encoders can affect latency, server-side encoding techniques can affect latency too.
- Edge caching: Every CDN has a specific approach to handling concurrent loads within and across geographic regions. Like RTT to ingest, the number of hops that stream packets have to cross to get from an edge server to the viewer’s device will affect latency.
- Streaming protocol to viewer: While legacy transports such as RTMP and RTSP still exist across broadcast and streaming workflows today, most current low-latency streaming deployments use WebRTC, WSS (WebSocket Secure), or highly optimized HTTP delivery. WebRTC is the only popular protocol with the option to use UDP packet delivery, which drops late or lost packets—and thus frames—rather than stalling to retransmit them as TCP does.
- Player implementation: Just as video encoders have an internal buffer before compression can start, video playback on devices also requires a local cache or buffer to build before decoding frames can begin. Most CDN vendors have custom player SDKs that optimize their delivery of low-latency content to viewers. Players can also greatly affect how small or large the overall latency will be.
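Because each of these stages contributes its own delay, glass-to-glass latency is roughly the sum of the stages. The sketch below totals a hypothetical latency budget; every number is illustrative only, as real values depend entirely on your encoder, protocol, CDN, and player:

```python
# A back-of-the-envelope glass-to-glass latency budget. All numbers
# below are illustrative placeholders, not measurements.
FPS = 60
FRAME_MS = 1000 / FPS  # ~16.7 ms per frame at 60 fps

budget_ms = {
    "on_site_transmission": 2 * FRAME_MS,  # cameras, switcher, converters
    "encoder_buffer":       250,           # compression look-ahead / GoP
    "ingest_rtt":           60,            # first mile to the POP
    "transcoding":          500,           # server-side encoding ladder
    "edge_delivery":        80,            # edge cache to viewer
    "player_buffer":        1000,          # client-side decode buffer
}

total_s = sum(budget_ms.values()) / 1000
print(f"Estimated glass-to-glass latency: {total_s:.2f} s")
```

Tallying a budget like this makes it obvious why the player and server-side buffers, not the camera chain, usually dominate the total.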
Synchronization Across Screens
Just because a live stream has end-to-end low latency doesn’t necessarily mean that all viewers are seeing the same moment at the same time. Variations will also exist across multiple concurrent sessions due to network conditions, processing speed, and power availability/consumption on the viewer’s device.
For many applications—and particularly for online-only events in which all viewers need to be using a connected device to make transactions during the session—synchronization across multiple screens will be much more important than the overall latency. I cannot stress enough how crucial synchronization may be for your particular business requirements. Tighter synchronization typically comes at the cost of added latency, and you may want to intentionally add latency to a live session to ensure that viewers are more in sync with each other. Larger buffer times, for example, can allow video player tech to hold all viewers at the same point in time consistently during a session.
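One simple way to trade latency for synchronization is to buffer every player to a shared delay target pegged to the slowest device. The sketch below assumes hypothetical per-device latencies and a made-up safety margin:

```python
# Trade latency for synchronization: hold every player at a shared
# delay target behind live, pegged to the slowest device.
# Device names, latencies, and the margin are hypothetical.
def shared_delay_target(latencies_s, margin_s=0.25):
    """Return a playback delay >= the slowest device plus a margin."""
    return max(latencies_s) + margin_s

devices = {"phone": 0.8, "laptop": 1.1, "vpn_pc": 1.6}
target = shared_delay_target(devices.values())
print(f"Buffer every player to {target:.2f} s behind live")
```

The fastest devices deliberately wait for the slowest one, which raises everyone’s latency but keeps all screens on the same moment.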
Testing: The Approach
Now that we have the basics of latency and synchronization down, let’s jump into the actual testing I conducted for this article. Time and budget necessarily constrained the scope of testing, limiting the number of vendors I could test. The aim of this article isn’t to single out any particular vendor; I collected data from my own tests with selected vendors using different transport protocols to determine trends the data supported. I chose vendors using the following transports, where the “x” multiplier indicates how many vendors I tested with each protocol.
- WebRTC x2: The de facto standard for real-time streaming using standards that are supported by HTML5 and native applications, WebRTC quickly came onto the scene when Flash Player technology and RTMP playback evaporated from devices and operating systems. WebRTC can utilize both UDP and TCP streaming and is codec-agnostic by design. Codecs in wide use today across web browsers and native apps include H.264 for video and Opus for audio. Not all WebRTC vendors provide TURN (Traversal Using Relays around NAT) support by default; TURN allows viewers who are connected behind complex firewalls and/or VPNs to receive packets that would otherwise fail.
- LL-HLS x1: Low-Latency HTTP Live Streaming (LL-HLS) has been in development for several years now, but most HLS implementations have a latency around 30 seconds, as larger chunk segments reduce the number of HTTP calls between player and server. Newer variations of LL-HLS reduce segment sizes considerably, allowing much smaller latency times. LL-HLS implementations can vary dramatically.
- HESP x1: High Efficiency Streaming Protocol (HESP) is a newer HTTP-based streaming protocol that aims to provide a better alternative to typical HTTP deployments such as LL-HLS. HTTP infrastructure is typically cheaper and easier to scale than other transports.
- WebSockets x2: Several vendors market low-latency services using WebSockets technology. This transport has been around much longer than WebRTC and can be used for a wide range of low-latency content, from real-time text chats to near-real-time video and audio transmission. While other transports like WebRTC and HLS can often utilize non-vendor-specific video playback technology, vendors offering WebSockets streaming require the use of their own embedded players to work cross-platform.
All vendors selected for testing had an RTMP ingest POP available, as I wanted the outbound transport from the encoder to be consistent across test targets. All playback used embedded player technology available from each vendor. Bandwidth was throttled during testing to see how well playback performed across a variety of network conditions.
Testing: The Environment
In order to measure end-to-end (or “glass-to-glass”) latency, I assembled a number of components from my own real-world live-streaming production gear, as well as other software to control bandwidth throttling. Here’s the order of components and settings used:
- vMix switcher with SDI output: A test clip recorded at 60 fps was played on a loop within vMix on a Windows PC and externally output via a Blackmagic Design SDI PCI card to an externally powered Atomos Shogun Inferno. The program output from vMix included a burnt-in wall clock in the top right corner of the display (see Figure 1).
- Videon EdgeCaster 4K encoder: The HDMI output from the Shogun Inferno was fed to an H.264/AAC encoder using the “Lowest Latency” preset in the encoder and used with a 1-second keyframe interval. All streams used RTMP or RTMPS for transport to the provider’s ingest.
- Connected devices: All of the device screens were arranged on a desk displaying live streams in playback (shown in Figure 1). The test devices used for all sessions included the following:
- 2016 Samsung Galaxy S7 Android phone (5 GHz Wi-Fi)
- 2023 Samsung Galaxy A54 Android phone (5 GHz Wi-Fi)
- 2021 Apple iPhone 13 (5 GHz Wi-Fi)
- 2011 Windows i7 desktop with NVIDIA GeForce RTX 3060 GPU (wired Ethernet)
- 2017 Apple i7 MacBook Pro (wired Ethernet)
- 2021 HP AMD Athlon Gold laptop (wired Ethernet)
- Network control: NetBalancer software on the vMix switcher PC controlled the network conditions to emulate 4G and 5G networks. Windows Internet Connection Sharing was enabled on a second network card connected to an ASUS Wi-Fi router, and NetBalancer throttled bandwidth to this network as well. NordVPN was used to connect the HP laptop to a private VPN server located in Germany.
- Camera capture: Moments in time during test procedures were captured at a fast shutter speed (1/500 second) for clarity of wall clock info using a Sony NEX-7 camera. Wall clock timestamps were manually reviewed and typed into a spreadsheet to measure the variations between devices and the source.
Figure 1. The array of connected devices tested with each vendor
Testing: The Procedure
After doing initial tests with each vendor, each live-stream session went through the following time-intensive testing procedure to capture latency and synchronization data. Each round took 60–90 minutes to complete, for a total of 18 rounds (three connections per vendor and six vendors):
- Output vMix feed to externally connected SDI monitor: The vMix session was only responsible for playing a looped 4K 60 fps video to a 1080p 60 fps program feed displayed on the Shogun Inferno. A vMix standard wall clock displaying fractional seconds was added as an overlay.
- Configure RTMP encoder: The same encoding profile was used by the Videon EdgeCaster 4K to push an RTMP stream to the vendor. Most vendors provided a POP ingest location that was geographically close to my location near Vancouver, Canada.
- Configure bandwidth throttling software: Three network conditions were tested for this sampling: unfiltered (NetBalancer disabled, full throughput of business-class cable modem), 4G, and 5G.
- Conduct bandwidth test: Each device was tested on Netflix’s FAST.com service to measure and confirm that throttled or unthrottled speeds were present.
- Play live streams: Each vendor’s player was embedded on a dedicated webpage, served by my own videorx.com web-hosted site.
- Capture live-stream wall clock image: Five samples were taken, each about 5 minutes apart, for each network condition. Two pictures were taken per sampling in case any timestamps were blurred or illegible.
- Review captures and input data to spreadsheet: After a round was completed, I reviewed each picture and entered the timestamp into a spreadsheet.
Testing: The Results
After testing was concluded, I set up the spreadsheet to calculate the following values (Figure 2 shows a screen grab from the spreadsheet):
- Difference in timestamps between each device: Twelve columns in the spreadsheet calculated the time difference between each possible combination of devices: iOS to Android Legacy, iOS to Android, iOS to VPN PC, and so forth. These values would help us understand how well synchronization was maintained during each session.
- Difference in source timestamp to each device: Six columns calculated the latency between the source wall clock (vMix generated output to Inferno display) and each device. These values would show which transport and vendor had the best overall latency.
- Average latency across each network condition: Averages were calculated from the five samples taken per connection speed test.
- Average latency across all network conditions: Averages were calculated from all 15 samples per vendor/transport.
- Standard deviation from the mean for device synchronization: The standard deviation across all device sync differences was calculated per connection speed and overall. Lower standard deviations meant less time difference between the synchronization results, and, as such, better synchronization.
Figure 2. The spreadsheet data collection. Click the image to see it at full size.
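The spreadsheet math described above reduces to a few lines of code. This sketch uses made-up timestamps (not the test data) to show how pairwise drift and per-device latency roll up into an average and a standard deviation:

```python
from itertools import combinations
from statistics import mean, pstdev

# Made-up seconds-of-day timestamps, not the actual test data.
source_ts = 100.00
device_ts = {"iOS": 101.20, "Android": 101.35, "VPN PC": 102.10}

# Latency of each device relative to the source wall clock
latency = {name: ts - source_ts for name, ts in device_ts.items()}

# Drift between each possible pair of devices
drift = {f"{a}-{b}": abs(device_ts[a] - device_ts[b])
         for a, b in combinations(device_ts, 2)}

print("mean latency:", round(mean(latency.values()), 3))
print("sync std dev:", round(pstdev(list(drift.values())), 3))
```

A lower standard deviation across the pairwise drifts means the devices were closer to showing the same moment, which is exactly what the sync columns in the spreadsheet capture.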
Anyone familiar with statistics knows that more sampling means more data and more reliable conclusions. I started with only three samplings per connection speed and later upped the sampling to five snapshots to isolate outliers and gather more data for the calculations. Table 1 shows the compiled averages and standard deviations across vendors. All values are shown in seconds. Green values indicate the best in category, yellow the second best, and red the worst.
(Table 1 columns include Normal Network STD, Combined Network STD, and Normal Network Latency, among others.)
Table 1. Averages and standard deviations across vendors
From the data I collected, I identified the following trends:
- Slower network speeds resulted in increased drift between devices and larger latency values.
- One WebRTC vendor with built-in TURN support held the best sync and latency overall.
- One WebSocket vendor had the most consistent sync across devices across network conditions and the lowest/best latency under the most stressed (4G) network conditions. This vendor also had the smoothest playback under 4G emulation, with far fewer stalls than transports from other vendors.
- The newest protocol, HESP, outperformed LL-HLS consistently and ranked well overall compared to the other transports tested.
- LL-HLS with the selected vendor performed the worst across all categories.
- Older devices and devices connected over VPN connections are likely to have higher latency values.
While I suspected that WebRTC vendors would do better overall, I was pleasantly surprised to see WebSockets and HESP technologies perform as well as they did. From my experience with providing solutions to my clients, WebRTC vendors typically have higher costs than other low-latency technologies. I expected WebRTC to perform better under stressed network conditions, but HESP and WebSockets providers did much better with synchronization, with most categories under 1 second among all samples collected. If you are going to explore WebRTC streaming, make sure the vendor offers TURN support in order to reach viewers behind more complex firewalls. Pricing may vary depending on the options you need from any vendor.
The datasets I gathered are far from 100% conclusive, but this is the most data I’ve collected to date comparing low-latency technologies.
As I’ve advised in past articles, be sure to thoroughly test any new solution that you’re thinking about implementing in your video streaming pipeline. Don’t assume that demos shown on vendors’ websites account for all of the variables that will affect your specific implementation, and don’t assume that a vendor using a particular technology will outperform all others in its class. Take the time to make an informed decision by conducting your own tests before committing to a streaming transport from any vendor.
You can access a list of links to the full spreadsheet data at videorx.com/low-latency/tests