
Low-latency streaming via CDN: How to optimize LL-HLS and LL-DASH for sub-3-second latency


Above: 2.0s latency from OBS RTMP to LL-DASH

Delivering real-time streaming at scale

Live streaming is evolving, and audiences now expect near-instant playback. We’re excited to introduce our low-latency live streaming solution, leveraging LL-HLS and LL-DASH to achieve a glass-to-glass delay of just 2.0–3.0 seconds. Optimized for seamless delivery across industry-standard players such as hls.js, dash.js, and native Safari support, our CDN ensures a superior streaming experience:

  • LL-DASH latency: ±2.0 seconds 
  • LL-HLS latency: ±3.0 seconds 

The challenge: accelerating live video delivery

LL-HLS and LL-DASH at sub-3-second latency

Supporting LL-HLS and LL-DASH at sub-3-second latency required overcoming several key challenges:

  1. Scalable Request Handling – Traditional caching strategies for large files don’t work with LL-HLS and LL-DASH due to frequent small file updates.
  2. Performance Monitoring Evolution – File download speed is now defined by segment duration rather than size.
  3. Granular Low-Latency Segment Handling – CDNs must efficiently manage chunked transfer encoding for smaller segment sizes.

Here is an example of how delivery differs in terms of average request rate and average request time for the same number of viewers when switching from regular HLS to LL-HLS:


Above: Increased response_time and request_rate for one viewer when switching from regular HLS to LL-HLS
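To make the effect above concrete, here is a back-of-the-envelope comparison of per-viewer request rates. The 6-second segment and 0.5-second part durations are illustrative assumptions, not the article's production values:

```python
# Illustrative request-rate comparison for one viewer (assumed numbers:
# 6 s segments for regular HLS, 0.5 s parts for LL-HLS).

def requests_per_minute(interval_s: float, requests_per_interval: int) -> float:
    """Requests a single viewer issues per minute of playback."""
    return 60 / interval_s * requests_per_interval

# Regular HLS: one manifest refresh + one segment download per 6 s segment.
regular = requests_per_minute(6.0, 2)

# LL-HLS: one blocked manifest reload + one part download per 0.5 s part.
low_latency = requests_per_minute(0.5, 2)

print(regular)       # 20.0 requests per minute
print(low_latency)   # 240.0 requests per minute, a 12x increase per viewer
```

The absolute numbers depend on segment and part durations, but the order-of-magnitude jump in request rate is what forces the CDN-side changes described below.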

Typical low-latency implementation involves two principles:

  • Full download from the origin, then distribution via chunked transfer. This means waiting for the full length of the DASH .mp4 segment (typically 10 seconds).
  • Disabled caching with partial download, where all requests are sent directly to the origin without caching.

These traditional principles don’t scale well over CDN and can’t meet low-latency demands. New approaches were needed to reduce delay while maintaining stream stability.

We previously achieved ±5-second latency, but reducing it further—especially across players like hls.js and Safari—was a major technical challenge. The goal was to achieve 2-second latency without compromising stability.

Ultimately, the traditional and the new low-latency delivery schemes looked like this:


Above: Challenges of low latency LL-DASH and LL-HLS live streams delivery glass-to-glass

Versatility

The protocols LL-DASH and LL-HLS use very different delivery and playback schemes. The challenge was to make both protocols work simultaneously within a single delivery system.

A few key points allowed us to synchronize the delivery of both:

LL-DASH

DASH (Dynamic Adaptive Streaming over HTTP) is a very flexible protocol, which is both its strength and its challenge. This flexibility requires developers to consider numerous nuances when segmenting content. Here are the most basic principles of MPEG-DASH:

  • Single manifest
  • Requests for chunks are made by timing
  • Continuous download of chunks during their generation (i.e., the files don’t yet exist in their final form)

Low-latency streaming aims to play live content as close to the real-time boundary as possible. In LL-DASH, segments are transcoded and distributed simultaneously. The player doesn’t need to wait for the entire file to be recorded; it can start playback based on a specified “targetLatency” attribute.


Above: Manifest of Chunked CMAF DASH

However, playback is only possible from a key-frame, which makes key-frame placement a vital limitation when determining latency.

Key-frame placement strategy and how it affects latency in LL-DASH:


Above: In LL-DASH, the more frequent the key-frames, the lower the latency

In LL-DASH, you can balance minimal latency and delivery quality on the last mile. If key-frames are too infrequent, data updates and network issues may lead to buffering.
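The trade-off above can be modeled roughly as follows. This is a simplified sketch with an assumed last-mile network delay, not the article's actual tuning:

```python
# Simplified model: a player can only start decoding at a key-frame, so in
# the worst case a viewer joins just after one was emitted and the playback
# position must sit a full key-frame interval behind the live edge.
# The 0.3 s network delay is an illustrative assumption.

def worst_case_join_latency(keyframe_interval_s: float,
                            network_delay_s: float = 0.3) -> float:
    """Lower bound on glass-to-glass latency imposed by key-frame spacing."""
    return keyframe_interval_s + network_delay_s

# More frequent key-frames lower the achievable latency floor.
for interval in (0.5, 1.0, 2.0, 4.0):
    print(f"key-frame every {interval} s -> "
          f"worst-case join latency {worst_case_join_latency(interval):.1f} s")
```

This is why the article stresses that key-frames that are too infrequent force either higher latency or buffering risk on the last mile.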

LL-HLS

Low Latency HLS brings a set of new conditions compared to its regular version:

  • Manifest blocking
  • Short manifests
  • Tiny parts (files) with instant downloading after appearance

Blocking Playlist Reloads:

This is controlled via the directive: #EXT-X-SERVER-CONTROL:CAN-BLOCK-RELOAD=YES.

It instructs the server to hold the request until the part file becomes fully available. Whereas traditional HLS returns manifests almost instantly (±30 ms), LL-HLS introduces a delay equal to the duration of the next part being prepared—often 500 ms or more.
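The blocking behavior can be sketched as a toy model. The timings are illustrative, and `Origin`, `publish_part`, and `blocked_manifest_request` are hypothetical names, not a real server API:

```python
import asyncio

# Toy model of CAN-BLOCK-RELOAD=YES: the origin holds the manifest request
# open until the next part is published, instead of answering immediately.

class Origin:
    def __init__(self) -> None:
        self._part_ready = asyncio.Event()
        self.manifest = "#EXTM3U (no new part yet)"

    async def publish_part(self, part_duration_s: float) -> None:
        await asyncio.sleep(part_duration_s)     # encoder produces the part
        self.manifest = "#EXTM3U (new part available)"
        self._part_ready.set()

    async def blocked_manifest_request(self) -> str:
        await self._part_ready.wait()            # hold the request open
        return self.manifest

async def main() -> str:
    origin = Origin()
    asyncio.create_task(origin.publish_part(0.05))
    # The response arrives only once the part exists, so response time is
    # roughly the part duration rather than the usual ~30 ms.
    return await origin.blocked_manifest_request()

print(asyncio.run(main()))
```

The same holding pattern is what the CDN must reproduce when it proxies blocked manifest requests to the origin.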

Tiny Parts and Buffer Length:

Tiny files, or "parts," are generated more frequently than full chunks. Their duration is regulated via the #EXT-X-PART-INF:PART-TARGET directive.

The buffer length is determined by the #EXT-X-SERVER-CONTROL:PART-HOLD-BACK parameter.

This value defines how many seconds the player should buffer before starting playback. It must be at least three times the part target duration.
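As a sanity check, the PART-HOLD-BACK rule can be validated against a playlist snippet. This is a minimal sketch with illustrative sample values:

```python
import re

# Minimal sketch: parse PART-TARGET and PART-HOLD-BACK from an LL-HLS
# playlist fragment and verify the hold-back >= 3 x part-target rule.
# The playlist values below are illustrative.

playlist = """#EXTM3U
#EXT-X-PART-INF:PART-TARGET=0.5
#EXT-X-SERVER-CONTROL:CAN-BLOCK-RELOAD=YES,PART-HOLD-BACK=1.5
"""

def attr(line_prefix: str, name: str, text: str) -> float:
    line = next(l for l in text.splitlines() if l.startswith(line_prefix))
    return float(re.search(rf"{name}=([\d.]+)", line).group(1))

part_target = attr("#EXT-X-PART-INF", "PART-TARGET", playlist)
hold_back = attr("#EXT-X-SERVER-CONTROL", "PART-HOLD-BACK", playlist)

assert hold_back >= 3 * part_target, "PART-HOLD-BACK must be >= 3x PART-TARGET"
print(part_target, hold_back)  # 0.5 1.5
```

With 0.5-second parts, the minimum compliant hold-back is 1.5 seconds, which is one reason LL-HLS tends to land at a slightly higher latency than LL-DASH.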

How Apple Low Latency HLS Works:

Latency is affected by the local buffer, the number of already generated tiny files, and, once again, key-frame placement.


Above: Manifest of Low Latency HLS

Strategies for Key-Frame Placement and Their Impact on Latency in LL-HLS:


Above: In LL-HLS the more frequent the key-frames, the lower the latency

The length of the whole segment doesn’t affect the latency, but the key-frame placement remains an essential factor.

One crucial feature of LL-HLS is byte-range requests, which can significantly reduce the number of requests by aggregating multiple small files into one larger request. Some players, like Safari, may delay playback to wait for the next key-frame in order to hit the specified latency.

Unification

To unify the delivery system for both protocols, we used the similarities in their handling of key-frames and streaming.

Here’s an example of how both protocols can be combined into a single system for stream generation for LL-DASH and LL-HLS:


Above: Unified scheme of placing key-frames and downloading streams using LL-HLS and LL-DASH protocols

This unified approach enabled us to overcome several challenges in ingest, transcode, packager, and CDN delivery systems.

The challenges

Ingest: Ingesting streams with minimal delay from the origin is a challenge when dealing with the nuances of both protocols.

Transcoding: Passing unfinished parts/files to the just-in-time packager (JITP) required new processing workflows.

Packager: Packing manifests and parts-files on the fly from unfinished transcoding files was a technical challenge.

CDN: The CDN had to support two different data transfer schemes simultaneously:

  • Accelerated downloading of small files (instant download as they appear)
  • Continuous download of unfinished long files

Performance: Delivery via CDN

The most complex part of the project was handled by the CDN. Our engineers expanded the network to more than 200 Tbps of capacity and reduced average response time to just 30 ms globally. At the same time, we tackled several development tasks:

LL-DASH

  • Chunked transfer: The CDN needed to support chunked delivery, which was incompatible with traditional configurations like Nginx’s proxy cache lock and buffering. 
  • CPU optimizations: Since connections last longer than usual file downloads, CDN edges needed to handle higher CPU loads.

Chunked-Proxy Module Development:

We developed a new module from scratch, which we call “chunked-proxy”. It works by downloading bytes from the origin and immediately distributing them to end-users. When a new user connects, they instantly receive the full volume of the already accumulated cache, while continuing to receive other bytes through continuous download simultaneously with all other users.

It’s important to note that the term “Chunked transfer” is used here with caution. In the case of an HTTP/1.1 connection, it would be considered a chunked transfer response, and for HTTP/2, it would be framed differently.
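The core idea can be illustrated with a toy in-memory model. This is a simplification of the chunked-proxy design described above, not the actual module:

```python
# Toy model of the chunked-proxy idea: bytes arriving from the origin are
# appended to an in-memory cache; a newly connected viewer first receives
# everything accumulated so far, then follows the live tail together with
# all other viewers. Class and method names are illustrative.

class ChunkedProxy:
    def __init__(self) -> None:
        self.cache = bytearray()

    def on_origin_bytes(self, data: bytes) -> None:
        self.cache.extend(data)          # accumulate the growing segment

    def join(self) -> int:
        """A new viewer immediately gets the cached prefix; returns the
        offset from which it will follow the live byte stream."""
        return len(self.cache)

proxy = ChunkedProxy()
proxy.on_origin_bytes(b"moof+mdat chunk 1 ")
proxy.on_origin_bytes(b"moof+mdat chunk 2 ")

offset = proxy.join()            # viewer receives bytes [0, offset) at once
proxy.on_origin_bytes(b"moof+mdat chunk 3")
live_tail = bytes(proxy.cache[offset:])
print(offset, live_tail)
```

A real implementation additionally has to evict completed segments, enforce cache lifetimes, and fan the live tail out to many concurrent connections, but the burst-then-follow pattern is the essential difference from a conventional full-file cache.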

Key Features of the Chunked-Proxy Module:

  • The chunked-proxy module is designed as a full-fledged caching module in compliance with CDN standards.
  • It features RAM caching for GET and HEAD requests.
  • It processes Cache-Control headers, along with Expires, Date, and Last-Modified.
  • It adds an Age header to its response, which can be used to check the caching status:
      • If the response is found in the cache, the Age header indicates how many seconds ago the response was cached.
      • If the response is not found in the cache, the Age header is not added.
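A minimal sketch of the Age-based cache check described above (the HIT/MISS labels are illustrative, not the module's actual status strings):

```python
# Classify the cache status of a response from the Age header:
# present  -> served from cache, value = seconds since it was cached
# absent   -> not served from cache

def cache_status(headers: dict[str, str]) -> str:
    if "Age" not in headers:
        return "MISS"
    return f"HIT (cached {headers['Age']} s ago)"

print(cache_status({"Age": "2"}))                    # cached response
print(cache_status({"Content-Type": "video/mp4"}))   # cache miss
```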

Chunked Cache and Delivery of LL-DASH over CDN:


Above: The diagram logic of chunked transfer of LL-DASH via the chunked-proxy module.

LL-HLS

  • Blocked Requests on Origin:

The CDN must be capable of holding connections for low-latency manifest files (.m3u8) for extended periods while waiting for a response from the origin. This presents a challenge for tools that monitor "response_time" because the response time will now be approximately equal to the duration of the part-files.

Additionally, we need to correctly handle MISS and EXPIRED statuses, sending these to the origin only on the first request. This is tricky because default caching mechanisms such as Nginx’s proxy_cache_lock are not fully suitable: they handle only a subset of these statuses, so a full custom implementation is required.

  • Manifest Caching Rules:

The CDN must be configured to cache low-latency .m3u8 manifests properly: the cache lifetime should be set to 1 second, and the cache key should include the delivery-directive query parameters _HLS_skip, _HLS_msn, and _HLS_part.
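A sketch of such a cache key, assuming the delivery directives arrive as ordinary query parameters; this is illustrative, not the production key format:

```python
from urllib.parse import urlsplit, parse_qs

# Build a manifest cache key that keeps the LL-HLS delivery directives
# (_HLS_skip, _HLS_msn, _HLS_part) but drops unrelated parameters such as
# auth tokens, so each blocked request for a distinct part caches separately.

def manifest_cache_key(url: str) -> str:
    parts = urlsplit(url)
    q = parse_qs(parts.query)
    directives = {k: q[k][0]
                  for k in ("_HLS_skip", "_HLS_msn", "_HLS_part") if k in q}
    suffix = "&".join(f"{k}={v}" for k, v in sorted(directives.items()))
    return f"{parts.path}?{suffix}"

key = manifest_cache_key(
    "https://cdn.example.com/live/index.m3u8?_HLS_msn=1024&_HLS_part=2&token=abc")
print(key)  # /live/index.m3u8?_HLS_msn=1024&_HLS_part=2
```

Keeping the directives in the key is what lets the 1-second cache lifetime work: two viewers asking for the same part hit the same cached entry, while a request for the next part misses and goes to the origin.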

Handling an increased number of requests:

The CDN must be prepared for an increased number of requests for small files, including low-latency manifests for each new part as well as the parts-files themselves. While each file is small, the number of requests increases significantly: a single segment containing N parts generates N manifest requests and N part requests, so logs can grow by roughly 2×N entries per segment. Monitoring and analysis tools had to be optimized to accommodate this increased load.


Above: Sequence diagram of LL-HLS requests for manifests and parts-files

Byte-range requests can significantly reduce the number of requests made to the origin. Instead of making multiple requests for discrete file parts, one request is made for a whole segment. According to the HLS specification (Section 6.2.6), “When processing a byte range of a URI that includes one or more Partial Segments that are not yet completely sent, the server MUST refrain from transmitting any bytes belonging to a Partial Segment until all bytes of that Partial Segment can be transmitted at full speed to the client.”

This means that the transmission to and from the CDN should be organized through the Chunked Proxy, just as with LL-DASH, but with an explicit logical division into parts.
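A sketch of the aggregation idea, with hypothetical part offsets; in practice the offsets and lengths come from the playlist's BYTERANGE attributes:

```python
# Combine several consecutive LL-HLS parts of one segment into a single
# HTTP byte-range request instead of one request per part.
# The (offset, length) pairs below are illustrative.

def aggregate_range(parts: list[tuple[int, int]]) -> str:
    """parts: (offset, length) for consecutive parts of a segment.
    Returns one Range header value covering all of them."""
    start = parts[0][0]
    end = parts[-1][0] + parts[-1][1] - 1
    return f"bytes={start}-{end}"

# Three 100 KB parts at consecutive offsets -> one request instead of three.
print(aggregate_range([(0, 100_000), (100_000, 100_000), (200_000, 100_000)]))
# bytes=0-299999
```

Per the spec passage quoted above, the server still paces the response so that each not-yet-complete partial segment is sent only once it can be transmitted at full speed, which is why this path runs through the same chunked-proxy machinery as LL-DASH.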

Observability

All new features have been integrated into the standard transcoding and CDN delivery pipeline. As a result, we've adapted our usual analysis and monitoring tools to handle the increased load.

Key Components of Observability:

  • Logs: Textual records generated by the system to capture events over time. We maintain logs for low-latency streams.
  • Metrics: Quantitative measures collected over time, including request counts, request statuses, stale manifest detection, and response times.
  • Traces: Detailed records of the flow of requests through the system, helping identify bottlenecks and understand system behavior.

The results

Gcore has achieved exceptional latency performance with the LL-DASH protocol, delivering a latency of approximately ±2.0 seconds, and the LL-HLS protocol, with a latency of around ±3.0 seconds.

Key technical developments include:

  1. Improved Load Management for LL-HLS and LL-DASH: The CDN infrastructure was optimized to efficiently handle the higher request volume generated by small, frequent content updates, ensuring smooth delivery for both protocols.
  2. Enhanced Monitoring and Segment Handling: Updates in performance monitoring and chunked transfer management have significantly reduced latency, resulting in seamless playback on major video players for both LL-HLS and LL-DASH streams.

What’s next

We believe ±1–2 seconds over HTTP is achievable—and we’re already working toward it. For content providers, that means not just “low latency,” but interactive latency—enabling more immersive, real-time audience engagement.

About Gcore CDN

Delivering high-quality, low-latency live streaming at scale requires advanced infrastructure built for performance. Gcore’s CDN is designed to meet these demands with industry-leading capabilities:


Above: Gcore Edge Network

By combining high-speed edge processing, optimized caching strategies, and real-time monitoring, Gcore’s CDN provides a robust foundation for live streaming services, enabling content providers to deliver ultra-low latency experiences to their audiences.

Find out more about our CDN and streaming service.

This article is Sponsored Content
