
Audio and Video Have Improved Dramatically, So Why Don’t We Feel Present?

For the past 20 years, we have been improving digital media in easily measurable ways. Resolution has climbed steadily. Networks have grown faster. Latency has fallen. Across almost every technical metric, progress has been undeniable. And yet, for all this advancement, our digital interactions still feel oddly thin. We can see more and hear more than ever before, but rarely do we feel truly present. 

This is not due to a lack of technological progress. Advancements in video compression, audio coding, and delivery efficiency have been extraordinary, and they remain critically important today. Without them, modern streaming and communication as we know it would not be feasible. But these successes also reveal a persistent blind spot. High definition and clarity solved essential technical problems, but not the fundamentally human one of feeling present.

For many years, sharper images and clearer sound were treated as largely independent challenges, each optimized in isolation. The result often looks impressive but can still feel incomplete. Presence is not delivered by resolution or clarity alone. It emerges when depth, motion, timing, and sound work together, as they do in the physical world. 

One reason this problem has persisted is that presence resists simple measurement. Pixels, bitrates, and frame rates fit neatly into charts and benchmarks. Presence does not. It depends on how multiple signals combine in real time, and whether they behave in ways the brain instinctively recognizes. When they do not, the brain compensates, quietly filling in the gaps. Over time, that effort becomes fatigue.

Why presence has remained elusive 

Anyone who has spent hours in video calls understands this intuitively. Faces appear crisp. Voices are clear. And yet something essential is missing. Micro-delays disrupt conversational flow. Audio arrives without a sense of space. Movement lacks weight and continuity. Each mismatch is minor on its own, but together they create emotional distance. We are connected, but not quite together. 

What’s changing is not the importance of efficiency and clarity, but our understanding of their role. Continued improvements in compression efficiency (achieving higher quality at lower bitrates and lower complexity) remain vital, particularly for applications such as immersive and volumetric video. At the same time, efficiency alone cannot create presence if the underlying signals are not coherent. 

Recent progress reflects a broader alignment across the media stack. Advances in next-generation video codecs help preserve motion and depth more faithfully under real-world constraints. Spatial audio techniques restore a sense of place, allowing sound to exist in three dimensions rather than collapsing into a single channel. These developments complement one another, rather than compete. 

Addressing presence requires treating streaming not as a delivery pipeline, but as a coordinated system. Low-latency connectivity, from today’s 5G networks through to emerging 6G architectures, is beginning to prioritise responsiveness over raw throughput. When combined with adaptive, AI-driven video encoding, streams can be adjusted dynamically, preserving the cues that human perception is most sensitive to, while shedding data that adds little to the experience. 
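
To make that idea concrete, here is a minimal sketch of perception-weighted bit allocation: it greedily keeps the stream components viewers are most likely to notice losing and sheds the rest once the bandwidth budget runs out. The component names, weights, and bitrates are illustrative assumptions, not any particular encoder's or network's model.

```python
# Minimal, illustrative sketch of perception-weighted bit allocation.
# Component names, weights, and bitrates are hypothetical assumptions.

from dataclasses import dataclass

@dataclass
class Component:
    name: str
    bits_kbps: int            # cost to keep this cue at full fidelity
    perceptual_weight: float  # how strongly its loss is noticed (0-1)

def allocate(components, budget_kbps):
    """Greedily keep the cues viewers notice most; shed the rest first."""
    kept, spent = [], 0
    for c in sorted(components, key=lambda c: c.perceptual_weight, reverse=True):
        if spent + c.bits_kbps <= budget_kbps:
            kept.append(c.name)
            spent += c.bits_kbps
    return kept

streams = [
    Component("motion_continuity", 1200, 0.95),  # smooth motion and timing
    Component("spatial_audio",      300, 0.90),  # sense of place
    Component("base_video",        2500, 0.85),  # core picture
    Component("fine_texture",      3000, 0.40),  # detail few viewers miss
]

print(allocate(streams, budget_kbps=4500))
# -> ['motion_continuity', 'spatial_audio', 'base_video']
```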

These shortcomings are not failures of ambition, but failures of formulation. For years, streaming has been treated primarily as a delivery challenge: how to compress more data, faster, through ever-wider pipes. Presence, however, is not delivered by volume. It’s an experience shaped by coordination, responsiveness, and coherence across systems. 

Can streaming become something we inhabit? 

The short answer is yes. Recent breakthroughs suggest that the tools, and the coordination across networks, codecs, and devices, are now within reach. This marks a shift in how the problem is approached. Presence can finally be treated as a system-level challenge, rather than a visual upgrade.

Full-fidelity streaming is not about overwhelming the senses. It is about restoring the subtle cues that make interactions feel natural. Spatial audio, for example, anchors voices and sounds in three dimensions, adding depth and realism to music, movies, games, and calls and making engagement feel more natural and enveloping. Photorealistic 3D and volumetric video introduce depth and perspective, allowing viewers to move around inside a scene rather than looking at it from the outside. Adaptive, AI-driven compression keeps these experiences efficient and scalable, even as their complexity grows.
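
As a toy illustration of how spatial audio anchors a source in space, the sketch below computes two classic localisation cues for a given direction: the interaural time difference (using the textbook Woodworth approximation) and a constant-power level difference. It is a simplified model for intuition only, not a production renderer or any specific API.

```python
# Simplified sketch of anchoring a sound source via interaural cues.
# Constants and the head model are textbook approximations, not a
# production spatial-audio implementation.

import math

HEAD_RADIUS_M = 0.0875   # average adult head radius, metres
SPEED_OF_SOUND = 343.0   # metres per second

def interaural_time_difference(azimuth_deg):
    """Arrival-time gap between the ears for a source at the given
    azimuth (0 = straight ahead, 90 = hard right), in milliseconds."""
    theta = math.radians(azimuth_deg)
    itd_s = (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + math.sin(theta))
    return itd_s * 1000.0

def stereo_gains(azimuth_deg):
    """Constant-power pan: level difference that keeps loudness steady."""
    pan = math.radians(azimuth_deg + 90) / 2   # map -90..90 deg to 0..pi/2
    return math.cos(pan), math.sin(pan)        # (left_gain, right_gain)

for az in (0, 30, 90):
    left, right = stereo_gains(az)
    print(f"azimuth {az:>3} deg: ITD {interaural_time_difference(az):.2f} ms, "
          f"gains L={left:.2f} R={right:.2f}")
```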

For this illusion to hold, these elements must operate in unison. Presence collapses if there is lag between image and sound, or between motion and response. This is why advances in compression, AI, edge processing, and emerging network technologies matter so deeply. They make it possible to coordinate complexity, adapt streams dynamically, and prioritise what human perception actually notices. As networks evolve toward 6G, the emphasis shifts from raw speed to responsiveness, and from delivering more data to delivering the right data at the right moment.
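
To put rough numbers on lag between image and sound, or between motion and response, the sketch below compares measured timings against commonly cited perceptual limits (audio noticeably early at roughly 45 ms, noticeably late at roughly 125 ms, and interactive response lag at around 20 ms). The thresholds, field names, and structure are assumptions for illustration only.

```python
# Illustrative "presence budget" check: compares measured audio/video
# offset and motion-to-response latency against commonly cited
# perceptual thresholds. Field names and limits are assumptions.

from dataclasses import dataclass

@dataclass
class FrameTiming:
    video_pts_ms: float         # presentation timestamp of the frame
    audio_pts_ms: float         # timestamp of the matching audio sample
    motion_to_photon_ms: float  # input/motion to on-screen response

AUDIO_EARLY_LIMIT_MS = 45.0     # audio leading video becomes noticeable
AUDIO_LATE_LIMIT_MS = 125.0     # audio lagging video becomes noticeable
RESPONSIVENESS_LIMIT_MS = 20.0  # often quoted for interactive/XR use

def presence_at_risk(t: FrameTiming) -> list[str]:
    """Return a list of timing mismatches likely to break the illusion."""
    issues = []
    skew = t.audio_pts_ms - t.video_pts_ms   # > 0 means audio lags video
    if skew < -AUDIO_EARLY_LIMIT_MS:
        issues.append(f"audio leads video by {-skew:.0f} ms")
    elif skew > AUDIO_LATE_LIMIT_MS:
        issues.append(f"audio lags video by {skew:.0f} ms")
    if t.motion_to_photon_ms > RESPONSIVENESS_LIMIT_MS:
        issues.append(f"response lag {t.motion_to_photon_ms:.0f} ms")
    return issues

print(presence_at_risk(FrameTiming(1000.0, 1150.0, 35.0)))
# -> ['audio lags video by 150 ms', 'response lag 35 ms']
```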

Redefining presence 

What emerges from this convergence is a new phase of digital media. Streaming begins to feel less like something we passively consume and more like something that welcomes our participation. The distance between colleagues, collaborators, and loved ones softens. Remote collaboration becomes less tiring and more cohesive. The emotional bandwidth of media increases. 

This transformation will not arrive overnight, and it should not be oversold. The core challenges of efficiency, clarity, and scalability remain active areas of research. But the direction is clear. After years of pursuing clarity through pixels and throughput, we can begin to focus on coherence: sound, motion, and interaction moving together through space and time.

If the last twenty years were about making media clearer, the next may be about making it truer. Not only by adding more definition, but by placing those gains within a broader understanding of presence, and by engineering for the way humans actually experience being together.

[Editor's note: This is a contributed article from Nokia. Streaming Media accepts vendor bylines based solely on their value to our readers.]
