What’s Next for AI Dubbing in the Media Industry?

AI dubbing is poised to dramatically transform the broadcast media industry as recently developed solutions are set to be widely implemented. 

Numerous dubbing startups have joined the AI gold rush, capitalizing on the demand for affordable content localization. With solutions that cut production costs by 30-50%, these companies have the potential to transform the media and creative industries. 

For now, investors and the media are struggling to evaluate the new solutions. In 2024, however, the focus is shifting to the potential costs of emerging tools and their impact on the media industry.

The major players in media localization are determined to stay ahead in the AI game. London-based VSI group, which provides media localization in over 80 languages worldwide, hired Scott Rose as CTO to ensure that “VSI remains involved with the development of AI in relation to dubbing.”

The market for AI-powered dubbing is shaping up 

The year 2023 will be remembered as the point when the first niche leaders in AI voice translation and dubbing for content localization emerged.

Automated dubbing with the human touch. These solutions pioneered hybrid translation, where humans finalize AI-powered dubbing. Startups making waves in this segment include Papercup and Deepdub (each raised $20 million in 2022). Their end-to-end translation services are targeted at the media industry and guarantee the quality required by major broadcasters.

AI-powered DIY translation tools allow users, such as freelance content creators and small businesses, to translate their videos with AI and then make edits on their own. Solutions of this kind, such as those provided by Heygen, rely on natural-sounding speech synthesis and text-to-speech software developed by Eleven Labs.

Real-time translation solutions are being developed by Zoom. Due to some technical limitations, we may have to wait a while for a breakthrough.  

Voice challenges: Emotions and lip-syncing

Until recently, AI performed well mainly for localizing “factual content,” such as documentaries and educational programs, which have less emotional variability and fewer voices, and where perfect lip sync is not crucial.

This is because AI voices were historically designed for car navigation systems or robot vacuum cleaners and had only one or two emotions. Even when they closely resembled human voices, they sounded too perfect and flat.

In 2024, as technology improves, AI will be increasingly applied to movies, animation, series, and other content. Synthesized voices have already become more emotionally expressive, sounding increasingly human. 

Voice cloning has made this breakthrough possible. Now, a single AI model can support a wide range of emotions and intonation. Voice cloning also helped facilitate the creation of ad-hoc voices, which was previously unavailable in AI-powered dubbing solutions. 

However, the industry now faces the responsibility of establishing regulations and ensuring the ethical use of human voices. Last year was marked by scandals involving the cloning of voices of actors who had signed away their rights decades ago, well before AI came into the picture.

On the brighter side, technological advancements in voice cloning could help improve voice acting, for example by correcting pronunciation.

Another important improvement is visual lip syncing, where new prototypes emerged in 2023. They were mainly designed for simple content and had errors that required human correction. Now, there are fewer technological constraints, and next year the solutions will develop further.

How will AI-powered dubbing transform the human job market?

With more AI tools available, many industries and career paths are facing a radical transformation. Here are some of the new opportunities related to AI dubbing that could emerge next year: 

AI Dubbing Managers, or proof listeners, will fine-tune AI dubbing systems tailored to specific industries or types of content. This role could include listening to automatic voice-overs to check cultural nuances, refine voice modulation, and make corrections. Some actors and interpreters may transition into this profession as it evolves.

Creative Directors for AI-enhanced productions will guide creative content developed through AI dubbing. They will ensure that the technology enhances rather than diminishes the overall artistic and emotional impact.

Actors licensing their AI-generated voices will charge fees for each usage. More tools will enter the market, enabling individuals to generate their voices with AI. Actors will be able to create new voices based on their own.  

Entering the last frontier for AI dubbing

This year will be marked by heated regulatory discussions, with the U.S. working on a new executive order and the EU preparing its AI rulebook. The impact of these regulatory disparities on the global tech market remains to be seen. 

The number of AI video translation and dubbing services will continue to grow in 2024 as it becomes easier to launch new platforms and services. Most importantly, the last frontier, capturing the emotion and modulation that distinguish human voices from AI-generated ones, will finally be crossed.

[Editor's note: This is a contributed article from Dubformer. Streaming Media accepts vendor bylines based solely on their value to our readers.]
