Tech Challenges of AI-Driven Streaming Content Localization and Captioning
2025-03-30
The buzz around AI for subtitling, dubbing, and localizing streaming content is that it makes the work far easier than ever before. That doesn’t mean it comes without significant technical challenges, particularly for companies like Interra Systems that develop the enabling technology. In this conversation with IntelliVid’s Steve Vonder Haar at Streaming Media Connect 2025, Interra Engineering Manager Sana Asfar enumerates the key challenges and explains how to overcome them.
Some key captioning issues and how AI helps to solve them
Vonder Haar asks Asfar, “What are some of the issues that you see emerging when you're trying to develop captions for a specific audience or market segment?”
Asfar says that some of the key issues include frame rate changes that desynchronize captions, varying reading speeds for different age groups, and regional spelling differences. She says that AI plays a crucial role in addressing these challenges by ensuring captions remain in sync, adjusting reading speeds, and intelligently placing line breaks.
“Frame rate change is one major culprit that comes into the picture and changes the alignment of captions with audio,” she says. “Video must undergo frame rate changes [for] multiple reasons. For example, due to some specific requirements of streaming companies, like Netflix [or] Prime, or maybe we are [sourcing] content from different places, the content [has a different] frame rate. If the video undergoes a frame rate change, a drift is introduced.”
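The drift Asfar describes can be illustrated with a little arithmetic. The sketch below is a minimal illustration, not Interra's method. It assumes the frame rate change is a speed-change conform, meaning the same frames are simply played back at a new rate, so every timestamp scales by the ratio of the two rates. The function names `retime`, `drift`, and `retime_cues`, and the 23.976-to-24 fps example, are hypothetical choices made for illustration.

```python
from fractions import Fraction

def retime(seconds, src_fps, dst_fps):
    """Return where a source timestamp lands after a speed-change conform.

    Assumption: the same frames are played at dst_fps instead of src_fps,
    so frame N moves from N / src_fps to N / dst_fps.
    """
    return Fraction(seconds) * Fraction(src_fps) / Fraction(dst_fps)

def drift(seconds, src_fps, dst_fps):
    """Offset in seconds between an unadjusted caption and the retimed audio."""
    return retime(seconds, src_fps, dst_fps) - Fraction(seconds)

def retime_cues(cues, src_fps, dst_fps):
    """Rescale (start, end, text) cues so they stay aligned with the audio."""
    return [
        (retime(start, src_fps, dst_fps), retime(end, src_fps, dst_fps), text)
        for start, end, text in cues
    ]

# Illustrative rates: 24000/1001 (about 23.976) fps content conformed to 24 fps.
SRC = Fraction(24000, 1001)
DST = Fraction(24)
print(float(drift(3600, SRC, DST)))  # drift accumulated after one hour of content
```

Under these assumptions, a caption timed for the one-hour mark of the original would appear roughly 3.6 seconds after the audio it belongs to unless it is retimed, which is why even a small rate change becomes visible in long-form content. A conversion that drops or interpolates frames while preserving duration would behave differently.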
She delves into how AI can help to resolve these issues.
“AI can also play a big role when we are targeting different audiences,” she says. “So, we need to change the reading speed, [for example] if you're targeting kids, [then] the reading speed should be slower when compared to adults. For some locations, spelling needs to be changed, like British English [and] US English. Also, we sometimes see content that must undergo censoring in some geographies. Besides, we see that while repurposing, there are different screen resolutions that we need to target. We would not want captions to hog all the screen space on a small device. If you're targeting your captions for a smaller resolution for mobile devices, you will want to have 32 characters or less per line. Similarly, if we target a larger device, we may go higher, like 42 or more characters per line. But if we change the number of characters per line, the segmentation of the caption changes. We cannot put a line break randomly at any place. We need to locate where there are natural pauses in speech intelligently. For that, we can use natural language processing. These are some aspects where AI can easily fit in and do a very accurate job.”
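To make the segmentation problem concrete, here is a rough sketch of re-breaking a caption for a given characters-per-line limit. It is not the NLP approach Asfar refers to: a simple heuristic stands in for it, preferring breaks after punctuation and avoiding lines that end on a dangling article or preposition. The names `segment_caption`, `break_score`, `WEAK_ENDINGS`, and `min_display_seconds` are hypothetical, and any reading-speed value passed in is an assumption for the caller to choose rather than a figure from the interview.

```python
# Words that read poorly at the end of a caption line (illustrative list).
WEAK_ENDINGS = {"a", "an", "the", "of", "to", "in", "on", "and", "or", "but", "for", "with"}

def break_score(word):
    """Rate how natural it is to end a line after this word (higher is better)."""
    if word[-1] in ",.;:?!":
        return 2  # punctuation usually marks a pause in speech
    if word.lower() in WEAK_ENDINGS:
        return 0  # avoid stranding an article or preposition
    return 1

def segment_caption(text, max_chars):
    """Split text into lines of at most max_chars, preferring natural breaks.

    A single word longer than max_chars is kept whole on its own line.
    """
    words = text.split()
    lines = []
    i = 0
    while i < len(words):
        # Greedily find the furthest end index j so that words[i:j] still fits.
        j = i + 1
        while j < len(words) and len(" ".join(words[i:j + 1])) <= max_chars:
            j += 1
        if j < len(words):
            # Not the last line: pick the best break in the upper half of the
            # candidates, so lines stay reasonably full. Ties go to the longer line.
            lowest = i + max(1, (j - i) // 2)
            j = max(range(lowest, j + 1), key=lambda k: (break_score(words[k - 1]), k))
        lines.append(" ".join(words[i:j]))
        i = j
    return lines

def min_display_seconds(text, chars_per_second):
    """Shortest on-screen time that keeps the caption at or below a reading speed."""
    return len(text) / chars_per_second

sample = ("We cannot put a line break randomly at any place, "
          "so we look for natural pauses in speech.")
for line in segment_caption(sample, 32):
    print(line)
```

Calling `segment_caption(sample, 42)` re-segments the same text for a larger screen, and passing a lower `chars_per_second` to `min_display_seconds` yields a longer minimum duration, which is how a slower reading speed for younger audiences would show up in the timing.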
The importance of on-premise AI solutions for privacy and security
Asfar also touches on why content providers are increasingly adopting on-premise AI solutions: they keep proprietary media private and secure.
“Media content is generally proprietary, and privacy is utmost here,” she says. “Content providers were [previously] uncomfortable sharing their content to cloud-based AI systems. Now, with modern AI chips that support on-device processing, we see solutions coming up with on-premise deployment, and many products support on-premise deployment. And we see that adoption of AI has also been increasing because of the safety and comfort of on-premise.”
