
Found in Translation: How to Stream Video in Multiple Languages


Many companies consider multilanguage video to be so valuable that they will not talk about it on the record. That’s right: For many companies, multilanguage video amounts to a trade secret.

That fact alone should cause any producer to think for a moment: How much more value would my videos have if they were immediately accessible to people who do not speak my language? The answer applies equally whether you are addressing a public audience or communicating with a diverse pool of employees or customers.

Moving pictures in multiple languages is nothing new. Movies and television have been translated and dubbed for decades. In the U.S., secondary audio tracks have been on TV since the 1980s and are part of the DVD and Blu-ray standards. It’s common to find major releases in two or more languages.

Multiple audio tracks are widely supported by major streaming platforms, CDNs, and players. Yet, the availability of multilanguage video lags. In the U.S., the iTunes Store offers a pretty good selection of new releases, and Netflix has a limited inventory. However, you’ll be hard-pressed to find anything in more than one language on Amazon Instant Video or Google Play.

What we can conclude, then, is that the most critical aspects of creating and distributing multilanguage video aren’t technical. Translation, the key element of multilanguage video, is its own specific skill. Doing it well requires expertise that most producers do not have in-house.

The good news is that a growing range of options is available. I talked with representatives from four vendors that offer translation services to match a variety of applications and budgets.

Sovee and Moravia both offer services to localize a wide variety of content—including documents and websites—as well as video. Ramp is a cloud video platform that is rolling out a translation product in the fourth quarter of 2014. Groovy Gecko specializes in live webcasts, with the option to provide live multiple language tracks.

On-Demand Video

There’s a wider array of multilanguage services available for on-demand content than for live content. Working with finished productions allows for more options to balance quality and turnaround time with cost.

Many of these services are offered by vendors that started in localization of documents and websites. They also offer broad expertise that can help clients with some of the cultural and practical implications of delivering videos in languages other than their own. At the same time, the increasing accuracy and accessibility of computer-based tools allow more companies to enter the space.

Still, as it stands, there is no commercially available software that will automatically translate speech. Therefore, translation requires several steps.


The first step is creating an accurate transcript in the original language, as you would for captioning. If you already have an accurate script or transcript, most vendors can work with that.

Otherwise, speech-to-text engines have come far enough that you can choose a fully machine-generated transcript, a machine-generated transcript that is corrected by a human, or a fully human-generated transcript. Machine transcription speeds up the turnaround time and reduces cost, but is not yet perfect.

When choosing among these options, Moravia CMO Renato Beninatto says the source language is a factor. For videos in which the source is English, the company uses a hybrid approach, with humans editing a machine transcription. He notes, however, that the accuracy of automated transcription is lower with some less-common languages.


The second step is translating the transcript, and the choice of transcription method depends in part on how your video will be translated. Here, too, you can choose from machine-generated, machine-generated with human assistance, and fully human-generated translations.

Ramp will offer machine translation with its service. That’s why president and COO Stuart Patterson says, “If you’re going to translate that video you need to start from a human-transcribed version of the audio track.”

Automated translation can be around 90 percent accurate, as can automated transcription, Patterson says. “If you are 90 percent accurate on the transcription and 90 percent accurate on the translation, then you would be around 80 percent accurate in the final piece.”
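Patterson's arithmetic can be checked with a short sketch. It assumes that errors in transcription and translation are independent, so the accuracies simply multiply. That is a simplification made for this example, not something the vendors state. The function name `pipeline_accuracy` is hypothetical, and treating a human transcript as 100 percent accurate is an idealization used only for comparison.

```python
from math import prod


def pipeline_accuracy(*stage_accuracies: float) -> float:
    """Estimate end-to-end accuracy by multiplying per-stage accuracies.

    Assumption (ours, for illustration): errors in each stage are
    independent of one another, so the accuracies compound.
    """
    return prod(stage_accuracies)


# Machine transcription followed by machine translation, each at the
# roughly 90 percent accuracy Patterson cites.
machine_both = pipeline_accuracy(0.90, 0.90)

# Human transcription (idealized here as 100 percent accurate)
# followed by machine translation.
human_then_machine = pipeline_accuracy(1.00, 0.90)

print(f"machine transcript + machine translation: {machine_both:.0%}")
print(f"human transcript + machine translation:   {human_then_machine:.0%}")
```

Under these assumptions the fully automated path comes out at about 81 percent, in line with Patterson's "around 80 percent," while starting from a human transcript keeps the result near the accuracy of the translation step alone.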

Sovee also uses machine translation, which is then corrected by people. Those corrections are entered back into the company’s Smart Engine system to help improve future accuracy. Even so, Sovee founder Steve Steele says, “We don’t ever encourage people to produce a video through just raw machine translation. I don’t think that most people would be happy with that.”

Translation can be machine-generated, machine-generated with human assistance, or fully human-generated. Ramp’s upcoming product will use machine translation.


The third step is voicing the new language soundtracks. Once again, the choice is between human and machine, using a flesh-and-blood voice actor or a synthesized voice. Using a synthesized voice is both faster and less expensive than a real human, though it might not be appropriate for all projects.

“The synthesizers are powerful,” Patterson says. “It’s amazing how many languages can now be synthesized automatically.”

Sovee President Scott Gaskill says that using synthetic voice technology allows the company to quickly translate and provide the language soundtrack for “perishable videos,” such as when “you’ve got to have the video out in 24 hours and it’s no longer valid in 2 weeks.”

With synthesized voices, Steele says, “a lot of our customers are saying, ‘Hey, that’s more than adequate because I can get something done faster, and a whole lot cheaper than with human [voice] talent involved.’”

Yet using human voice talent offers a wider range of expression to match the content and tone of your video in a way that computers just aren’t able to. Synthesized voices might work well for an instructional video with only one or two speakers, but things grow complex when there are more voices. Regional accent and dialect should also be kept in mind.

As Beninatto says, in some countries or regions, “You wouldn’t want a particular accent associated with your brand.”

In the Sovee Smart Engine, the English and Spanish (Mexico region) translations are displayed side by side in sequence. The boxes below each sequence allow human post-editing if needed, such as telling the Smart Engine to keep a certain word in English.

For instance, Beninatto says, “In Brazil there are five accents, and two major ones. You want to pick the appropriate one and have it be natural and consistent” across your productions. Similarly, although Portuguese is spoken in both Brazil and Portugal, using a voice actor from Portugal might not be appropriate for a video intended for a Brazilian audience.

Another consideration that might not be obvious is that different languages require different lengths of time to express the same idea. That means a translated audio track can get out of sync with the video.

Beninatto says that even though English is commonly thought to be more compact than most other languages, “A more accurate statement is that translation always increases the length. In my experience it doesn’t matter. If I translate Portuguese into English it will grow by 20 percent, even though English is theoretically shorter.”

There are two ways to compensate: Edit the video or edit the voice script. With screencasts or videos where the speaker is not seen, it can be appropriate to add or remove frames to fit the timing of the translation. However, vendors also are able to edit videos featuring live action and on-camera speakers.
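To get a feel for the size of the timing problem, here is a minimal sketch based on Beninatto's 20 percent figure. It assumes that speaking time grows in proportion to the length of the translated script, which is a simplification made for this example. The function names and the 60-second segment are hypothetical.

```python
def translated_duration(source_seconds: float, expansion: float = 0.20) -> float:
    """Estimated length of the translated voice track.

    Assumption (ours): speaking time scales with script length, so a
    20 percent longer script takes 20 percent longer to read.
    """
    return source_seconds * (1.0 + expansion)


def padding_needed(source_seconds: float, expansion: float = 0.20) -> float:
    """Option 1, edit the video: seconds of picture to add so it fits the new audio."""
    return translated_duration(source_seconds, expansion) - source_seconds


def script_trim_fraction(expansion: float = 0.20) -> float:
    """Option 2, edit the script: share of the translated script to cut
    so the voice track fits the original running time."""
    return 1.0 - 1.0 / (1.0 + expansion)


segment = 60.0  # hypothetical 60-second segment of narration
print(f"translated audio: {translated_duration(segment):.1f} s")
print(f"video to add:     {padding_needed(segment):.1f} s")
print(f"script to trim:   {script_trim_fraction():.1%}")
```

For a one-minute segment, the sketch suggests either adding about 12 seconds of video or trimming about one-sixth of the translated script. Real projects would likely combine the two and rely on the vendor's judgment about what can be cut.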
