Producing Seamless Multi-Lingual Live Streams: A Practical Guide for Independent Producers
As a video production professional for more than two decades, I’ve seen first-hand how live streaming has evolved from a niche service to a widely accessible and essential communication tool. The next frontier? Seamlessly delivering content in multiple languages. While this might sound daunting, the “democratization of video” has made it surprisingly achievable for independent producers, allowing us to meet complex client demands without a broadcast-sized budget or team.
In this article, I’ll walk you through a practical workflow for producing multilingual live streams with real-time translation—a solution that’s often simpler than managing a complex hybrid event.
For me, the democratization of video empowers independent producers to tackle technically challenging customer requirements: tasks once exclusive to broadcast networks or large A/V companies. Live streaming itself is a mature field, but recent years have brought new complexities. Hybrid productions, for example, present a unique set of challenges: managing mix-minus (a separate audio feed for each participant, excluding their own voice), potential feedback between in-person and online audiences, and integrating remote presenters, especially in panel discussions. While demanding, these are needs for which we video producers have largely developed robust workflows, adapting to client specifics and available equipment.
Multi-Language Production
This article introduces a workflow that, while seemingly complex, is only incrementally more difficult than what many of us already produce. Producing a live stream in multiple languages with real-time translation is a workflow I had always assumed required external experts and was technically beyond my solo capabilities. However, it has been democratized to the point where I found it relatively easy to implement when a client asked me to meet a translation requirement for national productions, as opposed to the regional, single-language content we typically produced for them.
I was fortunate that my existing live-streaming service, IBM Enterprise Video Streaming, supports multi-language audio. In researching this article, I initially assumed multi-language audio support was a standard feature, but I was surprised to learn this isn’t always the case. Even within IBM’s own offerings, only its Enterprise plans support it. Always verify multi-language audio support with your chosen streaming provider, as features vary significantly across platforms and plans.
The Evolution of Translation Workflows
While I’ve been involved in larger live streams with real-time translations for many years, my role was historically limited to video cameras and the live stream itself. Third-party providers typically managed the translation and associated audio. On our delivery end, we would simply output two separate live streams with different audio tracks, requiring the viewer to choose in advance which link to follow based on their preferred language.
Typically, one stream was in English with the natural, untranslated audio (as most content was delivered in English), and the second was in French with almost entirely translated audio. In these workflows, a translator sat in a dedicated, often expensive, soundproof booth at the same location as the presenter. Inside the booth, they had a video monitor displaying a live camera feed and wore headphones with a direct audio feed. Seeing a close-up video feed of the presenter is preferred over just hearing audio. The translator spoke into a microphone connected to a dedicated soundboard for a clean feed to send their translated audio back to the live stream. These single-language translations often satisfied requirements for Canadian government broadcasts to be produced in both of Canada’s official languages, English and French.
Two separate live streams satisfied the minimum accessibility requirement but fell short compared to the viewer experience on online streaming platforms that allow on-the-fly language selection. Viewers ultimately appreciate the freedom to choose their preferred audio.
These early translation workflows were also predominantly English to French and didn’t translate French back to English, even if the speaker switched languages, which is common with federal politicians in Canada as a sign of respect to the French-speaking population.
My Current Multi-Language Workflow
The workflow I now employ allows me to produce a single live stream with up to five different language versions, all utilizing a single high-bitrate video feed. We typically use three audio language options:
- U.S. English: This is our default channel setting, as the majority of viewers will consume content this way. This selection also provides additional closed-captioning features, which I will discuss later. The channel includes content presented in English, plus content presented in another language translated into English.
- Multilingual (Original Audio): This option delivers content as the speakers present it, untranslated. Most bilingual viewers prefer to watch multilingual content this way, and it is an important option that is often overlooked. It is also important to record this clean audio track in case the client wants to re-record the translation for on-demand viewing.
- Canadian French: This channel provides content presented in French, plus content presented in another language translated into French.
This single video workflow with multiple audio tracks means less video bandwidth to upload, which is still a significant consideration at many venues. It also allows viewers to change the language to their choice of English, French, or multilingual directly within the live-stream video player.
When it comes to multilingual content, viewers appreciate options, and I’m no different. I spoke English at home but attended French immersion school from preschool through high school. I prefer to listen to English presentations with small amounts of untranslated French, as I generally don’t need a translation for French passages spoken by native English speakers in their second language. However, my preferences change when a native French speaker presents; in these situations, I benefit from a translation, as the pace and language level are higher than I can easily decode. A native French speaker may similarly want an English translation version.
Remote Translation Solution
Instead of working in the same room in a special and expensive sound isolation booth, our freelance translators work from their home studios. They connect using Zoom or Microsoft Teams and wear a headset. We dedicate a laptop with a Magewell SDI-to-USB video capture card to sending video and audio to the translator. Their audio output can be routed via NDI from Microsoft Teams into vMix, our live-streaming software of choice, or we can run a 3.5mm-to-1/4" audio cable directly to a soundboard dedicated to that language translation. A remote translator is far more cost-effective than hiring a translator to come on location and providing a sound booth for them. Space is often at a premium, and event organizers rarely welcome two booths in a conference room. Another benefit of hiring remote translators is that the majority of the talent pool for French and English translators in Canada lives in the French-speaking province of Quebec or in Ontario, where our nation’s capital is located. These are both eastern provinces, while I am located on the west coast in British Columbia, making remote collaboration highly practical.
Translators prefer to see the speaker they are translating, so we send them both video and audio. We don’t need to see the translator, so after introductions and testing, they can turn off their webcam, although we often ask them to keep it on to quickly verify connection stability before they begin translating.
Regardless of the signal flow (into vMix first and out an Aux bus or directly to a dedicated soundboard), the translated audio is fed into its own dedicated soundboard. We use the Allen & Heath CQ series of digital soundboards for their small footprint and advanced digital features, like auto microphone mix (AMM).

The Allen & Heath CQ line Auto Mic Mixer being used to duck the original multilingual audio under the English translation automatically whenever the translator translates another language to English
I also use the AMM feature for panel discussions. It automatically mixes multiple open microphones based on operator-set priorities. If the soundboard doesn’t detect a direct microphone feed in an open channel, it attenuates that channel, resulting in a much lower noise floor. It uses noise gates and attack settings to reactivate microphones when a signal is detected, doing so more quickly and naturally than a human soundboard operator typically can, especially if that operator is multitasking, as mine often must.
The man behind this technology is Dan Dugan. Now that the patent has expired, I have used the technology on both Allen & Heath and Behringer digital soundboards with great success.
Normally, I weight microphone inputs evenly for panel discussions, as every panelist and the moderator are equally important. For translations, the translator’s input is weighted higher than the untranslated audio, which “ducks” the untranslated audio under the translator’s audio. In practice, this means that on the English-translated channel, the viewer hears English untranslated, and when French is spoken, they hear a few seconds of French before the translator’s voice comes in.
Our preferred method involves ducking the untranslated audio under the translator’s voice. That means the original audio is attenuated but not fully muted. This approach offers a more transparent broadcast, allowing viewers to still perceive elements of the presenter’s original delivery, such as emotion and inflection, which are often lost in translation.
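For readers who think in code, here is a minimal sketch of that priority-based ducking behavior, written in Python. The threshold and attenuation values are illustrative assumptions rather than Allen & Heath presets, and the function is purely conceptual; the CQ performs all of this in hardware.

# Hypothetical sketch of priority-based ducking, similar in spirit to weighting
# the translator's channel higher in an auto mic mixer. Values are illustrative.
def ducked_gains(translator_level_db: float,
                 threshold_db: float = -40.0,
                 duck_amount_db: float = -12.0) -> tuple[float, float]:
    """Return (translator_gain_db, original_gain_db) for one mix update.

    When the translator's input rises above the threshold, the original
    (untranslated) audio is attenuated rather than muted, so the speaker's
    emotion and inflection remain audible under the translation.
    """
    if translator_level_db > threshold_db:
        return 0.0, duck_amount_db   # translator active: duck the original feed
    return 0.0, 0.0                  # translator silent: pass original at unity

print(ducked_gains(-20.0))   # translator speaking: (0.0, -12.0)
print(ducked_gains(-60.0))   # translator silent:   (0.0, 0.0)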
Let me just take a moment to stop and admire the skill that real-time translators have honed. I am always amazed at how translators perform their magic, not only translating words but, more importantly, the meaning of full phrases and sentences into the second language, all while simultaneously listening, internally processing, and speaking the translation.
In workflows requiring two-language translations, a second remote translator is employed to translate the untranslated audio into the second language. A second soundboard is dedicated to this second translation, as the AMM feature on my soundboards can only mix one output at a time.
Live-Stream Software: Configuring vMix for Multi-Language Output
I use vMix to manage the three audio channels (English, Multilingual, and French) and multiple video inputs (typically multiple PTZ cameras directly in vMix over NDI or the output of an SDI hardware video switcher).
It’s critical that vMix recognizes your three audio sources as distinct channels, each routed to a specific output bus rather than mixed down to the main output. Here’s our typical routing (also sketched in code after this list):
- Main Output: English audio (primary translated channel)
- Bus A: Untranslated/Original audio
- Bus B: French audio (secondary translated channel)
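If it helps to see that routing at a glance, here is the same plan expressed as a small Python mapping. This is a planning sketch only; vMix is configured through its own settings panels, not through code, and the language tags shown here are explained in the stream-key section below.

# Planning sketch only: vMix routing is configured in its UI, not via this dict.
# Each outbound stream pairs the shared program video with one audio bus.
AUDIO_ROUTING = {
    "stream_1": {"audio_bus": "Main Output", "language": "en-US",
                 "note": "English, primary translated channel"},
    "stream_2": {"audio_bus": "Bus A", "language": "mul",
                 "note": "untranslated/original audio"},
    "stream_3": {"audio_bus": "Bus B", "language": "fr-CA",
                 "note": "French, secondary translated channel"},
}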
This is how the vMix encoder is set up when I broadcast multiple audio tracks using IBM Enterprise Video Streaming. The primary stream carries the English audio, and the program video is streamed at a high resolution and bitrate. I typically use 6Mbps for a 1920x1080 live stream, but my plan supports 4K live streaming as well. The audio channel selected is the main output. These are standard settings for all of our broadcasts, but from here on, there are small but important changes and selections to make.

Selecting audio channels in IBM Enterprise Video Streaming
For the RTMP address, copy and paste the RTMP URL from the channel’s Stream Settings. For the channel key, copy and paste the channel key from the same Stream Settings page and add a unique number at the end. For example, if the channel key is 123ABC, then your Stream 1 channel key could be 123ABC1.

Sending three different language audio tracks to three different audio busses
For the second stream, we are sending untranslated audio from the A Bus. The RTMP is the same as before, but for the channel key, this format is used:
[Channel Key][Unique Number]|language=[Language Parameter]
The channel key is the same as in the primary stream, but the unique number is 2. For the language, the mul code is used for multilingual. This is the resulting channel key:
123ABC2|language=mul
For the third stream, French audio from Bus B is broadcast to the same RTMP address with a different unique channel identifier and its own language code. For Canadian French as the third language option, this is how the channel key would be coded:
123ABC3|language=fr-CA
IBM’s documentation includes a full list of language codes and instructions for this setup.
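Because these keys follow a predictable pattern, I find it convenient to generate them rather than type them by hand. The short Python helper below follows the format described above; the base key and language codes are simply the examples from this article, and the function name is my own.

# Build channel keys in the format: [Channel Key][Unique Number]|language=[Code]
# The primary stream omits the language parameter and uses the channel default.
def build_stream_key(base_key: str, index: int, language: str | None = None) -> str:
    key = f"{base_key}{index}"
    if language:
        key += f"|language={language}"
    return key

base = "123ABC"  # placeholder key copied from the channel's Stream Settings
print(build_stream_key(base, 1))           # 123ABC1 (primary, channel default audio)
print(build_stream_key(base, 2, "mul"))    # 123ABC2|language=mul
print(build_stream_key(base, 3, "fr-CA"))  # 123ABC3|language=fr-CA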
While selecting the correct audio channel for Streams 2 and 3 is critical, the video bitrate you send for these secondary streams is largely irrelevant. My testing confirms that IBM’s platform prioritizes the video signal from your primary stream when multiple streams are sent to the same channel with unique identifiers. I specifically tested sending black video (via a second vMix video output) and even a low-resolution (240p at 300Kbps) version of the program video on Streams 2 and 3.
In both scenarios, when toggling audio languages, the viewer consistently saw the high-quality 1080p video from Stream 1. This means you can significantly reduce your outbound bandwidth by sending minimal video data with your secondary audio streams.

Stream settings for the third language stream with French audio on Bus B with a negligible bitrate video signal
If your streaming software or hardware supports audio-only RTMP streaming, IBM says that you can send audio only on Streams 2 and 3 and skip the low-res video signal entirely. However, my review of OBS and Wirecast revealed that they, too, do not support audio-only streams, so the same low-bitrate video approach I described for vMix applies to Streams 2 and 3 in those tools as well. By sending a negligible video bitrate for Streams 2 and 3, I need only slightly more than one-third of the bandwidth required to stream to three different channels concurrently.
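The savings are easy to quantify. Using the figures above (6Mbps for the primary program video and roughly 300Kbps for each low-resolution placeholder stream, ignoring audio and protocol overhead), a quick back-of-the-envelope comparison looks like this:

# Rough outbound-bandwidth comparison; audio and protocol overhead ignored.
PRIMARY_VIDEO_MBPS = 6.0   # 1080p program video on Stream 1
LOW_RES_VIDEO_MBPS = 0.3   # 240p placeholder video on Streams 2 and 3

single_channel = PRIMARY_VIDEO_MBPS + 2 * LOW_RES_VIDEO_MBPS   # 6.6 Mbps
three_channels = 3 * PRIMARY_VIDEO_MBPS                        # 18.0 Mbps

print(f"{single_channel:.1f} Mbps vs {three_channels:.1f} Mbps "
      f"({single_channel / three_channels:.0%} of the bandwidth)")
# 6.6 Mbps vs 18.0 Mbps (37% of the bandwidth)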
When selecting language codes, it is important to set and note the channel’s default language; the additional streams must be set to different language codes. As I mentioned, I set my default channel audio to en-US (U.S. English), despite there being an en-CA (Canadian English) code that the Canadian in me would prefer to use. The reason I use en-US is that IBM Watson Captioning supports closed captioning with this language code, while it doesn’t support my native en-CA.
Putting English on the main channel (Stream 1) means that viewers can toggle closed captioning on if they want to read along. The captions are generated only from the main stream’s audio, so regardless of the viewer’s audio selection, the captions appear in English (or whatever the channel’s default audio language is set to).
IBM Watson Captioning supports both French and French (Canada) language codes. The only workaround to getting both French and English captions is to send streams to two different channels and enable captioning for both.
Viewer Experience and Business Impact
For viewers, being able to change languages without compromising video quality or having to follow an alternate video link is a significant improvement. Real-time captioning in the primary language is a bonus in terms of increasing accessibility.
As a video producer, being able to produce content in multiple languages with a modestly different workflow than I would typically use helps me deliver more specialized services to my clients. This avoids the problem of their technical needs outpacing my ability to deliver on them without outsourcing parts of the job or losing it entirely to a much larger company. This expanded capability has allowed me to open new client segments and enhance the value proposition for existing ones, reaching wider global audiences and improving engagement.