How to Caption Live Online Video

The captions provider, via transcription or respeaking, converts the speech to text and sends it back to the caption encoder. The caption encoder can optionally delay the video to account for the time the conversion takes, then embeds the caption data into the SDI signal. This output is then fed into a real-time broadcast encoder (such as Elemental Live), which converts it into your delivery format (HLS, RTMP, etc.).
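The optional video delay in the caption encoder amounts to a fixed-length first-in, first-out buffer: each frame is held long enough for the captioner's text to catch up with it. The Python sketch that follows is illustrative only. The `DelayLine` class, the three-second latency, and the 30 fps frame rate are assumptions, and real caption encoders apply this delay to the SDI signal in hardware.

```python
import math
from collections import deque


def delay_frames_for(latency_seconds: float, fps: float) -> int:
    """Frames of video delay needed to cover the captioner's latency (rounded up)."""
    return math.ceil(latency_seconds * fps)


class DelayLine:
    """Hypothetical fixed video delay: frames leave in order, delay_frames later."""

    def __init__(self, delay_frames: int):
        self.delay_frames = delay_frames
        self.buffer = deque()

    def push(self, frame):
        """Add a frame; return the frame leaving the line, or None while it fills."""
        self.buffer.append(frame)
        if len(self.buffer) > self.delay_frames:
            return self.buffer.popleft()
        return None


# Example: an assumed 3-second captioning latency at 30 frames per second.
line = DelayLine(delay_frames_for(3.0, 30))
outputs = [line.push(n) for n in range(100)]
# The first 90 pushes return None; frame 0 comes out on the 91st push.
```

The trade-off is that the whole program, not just the captions, runs a few seconds behind real time.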

This hardware-based signal flow tends to be fairly expensive. Replacing the equipment with a software-only solution, such as EEG Falcon, will likely prove more scalable and affordable, assuming it fits your workflow and integrates with your platform of choice.

How to Get the Best Caption Quality

There are several steps you can take to get the best possible quality when delivering live captions. Here’s what we recommend:

Define your use case. Creating an ideal user experience starts with understanding your use case thoroughly.
Prepare your content first. Send the captioning provider all the content details you have available in advance. This may include a show description, speaker names, the run of show, an event link, special terms and acronyms, scripts, and any other helpful context. Captioners use shortcuts and macros to increase their speed and accuracy. Having this information ahead of time lets them prepare those shortcuts and can significantly improve the quality of your captions.
Does your show include video playback? If you'll be playing back video during your broadcast, check whether that content is already captioned. If so, you may be able to pass those existing captions through rather than create live ones, provided, of course, they are of better quality than you expect a live captioner to produce. Let the providers know to expect any videos so they can refrain from captioning those portions. Another option is to send the video's caption file or transcript to the captioning provider in advance.
Consider appearance options. Depending on your workflow, you may have a variety of options for the screen placement, color, and style of your captions. Experiment to see what works best for your content. For example, captions along the bottom could cover lower-third graphics, so an upper placement might be better. While scrolling captions tend to work better for live content, you may have a good reason to use pop-on captions, so be sure to test those too.
Test, test, test. Give yourself plenty of time to test and iterate on the captioning solution you want to implement. For example, a test could reveal a mismatch between the caption standard and the broadcast format you are using. Because the technology involved is constantly evolving, build a process that includes frequent end-to-end testing to avoid failures when it is time to go live.
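Three of these tips (passing through captions for played-back video, choosing placement, and testing) can be made concrete for a pipeline that ends in WebVTT. This Python sketch assumes your captions are delivered as WebVTT; in a 608 workflow the same jobs happen inside the caption encoder. The helper names, the example one-hour offset, and the `line:10%` placement are illustrative choices, and `check_vtt` covers only a few basic WebVTT rules, not full validation.

```python
import re

# A WebVTT cue timing line: start --> end, followed by optional cue settings.
TIMING = re.compile(r"^(\S+)\s+-->\s+(\S+)(.*)$")


def to_seconds(ts: str) -> float:
    """Parse a WebVTT timestamp (HH:MM:SS.mmm or MM:SS.mmm) into seconds."""
    parts = ts.split(":")
    hours = int(parts[0]) if len(parts) == 3 else 0
    return hours * 3600 + int(parts[-2]) * 60 + float(parts[-1])


def to_timestamp(total: float) -> str:
    """Format seconds as an HH:MM:SS.mmm WebVTT timestamp."""
    millis = round(total * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    seconds, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def shift_cues(vtt: str, offset: float) -> str:
    """Video playback: move a pre-captioned clip's cues to where the clip airs."""
    out = []
    for line in vtt.splitlines():
        m = TIMING.match(line)
        if m:
            start = to_timestamp(to_seconds(m.group(1)) + offset)
            end = to_timestamp(to_seconds(m.group(2)) + offset)
            line = f"{start} --> {end}{m.group(3)}"  # keep any cue settings
        out.append(line)
    return "\n".join(out)


def make_cue(start: str, end: str, text: str, top: bool = False) -> str:
    """Appearance: optionally place a cue 10% from the top, clear of lower thirds."""
    settings = " line:10%" if top else ""
    return f"{start} --> {end}{settings}\n{text}\n"


def check_vtt(vtt: str) -> list:
    """Testing: flag a missing header, inverted timings, or out-of-order cues."""
    problems = []
    lines = vtt.splitlines()
    if not lines or not lines[0].startswith("WEBVTT"):
        problems.append("missing WEBVTT header")
    last_start = 0.0
    for n, line in enumerate(lines, 1):
        m = TIMING.match(line)
        if not m:
            continue
        start, end = to_seconds(m.group(1)), to_seconds(m.group(2))
        if end <= start:
            problems.append(f"line {n}: cue ends before it starts")
        if start < last_start:
            problems.append(f"line {n}: cue out of order")
        last_start = start
    return problems
```

For example, a clip rolled in one hour into the show would be passed through with `shift_cues(clip_vtt, 3600)`. Placement settings in particular are worth checking on every target player, since support for advanced WebVTT features varies.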

Final Words of Advice

Captioning live video online is challenging, and in some cases expensive. But the payoffs are compelling: more viewers, higher engagement, and increased impact. There are no plug-and-play solutions readily available, so be prepared to do your homework.

If you hit roadblocks, seek advice from people like us who are already doing it. Form alliances with your key vendors to help you get it right. Accept that it may take a while to get there. Finally, captions with room for improvement are better than no captions at all. So just get started, with a commitment to improving and enhancing your user experience along the way.

Sidebar: Captioning Standards: Where We Are and How We Got Here

The Television Decoder Circuitry Act of 1990 requires televisions with screens 13" or larger to support closed captions. The passage of this bill into law solidified the dominance of the EIA-608 standard already in use for NTSC broadcasts. Later, when CEA-708 was developed for digital television, it included a backward-compatibility mode that carries an EIA-608 payload.

Thus, it came as no surprise that when Apple released HLS in 2009, it chose to support this format. This solution worked brilliantly because it used a workflow broadcasters were already accustomed to. In addition, it leveraged much of the existing, and often expensive, caption encoder hardware and services.

The 608/708 format does have some drawbacks, however. First, it is highly U.S.-centric, with extensions that offer only limited support for European character sets. Support for Asian and Arabic encodings is simply nonexistent. It is also a very complicated standard with very few free and open source tools available. This made it difficult for organizations that did not produce live video, or whose primary consumers were on the internet, to work with the standard.
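Part of the limitation is visible in how 608 packs text: each byte carries only seven data bits plus an odd-parity bit, and characters travel two bytes at a time. This Python sketch encodes plain text into 608 byte pairs. It deliberately accepts only ASCII letters, digits, and spaces, because several other code points in the 608 basic character set map to accented letters rather than their ASCII meanings (0x2A, for example, is á). Control codes for positioning and caption modes are omitted entirely.

```python
def with_odd_parity(byte7: int) -> int:
    """Set bit 7 when needed so the byte has an odd number of 1 bits."""
    if not 0 <= byte7 < 0x80:
        raise ValueError("CEA-608 carries only 7 data bits per byte")
    if bin(byte7).count("1") % 2 == 0:
        return byte7 | 0x80
    return byte7


def encode_608_text(text: str) -> list:
    """Encode simple text as 608 byte pairs, padding an odd length with a null."""
    for ch in text:
        # Restrict to characters whose 608 basic-set code matches ASCII.
        if not (ch == " " or (ch.isascii() and ch.isalnum())):
            raise ValueError(f"{ch!r} is outside the ASCII-compatible subset used here")
    codes = [ord(ch) for ch in text]
    if len(codes) % 2:
        codes.append(0x00)  # null padding byte; becomes 0x80 once parity is added
    return [
        (with_odd_parity(codes[i]), with_odd_parity(codes[i + 1]))
        for i in range(0, len(codes), 2)
    ]
```

A real encoder also interleaves control-code pairs (for caption modes, positioning, and extended characters) with these character pairs, which is where much of the standard's complexity lives.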

In 2012, Apple added support for segmented Web Video Text Tracks (WebVTT) in iOS and Safari. Even though Apple platforms still do not support some advanced WebVTT features, such as styling and CSS, the change made dealing with captions, especially for VOD content, much easier. No custom tools were needed: a basic text editor was enough to create the VTT files and update the M3U8 manifests. WebVTT also supports the full UTF-8 character set.
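Because both files are plain text, the manifests can be generated as easily as they can be edited by hand. This Python sketch builds a live HLS media playlist for WebVTT segments and the header each segment starts with. The segment names, six-second durations, and the MPEGTS value of 900000 are placeholder assumptions; in practice the MPEGTS value must match the video's presentation timestamps (on the 90 kHz MPEG-TS clock) so the cues stay in sync.

```python
import math


def vtt_segment_header(mpegts: int) -> str:
    """WebVTT segment header mapping cue time zero to a 90 kHz MPEG-TS timestamp."""
    return f"WEBVTT\nX-TIMESTAMP-MAP=MPEGTS:{mpegts},LOCAL:00:00:00.000\n\n"


def subtitle_playlist(segments: list, media_sequence: int) -> str:
    """Live HLS media playlist for WebVTT segments (no EXT-X-ENDLIST while live)."""
    # Target duration must be an integer no smaller than any segment's duration.
    target = math.ceil(max(duration for _, duration in segments))
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target}",
        f"#EXT-X-MEDIA-SEQUENCE:{media_sequence}",
    ]
    for uri, duration in segments:
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(uri)
    return "\n".join(lines) + "\n"


print(subtitle_playlist([("sub_10.vtt", 6.0), ("sub_11.vtt", 6.0)], media_sequence=10))
```

For a live stream, the playlist is rewritten as each new segment is published, with the media sequence number advancing as old segments slide out of the window.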

Some features were lost, however, such as paint-on and pop-up modes. This makes live corrections impossible, which is a serious limitation since stenographers and voice writers make them often. And because every WebVTT cue carries a fixed end time, producers need to know in advance exactly how long to leave text on the screen before replacing it. In a live scenario, that is impossible to do without delaying the captions.
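The timing problem can be sketched in a few lines of Python. In this hypothetical segmenter, text chunks from a captioner arrive with timestamps, and each must become a cue with a fixed end time. Because that end time is only known once the next chunk arrives, every cue is released one chunk late, which is the delay described in the paragraph. The four-second cap on a cue's duration is an arbitrary assumption.

```python
from typing import Iterable, Iterator, Tuple


def cues_from_live_text(
    events: Iterable[Tuple[float, str]], max_duration: float = 4.0
) -> Iterator[Tuple[float, float, str]]:
    """Turn (arrival_time, text) chunks into (start, end, text) cues.

    A cue's end time is not known until the next chunk arrives, so each cue is
    released one chunk late; max_duration caps how long stale text stays up.
    """
    pending = None
    for arrival, text in events:
        if pending is not None:
            start, prev_text = pending
            yield (start, min(arrival, start + max_duration), prev_text)
        pending = (arrival, text)
    if pending is not None:  # flush the last chunk with the fallback duration
        start, prev_text = pending
        yield (start, start + max_duration, prev_text)
```

The alternative is to guess a fixed duration up front and publish immediately, which avoids the delay but risks text disappearing too early or lingering too long.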

There are many other closed captioning standards, including TTML, SMPTE-TT, SRT, and SSA, each with its own pros and cons. As history has demonstrated time after time, the winner of any format war is not the most advanced technology but the one the greatest number of consumers can access. It ultimately comes down to the question, “What can your users decode and play back?”

The focus on HLS here is not necessarily because it’s the best, but because it has the most rigid requirements. If you need captions on iOS, 608 and Apple’s WebVTT are the only options that are at least semi-standards-compliant (i.e., the only ones that avoid vendor lock-in). Other platforms are more adaptable. On Android, ExoPlayer supports both Apple’s version of live WebVTT and 608. On the web, the open source HLS.js (from Dailymotion) supports 608, all for free. Many commercial players, such as JW Player and THEOplayer, offer similar capabilities.
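In an HLS master playlist, these two iOS-friendly options are declared differently: 608 data rides inside the video stream and is identified by an INSTREAM-ID, while WebVTT is a separate rendition with its own playlist URI. This Python sketch emits a minimal master playlist for either case; the group IDs, names, bandwidth, and URIs are placeholders, and a production playlist would list several variants.

```python
def master_playlist(variant_uri: str, bandwidth: int, captions: str, subs_uri: str = "") -> str:
    """Minimal HLS master playlist declaring either in-band 608 or WebVTT captions."""
    if captions == "608":
        # 608 data travels inside the video; INSTREAM-ID names the channel, no URI.
        media = ('#EXT-X-MEDIA:TYPE=CLOSED-CAPTIONS,GROUP-ID="cc",NAME="English",'
                 'LANGUAGE="en",INSTREAM-ID="CC1",DEFAULT=YES,AUTOSELECT=YES')
        group = 'CLOSED-CAPTIONS="cc"'
    elif captions == "webvtt":
        if not subs_uri:
            raise ValueError("WebVTT captions need a subtitle playlist URI")
        # WebVTT is a separate rendition with its own media playlist.
        media = ('#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",'
                 f'LANGUAGE="en",DEFAULT=YES,AUTOSELECT=YES,URI="{subs_uri}"')
        group = 'SUBTITLES="subs"'
    else:
        raise ValueError(f"unknown caption type: {captions}")
    return "\n".join([
        "#EXTM3U",
        media,
        f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},{group}",
        variant_uri,
    ]) + "\n"
```

Which of the two a given player actually renders is the compatibility question this section keeps returning to, so both forms are worth testing on every target.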

If iOS is not a concern and/or you have a DASH-based platform, things are trickier. ExoPlayer can support in-band WebVTT and TTML captions in fragmented MP4 (fMP4). For the web, you must either have a player that parses the fMP4 and sends cues to an HTML5 text track, or support something custom similar to Apple’s version of WebVTT. Check with your player vendor. (This assumes you can find a live encoder that can produce this format.)
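In a DASH manifest, in-band text in fMP4 is typically described as its own text AdaptationSet, with the `codecs` attribute naming the sample format (`wvtt` for WebVTT, `stpp` for TTML, as defined in ISO/IEC 14496-30). This Python sketch uses the standard library's ElementTree to produce such a fragment. The IDs, bandwidth value, and role are assumptions, and a real MPD also needs segment addressing (such as a SegmentTemplate) that is omitted here.

```python
import xml.etree.ElementTree as ET

# Sample-entry codes for text carried in fMP4 (ISO/IEC 14496-30).
TEXT_CODECS = {"webvtt": "wvtt", "ttml": "stpp"}


def text_adaptation_set(fmt: str, lang: str = "en") -> str:
    """Return an MPD AdaptationSet fragment for in-band WebVTT or TTML captions."""
    if fmt not in TEXT_CODECS:
        raise ValueError(f"unsupported text format: {fmt}")
    aset = ET.Element("AdaptationSet", {
        "contentType": "text",
        "mimeType": "application/mp4",
        "lang": lang,
    })
    # DASH role scheme; "caption" distinguishes captions from plain "subtitle".
    ET.SubElement(aset, "Role", {"schemeIdUri": "urn:mpeg:dash:role:2011", "value": "caption"})
    ET.SubElement(aset, "Representation", {
        "id": f"text-{fmt}",
        "bandwidth": "2000",
        "codecs": TEXT_CODECS[fmt],
    })
    return ET.tostring(aset, encoding="unicode")
```

Generating the manifest is the easy part; whether a given web player parses these fMP4 text tracks is what to confirm with your vendor.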

The 608/708 format is most certainly past its prime. It’s difficult to work with, limited in capabilities, and, again, U.S.-centric. Unfortunately, there is no clear path to a replacement, for two major reasons. First, most newly proposed formats are developed with VOD in mind, and adapting VOD-centric standards to live is generally unworkable. Just as the internet application industry shifted to a mobile-first mentality, those of us in internet video must shift to a live-first mentality. Second, new standards are not focused on the production side. If one of these standards could deliver a full photon-to-photon solution, it would have little competition.

Special thanks for contributions from Naomi Black, Alex Barrett, Jamie Baughman, Dan Swiney, and Heather Duthie.

This article appears in the March 2017 issue of Streaming Media magazine.
