September 18, 2014
By Jan Ozer Contributing Editor

How to Deploy Closed Captions

For example, if you were captioning a video, you would caption when background music played, when a shot rang out, when a car started in the background, or when the actors laughed or cried. If you were subtitling the video, you would assume that all these could be heard by the viewer. Captions are also assumed to be in the same language as the video, so if an English-language video displayed a sign in English -- a Stop sign, for instance -- you wouldn’t caption that, but you would if you were subtitling the video into French.

Basically, captions are a transcription of the spoken words and all important sounds, while subtitles translate the speech and text in the video. Beyond this distinction, there are a number of general captioning conventions that are critical to know if you decide to create the captions yourself. The following guidelines are from the Accessible Digital Media Guidelines produced by the National Center for Accessible Media.

Captions should be a verbatim representation of what is being said, although you may edit out unnecessary speech (um..., ah..., er..., etc.).
Each caption should be composed of one or two rows and should be positioned in the bottom center of the caption region. Avoid using three-row captions except when using a speaker identification.
If you can fit a caption on a single row rather than two, do so.
Caption what is spoken: if the speaker says “string” when he meant to say “spring,” for example, caption it as such.
Don’t correct grammar -- if you hear it, caption it.
End punctuation (period, exclamation point, question mark) indicates the end of a caption, and the next sentence starts with a new caption.
Points (...) may be used to indicate a pause, an interruption or a suspension within the caption, or as end punctuation.
When using two-row captions, avoid formatting them so that one line is substantially longer than the other.
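To make the row conventions above concrete, here is a minimal Python sketch that lays out a single caption on one row when it fits and otherwise on two rows of roughly equal length. The function name and the 32-character row limit are my assumptions for illustration; the guidelines quoted here don't specify a row width, so use whatever limit your player or delivery spec requires.

```python
def split_caption_rows(text: str, max_chars: int = 32) -> list[str]:
    """Return one row if the caption fits, otherwise two balanced rows.

    max_chars is an assumed per-row limit, not a value from the guidelines.
    """
    words = text.split()
    single = " ".join(words)
    if len(single) <= max_chars:
        # Guideline: if the caption fits on a single row, use one row.
        return [single]

    best = None  # (length difference, [top row, bottom row])
    for i in range(1, len(words)):
        top = " ".join(words[:i])
        bottom = " ".join(words[i:])
        if len(top) > max_chars or len(bottom) > max_chars:
            continue
        # Guideline: avoid one row being much longer than the other.
        diff = abs(len(top) - len(bottom))
        if best is None or diff < best[0]:
            best = (diff, [top, bottom])

    if best is None:
        # Three-row captions are discouraged, so ask for a new caption instead.
        raise ValueError("caption too long for two rows; split it into separate captions")
    return best[1]


if __name__ == "__main__":
    for row in split_caption_rows("I never thought we would make it out of there alive."):
        print(row)
```

The sketch only breaks between words and never edits the text, which matches the advice to caption what is spoken. It doesn't attempt to break at natural phrase boundaries, which a human captioner would also weigh.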

Failure to observe conventions, and/or otherwise shoddy captioning, can land your video on Tumblr (see Awkward Netflix Captions) or can even get you sued. In this regard, note that if you’re captioning broadcast video to meet FCC requirements, you have a lot less leeway, because the captions must maintain the same look and feel as TV captions. Keep this in mind as we discuss the available tools, because shareware tools may not meet these requirements.

Creating Live Captions

The complexity involved in creating closed captions varies greatly depending upon whether you’re producing live or video on demand (VOD). Let’s consider live first. For live productions, unless you’re very, very experienced, you should strongly consider hiring a professional to produce the captions for you. There are multiple reasons why.

First, there are two general techniques used for live captioning: One involves stenography machines, the other voice recognition, where a trained professional repeats what he or she hears into a microphone for the transcription. Both require years of training to achieve competency, so you definitely don’t want to try this yourself.

Second, the captions created via either technique must be deftly integrated into the live captioning workflow and adequately tested before the event. If this is the first time you’ve produced live captions, you’ll want to involve someone who’s been there before. Finally, live captioning only costs about $100-$150 an hour, plus extra for ancillary consulting services. So it’s not going to break the bank.

Creating Captions for VOD Video

Obviously, the non-real-time nature of VOD captioning makes it a better candidate for trying it yourself, and most computer-literate professionals can probably do a competent job with VOD using one or more of the tools described below. VOD captioning is also more expensive than live because it takes longer, so captioning an hour of video can cost from $150 to $500, depending upon the service provider and turnaround time. Any of the service providers mentioned above can create VOD captions for you.

If you have to create your own captions, either from scratch or from a transcription, there are a number of tools, free or otherwise, for performing this function. Obviously, you can create the captions in MacCaption/CaptionMaker, which has a highly evolved interface for doing so, and very extensive export capabilities, for both embedded captions and sidecar files (Figure 3).

Figure 3. Creating Captions in MacCaption

Another great place to start is YouTube (Figure 4), which uses a speech-to-text converter to create a rough transcription, and then synchronizes it to the video. You’ll definitely have to clean up the captions, which you can do in the Subtitles and CC tab of YouTube’s video manager, and then export them as .vtt, .srt or .sbv files.

Figure 4. Creating your closed captions in YouTube

If you’re creating TTML or WebVTT files for HTML5 captions, Microsoft has a surprisingly useful free tool called the HTML Video Caption Maker. Input the URL of the file you’re captioning, and you can play short chunks of the file and type in captions, and then copy and paste them into either a TTML or WebVTT file. The tool even tells you what to name the file and how to integrate it into your HTML5 code description.
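If you would rather script the last step than copy and paste, a WebVTT file is simple enough to write by hand: a WEBVTT header line, a blank line, and then one block per caption with a timing line followed by the caption text. The following Python sketch shows that layout. The function names and the (start, end, text) tuple format are my own choices for illustration, not anything produced by Microsoft's tool.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, hh:mm:ss.ttt."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_webvtt(cues: list[tuple[float, float, str]]) -> str:
    """Build WebVTT text from (start_seconds, end_seconds, caption_text) cues.

    For a two-row caption, put a newline inside caption_text.
    Caption text must not contain a blank line or the string "-->".
    """
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


if __name__ == "__main__":
    cues = [
        (0.0, 2.5, "[music playing]"),
        (2.5, 5.0, "I never thought we would\nmake it out of there alive."),
    ]
    # Hypothetical output filename; name it to match your player markup.
    with open("captions.vtt", "w", encoding="utf-8") as f:
        f.write(build_webvtt(cues))
```

WebVTT files should be saved as UTF-8, which is why the encoding is set explicitly when writing the file.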

Finally, I should mention that Dotsub has a similar free tool for creating captions, plus the ability to translate the captions to different languages. You can pay Dotsub to perform the translation, or the website hosts a multilingual community of captioners who may do it for free if they find your video sufficiently interesting. Whatever tool you use to create the captions, just be sure that it can output the format or formats that you require for the various distribution formats you’ll be producing.
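If your tool exports one format and your player needs another, simple conversions can often be scripted. As a rough illustration, here is a Python sketch that converts SRT captions to WebVTT. The two formats are close: WebVTT adds a WEBVTT header and uses a period rather than a comma before the milliseconds in each timestamp. The function name is hypothetical, and the sketch assumes well-formed SRT input with two-digit hours and no styling tags; treat it as a starting point rather than a full converter.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
_SRT_TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})"
)


def srt_to_webvtt(srt_text: str) -> str:
    """Convert SRT caption text to WebVTT text (minimal sketch)."""
    # Drop a leading byte-order mark and normalize Windows line endings.
    body = srt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    # Only timing lines are rewritten; commas in caption text are untouched.
    body = _SRT_TIMING.sub(r"\1.\2 --> \3.\4", body)
    # SRT cue numbers are left in place; WebVTT accepts them as cue identifiers.
    return "WEBVTT\n\n" + body + "\n"


if __name__ == "__main__":
    sample = "1\n00:00:01,000 --> 00:00:03,500\nWell, hello there.\n"
    print(srt_to_webvtt(sample))
```

Whatever conversion you script, play the result in your target player before publishing, since positioning and styling information does not carry over between formats this simply.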

Supporting Multiple Output Formats

Most smaller websites can use HTML5 with Flash fallback to distribute MP4 streams to desktop and mobile. You’ll need to produce a TTML or WebVTT file for HTML5, which you’ll integrate with the video file in the player HTML code. With the latest versions of all the main browsers, caption display has become fairly uniform, making this option reasonably viable, though caption support on older mobile devices is spotty. You’ll also need a TTML file for Flash, which you’ll integrate into the Flash SWF file.
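On the HTML5 side, the integration is a track element nested inside the video element, pointing at the WebVTT sidecar file. The Python sketch below generates that markup so you can see the pieces. The function name, file URLs, and default label are placeholders I made up; the element and attribute names (track, kind, src, srclang, label, default) are standard HTML5. It does not cover the Flash fallback.

```python
from html import escape


def video_with_captions(video_url: str, vtt_url: str,
                        lang: str = "en", label: str = "English") -> str:
    """Return HTML5 markup for an MP4 video with one WebVTT caption track."""
    # escape(..., quote=True) keeps the values safe inside quoted attributes.
    video = escape(video_url, quote=True)
    vtt = escape(vtt_url, quote=True)
    return "\n".join([
        "<video controls>",
        f'  <source src="{video}" type="video/mp4">',
        # kind="captions" marks the track as captions rather than subtitles;
        # "default" asks the browser to enable this track on load.
        f'  <track kind="captions" src="{vtt}" '
        f'srclang="{escape(lang, quote=True)}" '
        f'label="{escape(label, quote=True)}" default>',
        "</video>",
    ])


if __name__ == "__main__":
    # Placeholder file names for illustration only.
    print(video_with_captions("movie.mp4", "captions.vtt"))
```

Note that browsers generally load track files subject to same-origin rules, so hosting the caption file alongside the page or video avoids one common reason for captions silently failing to appear.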

If you’ll be supporting multiple adaptive streaming formats, things get more complex. In these instances, you should consider using a streaming server such as the Wowza Streaming Engine, which can input captions from a single embedded or sidecar source, and transmux them to the formats required for Apple HLS, RTMP Dynamic Streaming, and Adobe HDS (but not Microsoft Smooth Streaming) (Table 1).

Table 1. Inputs and outputs supported by the Wowza Streaming Engine

No matter what tools you use to create them, captions are a crucial addition to your video workflow. Beyond accessibility for deaf and hard of hearing viewers, captions improve SEO rankings, enhance video consumption in louder environments, and often improve viewer comprehension. Particularly for longtail VOD-type clips, which can be captioned with minimal expense and effort, captioning can be a very wise investment.

This article appears in the September 2014 issue of Streaming Media as "How to Deploy Closed Captions."
