Video: How Reliable Is ASR-Generated Live Captioning?
Watch the complete video of this presentation from Streaming Media West, LS202: Reaching the Audience--Advances and Challenges in Captioning Live Streams, in the Streaming Media Conference Video Portal.
Read the complete transcript of this clip:
John Capobianco: The reliability of automatic speech recognition (ASR) is the natural question that comes up from everybody. Everybody assumes that ASR produces good captioning. It doesn't. We've studied this. We've run several ASR engines internally because we're always looking for the best way to get captions done. It doesn't do a good enough job, and I have proof of that, because I test what's going on across the country every day.
ASR Is Cheap
Most ASR engine issues happen because people use it for one reason: it's cheap. That's really the only thing anybody cares about. It costs hardly anything to do, and it's worth every penny you pay for it. It doesn't do a very good job. It gets most proper nouns wrong. It gets most proper names wrong. I've averaged this across all of the big providers. I'm not going to name them; I don't care who they are. I look at what they do live on broadcast television, and the average right now is just under 68% accurate.
Roughly one out of every three words is wrong or missing, and two-thirds of the errors are wrong words or missing words. Think about that: one out of every three words you have going out is wrong or missing. It's not an adequate way to communicate.
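The "wrong or missing words" the speaker describes map onto the standard industry metric, word error rate (WER): substitutions are wrong words, deletions are missing words. As a rough illustration of how that metric is computed (this is a generic sketch, not VITAC's actual test methodology), WER is the word-level edit distance between a reference transcript and the ASR output, divided by the reference length:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate = (substitutions + deletions + insertions) / N,
    where N is the number of words in the reference transcript.
    Computed as word-level Levenshtein distance via dynamic programming."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion (missing word)
                          d[i][j - 1] + 1,          # insertion (extra word)
                          d[i - 1][j - 1] + cost)   # substitution (wrong word)
    return d[len(ref)][len(hyp)] / len(ref)
```

A 68%-accurate caption stream corresponds to a WER of roughly 0.32, which is what "one word in three" means in practice.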
In addition to that, it captions poorly if there isn't a very good connection, if there's background noise, or if people talk over one another. The place where it actually does work is when people are trained to use it: there's only a single speaker, they talk like this to the machine, in a monotone, and they dictate their periods and commas as they go.
You can train ASR to do okay. We do that. That's what voice-writers do, but at the same time, that's not adequate for most of your broadcast needs.
In addition to that, it doesn't capitalize. It doesn't punctuate. People always say, "Who cares about the punctuation?" If I gave you a pamphlet and it didn't have paragraphs or punctuation or commas or anything else, how far do you think you'd read in that document? You'd be confused very, very quickly. People don't think about it that way. Take a paperback book and imagine it with no punctuation, no chapters, no indexes, no commas, no indents for paragraphs or any of those things. Just a stream of words. It would be awful. Unpunctuated captions are the same: you can watch them, but it's just not very good.
Why Human Captioners Paraphrase
One of the other really important things about automatic speech recognition comes up when people say, "Well, I watch captioners, and there are missing words when humans do it too." That's true. Captioners paraphrase sometimes; they're trained to. And as much as we don't like to think about not doing verbatim, because we all want to do verbatim all the time, verbatim is not always the best delivery on the screen.
Our captioners are taught to paraphrase in order to slow it down enough so the words stay on the screen long enough for somebody to be able to read them. And sometimes they'll leave out some words for better meaning.
ASR engines leave out words because they get befuddled. Humans leave out words because they're trying to improve the meaning. So when you actually compare what happens between human captioners and automatic speech recognition, the huge difference is the readability of the result. It's the human context of knowing how to communicate effectively with the words that are being spoken.
When Will ASR Be Ready?
We get a lot of questions about this. Everybody wants to know when it is going to be ready. Well, so do we, which is why we test it every day. I currently have 58 tests that I've run on 80,000 words, and that's where my statistic of 67.88% accurate comes from. The other 32.12% was inaccurate, and two-thirds of that was missing words and wrong words.
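The figures above are the speaker's own; a quick back-of-the-envelope check simply restates them at the word level (the rounding below is mine, not from the source):

```python
# The speaker's quoted numbers: 80,000 test words, 67.88% accurate.
total_words = 80_000
accuracy = 0.6788

# Words wrong or missing: 32.12% of 80,000.
errors = round(total_words * (1 - accuracy))      # 25,696 words

# "Two-thirds of that is missing words and wrong words."
wrong_or_missing = round(errors * 2 / 3)          # about 17,131 words

# 32.12% of words erroneous is roughly one word in three.
error_rate = errors / total_words                 # 0.3212
```

At that rate, an hour of broadcast speech at a typical 150 words per minute would contain on the order of 2,900 caption errors.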
VITAC CMO John Capobianco offers a primer in captioning for live streaming in this clip from his Live Streaming Summit presentation at Streaming Media West 2018.