Video: Audio Analysis and Machine Learning for Video

Date: 2019-06-28

Learn more about machine learning and AI at Streaming Media's next event.

Watch Jun Heider's complete presentation, VES103. Enhancing Media with Machine Learning in 2019, in the Streaming Media Conference Video Portal.

Read the complete transcript of this clip:

Jun Heider: In my opinion, this is one of the more mature areas of machine learning for video. These services are pretty accurate when it comes to speech-to-text, from what I've seen. Obviously they're not perfect. But I feel that they're more perfect in this regard than they are with object detection, especially when we're talking about videos in the wild.

Say I'm a machine learning service, for instance, maybe Video Indexer or Valossa. And I'm tuning my models and they're going to probably cover 80%. But there are going to be those videos that they're not expecting, and they haven't been properly tuned for. So what I would say now is speech-to-text, and translation of speech-to-text, are actually pretty good so far.

Translation builds on top of that. In addition to being able to get a transcript, you can then take that transcript and translate it to other languages. Here's another really cool one. There are sounds in this world that aren't just speech. So we have, you know, bird sounds. We have applause, we have music. Things like that. Certain services can actually tell you what other audio is happening within the video as well. So that's really useful.

What you see on the left is Valossa and what their JSON looks like. I ran a video through it, and it detected applause. In the Fauna category, it detected a pet sound, probably a dog bark or something.
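To make the idea of reading audio events out of a JSON result concrete, here is a minimal Python sketch. The payload shape and every field name in it (`detections`, `category`, `label`, `start`, `end`, `confidence`) are hypothetical and invented for illustration; they are not Valossa's actual schema, so a real integration would need to follow the service's own documentation.

```python
import json

# Hypothetical payload: these field names are assumptions made for this
# example and do NOT reflect any real service's output format.
SAMPLE = """
{
  "detections": [
    {"category": "Sound", "label": "applause", "start": 12.0, "end": 15.5, "confidence": 0.91},
    {"category": "Fauna", "label": "pet sound", "start": 40.2, "end": 41.0, "confidence": 0.64},
    {"category": "Sound", "label": "music", "start": 50.0, "end": 80.0, "confidence": 0.42}
  ]
}
"""


def audio_events(payload, min_confidence=0.5):
    """Return (category, label, start, end) for detections at or above min_confidence."""
    data = json.loads(payload)
    return [
        (d["category"], d["label"], d["start"], d["end"])
        for d in data["detections"]
        if d["confidence"] >= min_confidence
    ]


events = audio_events(SAMPLE)
for category, label, start, end in events:
    print(f"{category}: {label} from {start:.1f}s to {end:.1f}s")
```

Filtering on a confidence threshold is one simple way to account for the fact that these detections are not perfect; the 0.5 cutoff here is arbitrary.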

Then there's Video Indexer from Azure Media Services. They have these really cool speaker statistics that they give you. Let's say you have some kind of training system where you're teaching toastmasters, and you want them to be able to have a two-way conversation. You can leverage these statistics to know who's the one that's talking and not letting anyone else talk.

In this case, it's me, because I'm sitting up here talking to all of you and you're all quiet. But speaker statistics are pretty interesting.
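One way to picture how statistics like these could be used is to compute each speaker's share of total talk time from timed segments. The Python sketch below assumes the segments are already available as `(speaker, start_seconds, end_seconds)` tuples; that input format, the speaker names, and the `talk_time_share` helper are all assumptions for illustration, not the Video Indexer API or its output.

```python
from collections import defaultdict


def talk_time_share(segments):
    """Map each speaker to their fraction of total talk time.

    segments: iterable of (speaker, start_seconds, end_seconds) tuples.
    This input shape is an assumption made for the example.
    """
    totals = defaultdict(float)
    for speaker, start, end in segments:
        # Ignore malformed segments whose end precedes their start.
        totals[speaker] += max(0.0, end - start)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {speaker: t / grand_total for speaker, t in totals.items()}


# Made-up segments: one speaker dominates the conversation.
segments = [
    ("Speaker 1", 0.0, 45.0),
    ("Speaker 2", 45.0, 50.0),
    ("Speaker 1", 50.0, 100.0),
]

shares = talk_time_share(segments)
dominant = max(shares, key=shares.get)
print(dominant, shares[dominant])
```

In a training scenario like the one described, a share far above an even split would flag the person who is talking and not letting anyone else talk.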

There's also sentiment analysis. Sentiment analysis is where you analyze how much happiness or sadness is in a video, or how positive or negative a particular point in the video is. Valossa has a pretty cool visualizer within their UI where you can get some sentiment.

AMS Video Indexer does positive and negative, and I think they just recently started putting some emotions in there as well.

Watson Video Enrichment has been doing sentiment analysis for a while. So what you can see here in the bottom right is joy, sadness, anger, fear, and disgust.
