Streaming Media


NAB 2017: Watson Brings New Capabilities to the Ustream/IBM Cloud Video Platform

In this interview from NAB 2017, Shawn Lam and Alden Fertig discuss how IBM's acquisition of Ustream a year ago has brought new capabilities to the streaming and cloud video platform now known as IBM Cloud Video.

The acquisition of Ustream by IBM has brought a host of new features to the streaming and cloud video platform, particularly the leveraging of IBM's Watson technology to extract information from the video and create keyword-searchable transcripts. Read the complete transcript of the interview below.

Shawn Lam: It's Shawn Lam here for Streaming Media Producer at NAB 2017. I'm here at the IBM booth. We're talking with Alden from IBM Cloud Video.

Alden Fertig: Hi Shawn.

Shawn Lam: You will recognize Alden from Ustream, the webcasting company that was acquired by IBM. Today you've got some neat offerings under the hood here with the acquisition by IBM. What can you tell us about that?

Alden Fertig: So, we've done a couple things. It's been just about one year since Ustream was acquired by IBM, and we've done a bunch of things in that year. One of the things that we've done is we've taken big advantage of IBM's Cloud, and we've expanded our presence through IBM Cloud all across the globe. You can ingest and you can deliver video from any of IBM Cloud's locations all around the world.

Shawn Lam: Now there's a lot of neat computing power that IBM has, such as Watson. How have you guys leveraged that?

Alden Fertig: That's the thing we're announcing and really focusing on a lot this year at the show: leveraging what we call our cognitive capabilities. We're using Watson, which everyone seems to know from IBM, in many ways. We're using it to extract information from videos. One of the things about video is we kind of think about videos being dark data. You'll hear these quotes that video is the largest source of traffic on the internet. It's the biggest data source, but it's actually unstructured, most of that data. It's lots of bits and bytes, but you don't really know what's happening inside that video.

We're using Watson in a couple different ways: to extract visual information and to extract audio information. These are the two things that video is made out of, and we're doing things like speech-to-text. So we're taking the audio track and turning that into a transcript that we can then use for captions, or that we can publish as a transcript of the video. We're using it for things like sentiment analysis, so we can decide whether a scene was happy or sad, or what emotions were going on. We're extracting keywords through visual recognition. So we're saying what's in this image, what sort of objects do we see inside the video. We're able to use all this to enrich people's media workflows and provide business value for them.
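Editor's note: the speech-to-text-to-captions step Alden describes can be pictured with a minimal sketch. It assumes the transcript arrives as timed segments (start, end, text) and writes them out as WebVTT captions. The Segment type and both function names are hypothetical and chosen for illustration; this is not IBM Cloud Video's or Watson's actual API.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One timed piece of a transcript (hypothetical structure)."""
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, hh:mm:ss.ttt."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(segments):
    """Turn timed transcript segments into the text of a WebVTT caption file."""
    lines = ["WEBVTT", ""]  # required header followed by a blank line
    for seg in segments:
        lines.append(f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}")
        lines.append(seg.text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)
```

Calling to_webvtt on a list of segments returns text that could be saved as a .vtt file and loaded as a caption track by a player that supports WebVTT.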

Shawn Lam: The possibilities are endless in terms of what you can do with that data once you're able to actually analyze it, and to put it into something that you can use and manipulate. What are some of the possibilities?

Alden Fertig: We're taking here a clip from a Food Network show, and here are the things that Watson is able to identify out of this clip. We have audio keywords, so it's picking up key concepts. There are entities like people and locations. There are concepts, like it's a cooking competition. There's our emotion analysis. Then we have our video transcript, where we're actually getting a word-for-word speech-to-text transcript of the audio. Let's say later we want to make a highlight reel of everything featuring bananas, there we go. We can pull those keywords that were automatically generated. That's a perfect example of something that Watson might identify, and it could be very useful to an editor later, but, you know, someone tagging manually probably wouldn't think to put a tag of bananas on their clips. So, that's an example of our content enrichment service.

Here's another example of a use case where you can use this cognitive enrichment. We took a library of TED talks and we applied Watson capabilities to those TED talks. The way the demo works now is you can ask a question. So we'll ask what's the secret to happiness. Then, it will find relevant TED talks where various speakers talked about, you know, this particular concept. So you have Dan Gilbert talking about the secret to happiness. You have Nancy Etcoff talking about happiness and its surprises. So, it's gathering this enriched data about the video automatically. So it's picking up that they're talking about happy concepts in the speech. They're maybe putting words on the screen; they're talking about happiness. So now we're able to do a smart search based on that concept.

Shawn Lam: Based on the actual content and not the title, because this one says Half a Million Secrets. Otherwise, if you just searched on the title, you might not come across the true secret to happiness.

Alden Fertig: This is another example of how we can use Watson to enrich a piece of video content. So in this example we took a keynote speech at this conference, and we were able to use a Watson API to get the key concepts and make automatic chapters from that. So, an example you see here is the introduction. This is where Ginni Rometty, who's the CEO of IBM, starts talking. Let's try to go to something that we think is interesting. Blockchain has been a hot topic for us, so we'll just go ahead and go to blockchain. It indexes to that exact spot in the video.

Ginni Rometty: ... And blockchain has a promise of such value, but not unless there's standards out ...

Alden Fertig: Basically, right there she just talked about blockchain, and we're able to sort of extract these keywords. So this is sort of an interesting approach: we're not getting a word-by-word transcription and showing it as a caption, but it's automatically identifying what some of the hot topics are. This is something that we can use to basically extract key concepts, and then make what we're calling this contextual seek, so that we can seek through a video in a more contextual way. A perfect example here: if we tried to just seek through the visual information, you'd just see three people sitting in chairs talking the whole time. So, we're able to layer on this text information that allows you to seek in a much more efficient way.
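Editor's note: the contextual seek idea can be pictured with a minimal sketch. It assumes a transcript is available as (start time, text) pairs and builds an index from each topic keyword to the points in the video where it is mentioned. Plain substring matching stands in for Watson's concept extraction here, and the function names and data layout are hypothetical, not the product's actual API.

```python
from collections import defaultdict


def build_keyword_index(segments, keywords):
    """Map each keyword to the sorted start times of segments that mention it.

    segments: iterable of (start_seconds, text) pairs (hypothetical layout).
    keywords: topic words to index; matching is case-insensitive substring
              matching, a stand-in for real concept extraction.
    """
    index = defaultdict(list)
    for start, text in segments:
        lowered = text.lower()
        for keyword in keywords:
            if keyword.lower() in lowered:
                index[keyword.lower()].append(start)
    return {keyword: sorted(times) for keyword, times in index.items()}


def seek_position(index, keyword, after=0.0):
    """Return the first time at or after `after` where the keyword is mentioned.

    Returns None if the keyword is not mentioned from that point on.
    """
    for time in index.get(keyword.lower(), []):
        if time >= after:
            return time
    return None
```

A player could call seek_position with the topic a viewer clicks on and jump the playhead to the returned time, rather than having the viewer scrub through visually similar footage.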

Shawn Lam: Thank you very much Alden.

Alden Fertig: Thank you Shawn.
