SMW 17: Microsoft's Media & Machine Learning
Machine learning for media has evolved rapidly over the last 12-18 months, and Microsoft Cognitive Services now provides full-text audio transcription, face detection, video stabilization, video OCR, face redaction, motion detection, facial emotion detection, video summaries, content moderation, and object detection for VOD content. At Streaming Media West on Thursday, Andy Beach, Microsoft principal software development engineer for communications and media, shared how the company is giving content owners, developers, and data scientists access to AI tools that make it easy to index and search hours of video content.
"The tools will extract metadata from video content and curate the information found within the metadata. The intelligence is output to an embedded player, where a set of widgets provide interactive functionality for viewers," said Beach. "We created a series of APIs that tie into machine learning and productized the offering to make it easy for anyone to get started."
These "products" will help improve content discoverability, enhance user engagement, and hopefully increase content value. Online dating company Match.com tried out the AI tools for content moderation, identifying video or images that were too racy to publish. Nexx.Tv used the AI tools to build a better advertising use case, analyzing its content's full-text metadata to deliver targeted ad overlays: if the content was about cars, for example, it could match related advertisements to the video on the fly to deliver more personalized ad overlays.
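The keyword-to-ad matching idea can be sketched in a few lines. This is a hypothetical illustration, not nexx.tv's actual system; the ad inventory, keyword triggers, and function names here are all invented for the example.

```python
# Hypothetical ad inventory mapping trigger keywords (from the video's
# extracted full-text metadata) to overlay creatives.
AD_INVENTORY = {
    "car": "auto-insurance-overlay",
    "engine": "auto-insurance-overlay",
    "recipe": "grocery-overlay",
}

def pick_overlay(keywords):
    """Return the first ad overlay whose trigger keyword appears in the
    extracted metadata, or None if nothing matches."""
    for kw in keywords:
        ad = AD_INVENTORY.get(kw.lower())
        if ad:
            return ad
    return None

print(pick_overlay(["Highway", "Car", "Sunset"]))  # auto-insurance-overlay
```

A production system would rank candidate ads rather than take the first hit, but the core mechanic is the same: the indexer's keyword output drives the overlay selection at play time.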
The initial step is training the AI. "The first version is OK, but it's not great because you have to train it," said Beach. For example, to identify all the people within a piece of content, the AI needs to learn who each person is. Once this has been done, however, it becomes possible to use what Beach calls a people heat map: all instances of a specific person can be identified in a video clip and graphically represented within the video scroll bar. In the graphic below, Julia White appears in 4% of the video, and a viewer can jump directly to each clip she appears in. The most common keywords used within her clips are also shown onscreen, and these too are clickable.
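The heat map is straightforward to derive once the indexer has emitted per-person appearance intervals. Below is a minimal sketch, assuming the service returns (start, end) timestamps in seconds; the function name and the text-based scroll bar are inventions for illustration.

```python
def appearance_stats(intervals, duration, buckets=20):
    """Given (start, end) seconds where a person appears, return the
    fraction of the video they are in and a coarse text 'heat map' of
    the scroll bar ('#' = person on screen in that bucket)."""
    total = sum(end - start for start, end in intervals)
    bucket_len = duration / buckets
    bar = ""
    for i in range(buckets):
        b_start, b_end = i * bucket_len, (i + 1) * bucket_len
        on_screen = any(s < b_end and e > b_start for s, e in intervals)
        bar += "#" if on_screen else "-"
    return total / duration, bar

# One 24-second appearance in a 10-minute video ~= 4% of the runtime.
frac, bar = appearance_stats([(30, 54)], duration=600)
print(f"{frac:.0%}", bar)  # 4% -#------------------
```

The same interval data drives the clickable scroll-bar markers: each interval's start time becomes a seek target in the player widget.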
Publishers: Plug and Play
Microsoft offers three flavors of AI product. The easiest to use is for content publishers. "Upload content and we will index it, create all the metadata, create a full-text searchable transcript, and provide widgets so you can provide an interactive viewing experience that's custom to your content," said Beach. Analysis runs in close to real time: ten minutes of content should take about ten minutes to process.
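The publisher workflow Beach describes is upload, wait for processing, then retrieve the extracted metadata. The sketch below illustrates that loop against a stand-in client; the class, method names, and status strings are all assumptions for the example, not the real service API (which lives at https://vi.microsoft.com/).

```python
import time

class FakeIndexer:
    """Hypothetical stub standing in for the indexing service."""
    def __init__(self):
        self._polls = 0

    def upload(self, path):
        return "video-123"  # the service hands back a job/video ID

    def status(self, video_id):
        self._polls += 1
        # Pretend processing finishes on the third status check.
        return "Processed" if self._polls >= 3 else "Processing"

    def insights(self, video_id):
        return {"transcript": "...", "faces": [], "keywords": []}

def index_video(client, path, poll_seconds=0):
    """Upload a file, poll until indexing completes (roughly as long as
    the content itself), then fetch the extracted metadata."""
    vid = client.upload(path)
    while client.status(vid) != "Processed":
        time.sleep(poll_seconds)  # in practice, poll every few minutes
    return client.insights(vid)

insights = index_video(FakeIndexer(), "talk.mp4")
print(sorted(insights))  # ['faces', 'keywords', 'transcript']
```

Since processing time scales with content length, polling (or a completion callback, if the service offers one) is preferable to blocking a request thread for the duration.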
Microsoft also has an a la carte option that gives developers access to some or all of the video AI APIs. These APIs cover computer vision, content moderation, emotion detection, face recognition, full-text video indexing, Bing speech, and speaker recognition.
Data Scientists: Infrastructure
For those who want to roll their own, the machine learning platform is available for data scientists to train their own neural networks. "You can use our platform as an infrastructure to do the compute or processing," said Beach.
Whatever the preference, users get 40 hours of free access to try out the tools at https://vi.microsoft.com/