Microsoft Debuts AI Cloud Service for Video at Build Conference
Microsoft is making artificial intelligence (AI) available for free to streaming video developers. Now it wants to see what they'll do with it.
At its Build conference in Seattle, Washington, Microsoft today announced Video Indexer, a cloud service that is now part of its Cognitive Services lineup. For background, the company's Artificial Intelligence and Research Group was formed in September 2016 to democratize AI, making it available to all developers. The group creates tools and services that can be integrated into other code via APIs or SDKs to add AI functionality.
The group's Cognitive Services toolkit debuted with 14 machine learning-based services a year and a half ago. That grew to 29 services last year. Today, Microsoft introduced four new services, one of which speeds video and audio metadata creation.
Video Indexer is available as a preview download for free testing. Microsoft wants developers to give it a try so it can learn from their experiences and refine the service.
With Video Indexer, developers can harvest a variety of useful metadata from files with no human interaction needed. The service can identify faces, transcribe spoken audio, detect objects within a video, and detect emotions. With that information, publishers can improve discoverability or boost monetization by serving targeted ads that better match a video's contents.
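To illustrate the ad-targeting idea, the index a service like this returns could be mined for keywords. The JSON shape and field names below are hypothetical, invented for this sketch; they are not Video Indexer's actual response schema.

```python
# Sketch: mining hypothetical video-index metadata for ad-targeting keywords.
# The "index" structure below is illustrative only, not Video Indexer's schema.

index = {
    "labels": ["kitchen", "food", "pasta"],       # detected objects/scenes
    "transcript": [                                # transcribed spoken audio
        {"start": "0:00:01", "text": "Welcome to our cooking show."},
        {"start": "0:00:05", "text": "Today we make fresh pasta."},
    ],
}

def ad_keywords(index, min_len=4):
    """Combine detected labels with longer transcript words into keywords."""
    words = set(index.get("labels", []))
    for entry in index.get("transcript", []):
        for word in entry["text"].lower().split():
            word = word.strip(".,!?")              # drop trailing punctuation
            if len(word) >= min_len:               # skip short filler words
                words.add(word)
    return sorted(words)

print(ad_keywords(index))
# → ['cooking', 'food', 'fresh', 'kitchen', 'make', 'pasta', 'show', 'today', 'welcome']
```

An ad server could then match these keywords against campaign targeting lists, which is the kind of content-aware monetization the article describes.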
During the preview phase, Video Indexer is free, but developers are limited to uploading 10 hours of video per day and 40 hours total. They can upload a maximum of 20 files, with each one no larger than 4 GB.
Video Indexer is fast, processing a 45-minute video in about 5 minutes. It achieves that by breaking videos into sections and using AI to pull data out of each one. It can identify which speaker is talking at any time, and index on-screen text. It can translate text (it currently supports nine languages) and monitor for explicit audio or visual content. It's also able to detect scene changes and extract key frames. The service is only for saved, not live, video.
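The section-based approach described above can be sketched as simple timeline chunking, so each chunk can be analyzed in parallel. The 5-minute section length is an assumption for illustration, not a documented Video Indexer parameter.

```python
# Sketch: splitting a video timeline into fixed-length sections that a
# worker pool could index concurrently. The section length is an
# illustrative assumption, not a documented Video Indexer setting.

def split_sections(duration_s, section_s=300):
    """Return (start, end) second pairs covering [0, duration_s)."""
    sections = []
    start = 0
    while start < duration_s:
        end = min(start + section_s, duration_s)  # last section may be short
        sections.append((start, end))
        start = end
    return sections

# A 45-minute (2,700-second) video splits into nine 5-minute sections.
print(split_sections(45 * 60))
```

Processing nine such sections concurrently, rather than one 45-minute file serially, is consistent with the speedup the article reports.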
As this is still a work in progress, some tasks have higher success rates than others. Face detection is highly reliable, while emotion detection has roughly a 60 percent success rate. The process is designed to be fully automated, but even if companies add spot checking by a human, getting the results will take far less time than if all the work were done by hand.
Roughly 8,000 people work in Microsoft's AI and research group, with 5,000 of those working strictly on AI. Around 150 of that group work on Cognitive Services. This group of engineers and researchers turns AI research into products. For video, they've gotten to a point where they can share their work with a larger audience.
Expect the preview period to last between six months and one year, says Irving Kwong, group program manager for Artificial Intelligence and Research Marketing at Microsoft. The company will work closely with customers to monitor performance. Microsoft published a blog post with more information and a link to assets.