The State of Machine Learning and AI 2019
There’s no shortage of vendors that can help incorporate machine learning and artificial intelligence (AI) technologies into various points in the video delivery ecosystem to deliver benefits such as faster time to market and cost savings. Machine learning is the ability of computer systems to progressively improve their performance on specific tasks, while artificial intelligence uses data to learn, predict, and alter an outcome based on learning through data processing. The foundation for each is access to a large source of data to use for training, whether it’s an archive of images for image recognition or quality of service (QoS) playback records.
Machine learning and AI can be used to identify images within video, to generate speech-to-text translations, to create subtitles, to look for optimal patterns for content processing, and to perform myriad other functions. Here’s a roundup of notable machine learning and AI projects that have been developed in the past year.
The British Royal Wedding
UI Centric, GrayMeta, and Amazon Web Services (AWS) worked with Sky News over a 10-week period to produce Sky News’ Who’s Who project, a mobile app used by 850,000 people in 200 countries for real-time identification of guests at the royal wedding of Prince Harry and Meghan Markle last year.
“The engine for the project was based on using facial detection analysis from AWS Rekognition Services,” Sky News senior product owner Hugh Westbrook says in an email. “They were able to identify many of the guests in real-time as they arrived. Things they had to take into consideration were camera angles, crowd motion and activity, weather, and lighting.” They even trained for obstruction of faces by umbrellas and created a mock wedding video, using the Sky News team to test the solution.
For image recognition to work, training models are built around the specific subject matter involved, whether that’s sports plays, illegal activity, or in this case the anticipated wedding guests.
“One of the challenges of the project was that only a proportion of the anticipated guests were well-known celebrities and no off-the-shelf machine learning model would have been able to identify [the ones that weren’t well-known],” says Matt Eaton, general manager, EMEA, GrayMeta.
“Once the [training] model was created, it took a matter of minutes to train new faces. However, the human task of sourcing and curating good, high-quality training images in advance of the wedding day took the most time,” says Eaton. In essence, the system was shown multiple variations of pictures of each person until it could recognize the wedding guests on its own.
“We deliberately built in a 90-second offset between live capture and streaming on consumer devices to allow the Sky News Editorial team enough time to review and edit the results from the facial recognition services,” says Eaton.
For capture, an AWS Elemental Live small form factor, single-channel video encoder ingested and processed content and sent it to cloud-based AWS Elemental MediaLive. The live curation tool from GrayMeta allowed Sky News editorial researchers to review the match made by the facial recognition service and override it if it was incorrect.
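The review step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code GrayMeta or Sky News used: the `FaceMatch` record, its fields, and the `releasable` helper are hypothetical names, and the only figure taken from the project is the 90-second offset.

```python
from dataclasses import dataclass
from typing import List, Optional

# The 90-second offset is the figure quoted for the project;
# everything else in this sketch is a hypothetical illustration.
REVIEW_OFFSET_SECONDS = 90


@dataclass
class FaceMatch:
    captured_at: float                   # seconds since the start of the live feed
    predicted_name: str                  # name proposed by the recognition service
    override_name: Optional[str] = None  # set by an editor when the match is wrong
    rejected: bool = False               # set by an editor to suppress the match

    @property
    def published_name(self) -> Optional[str]:
        """Name to show viewers, or None if the editor rejected the match."""
        if self.rejected:
            return None
        return self.override_name or self.predicted_name


def releasable(matches: List[FaceMatch], now: float) -> List[str]:
    """Return the names whose review window has elapsed at time `now`."""
    return [
        m.published_name
        for m in matches
        if now - m.captured_at >= REVIEW_OFFSET_SECONDS
        and m.published_name is not None
    ]
```

In this sketch an editor's override always wins over the automated match, and nothing reaches viewers until its review window has passed, which mirrors the human-in-the-loop design the team describes.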
“UI Centric designed and developed the front-end application and video player,” says Eaton. Viewers were able to watch either live or on-demand, quickly identify guests within the video clip, and read more information about each person.
Image recognition is likely what people think about when the terms “machine learning” and “AI” come up in connection to video, but the next project discussed here used this technology to improve the user experience.
Customizing Catalog Navigation
Artwork is the main influence for consumers deciding what content to watch, or at least that’s what Netflix found. According to a company blog post, 82% of the focus in choosing something to watch is driven by content thumbnails. Based on this premise, Accedo spent 4 months developing a project with iTV and AWS to A/B test what images viewers want to see. The overall result was to improve user engagement by generating custom thumbnails.
“Consumers choose video assets based on emotions and on average we know that users spend 1.8 seconds to evaluate a thumbnail,” says Fredrik Andersson, SVP Products, Accedo. “[We also know] there’s so much to choose from, that you basically give up sometimes. By using AI to generate relevant thumbnails you can secure the best chance possible of engaging users instead of just throwing out hundreds of pieces of random artwork and hope that one of them will engage the user.”
Accedo used AI to determine what types of thumbnails would resonate with audiences. The company found that images with expressive facial emotion do particularly well, as do images of villains.
“The reason for using AI is that we can generate different artwork/thumbnails for different user segments and thus generate a higher user engagement,” says Andersson. “The concrete use case is that you and your friends might think you’re browsing the same menu for a video service provider but in reality, it’s highly customized in order to appeal to your interests.”
So, what resonates with people? “Images that have expressive facial emotion that conveys the tone of the title do particularly well to make people watch videos. We prefer villains, so using visible, recognizable characters (and especially polarizing ones) results in more engagement. We don’t like groups; images containing more than 3 characters are less engaging,” he says.
In this project, AWS processed metadata to identify relevant images as well as search for emotions or specific people, and then generated multiple images or small clips. AWS then curated content to target different segments. Accedo used focus group testing to confirm the assumptions.
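As a rough illustration of how A/B test results could drive segment-specific artwork, the sketch below picks, for each audience segment, the thumbnail with the highest click-through rate. It is not Accedo’s implementation; the event format, segment names, and thumbnail identifiers are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event format: (segment, thumbnail_id, clicked)
Event = Tuple[str, str, bool]


def best_thumbnail_per_segment(events: Iterable[Event]) -> Dict[str, str]:
    """Pick the thumbnail with the highest click-through rate in each segment."""
    shown = defaultdict(int)   # impressions per (segment, thumbnail)
    clicks = defaultdict(int)  # clicks per (segment, thumbnail)
    for segment, thumbnail, clicked in events:
        shown[(segment, thumbnail)] += 1
        clicks[(segment, thumbnail)] += int(clicked)

    best: Dict[str, Tuple[str, float]] = {}
    for (segment, thumbnail), impressions in shown.items():
        rate = clicks[(segment, thumbnail)] / impressions
        if segment not in best or rate > best[segment][1]:
            best[segment] = (thumbnail, rate)
    return {segment: thumbnail for segment, (thumbnail, _) in best.items()}
```

A production system would also need enough impressions per variant before trusting the rates; the sketch leaves that statistical check out for brevity.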
“Any type of service could use this technology, but SVOD/TVOD services with large catalogues have the most to gain,” says Andersson. “We have already verified that there are regional taste differences. I think this is a good reminder of why AI is powerful—in a previous environment it would have been impossible to serve all customers with differentiated thumbnails in a scalable way.”
Moving on from these customer-facing projects, the next several use cases go further back into the workflow.
Media supply chain firm SDVI is using these technologies to optimize content QC and compliance for its customers like Discovery, which needs to localize content for worldwide distribution. Before, it took 2 hours to process a 1-hour show; now it can be done in 10 minutes, says Simon Eldridge, chief product officer.
“Every territory has its own rules about what can and can’t be shown, so we integrated some of the big cloud vendors’ AI services into the supply chain platform that we provide,” says Eldridge. “Discovery is using it for assisting the manual content compliance process to really guide operators as to where they should be looking.”
SDVI’s platform is using services from both AWS and Google Cloud Platform for object detection, transcription, and an adult-content filtering algorithm. “What they get back basically is a lot of time-based metadata that will indicate, at this time there’s someone smoking, or violence or nudity,” says Eldridge. “We then take that time-based metadata and make it available to their operators in Adobe Premiere.” Instead of having to watch the whole piece of content, the editor can just see the flagged content based on the template for each specific territory. “So it’s not replacing humans; it’s guiding where the humans spend their time.”
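To make the “guiding where the humans spend their time” idea concrete, here is a minimal Python sketch of filtering time-based metadata against a per-territory template. It is not SDVI’s code: the `Flag` record, the `TERRITORY_RULES` table, the territory names, and both helper functions are hypothetical, and the labels simply echo the examples Eldridge gives.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Flag:
    start: float  # seconds from the start of the program
    end: float    # seconds from the start of the program
    label: str    # e.g. "smoking", "violence", "nudity"


# Hypothetical compliance templates: which labels each territory restricts.
TERRITORY_RULES = {
    "territory_a": {"smoking", "nudity"},
    "territory_b": {"violence"},
}


def flags_for_territory(flags: Iterable[Flag], territory: str) -> List[Flag]:
    """Keep only the flags an operator must review for this territory."""
    restricted = TERRITORY_RULES[territory]
    return sorted((f for f in flags if f.label in restricted), key=lambda f: f.start)


def review_seconds(flags: Iterable[Flag]) -> float:
    """Total flagged duration, counting overlapping ranges only once."""
    total = 0.0
    current_end = None
    for f in sorted(flags, key=lambda f: f.start):
        if current_end is None or f.start > current_end:
            total += f.end - f.start
            current_end = f.end
        elif f.end > current_end:
            total += f.end - current_end
            current_end = f.end
    return total
```

Comparing `review_seconds` for a territory with the full program length gives a rough sense of how much viewing time such a filter could save an operator.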
SDVI is using AI to allow local TV station operators to more easily determine what content is relevant and approved for their geographic areas.
“The public cloud vendor models don’t necessarily require training,” says Eldridge. “Their models are basically ready to go out of the box, and they do get better over time because the more content they process, the more things they could detect. If you take something like Google’s Cloud Video Intelligence API, they actually train against YouTube.”
“[Discovery] changed their content receiving process so that all content from their network of producers is delivered directly to AWS to an S3 bucket,” says Eldridge. Everything gets validated for the right format and there’s an automated accept or reject process. “The next thing that they do is a couple of things in parallel. One of them is that they create a low-resolution proxy, a 2.5 megabit file, and they take the high-risk content and run it through a couple of automated quality control (QC) processes. Then we take the proxy and run it through any combination of AWS Rekognition, Amazon Transcribe, or Google Cloud Video Intelligence.”
Discovery previously had fixed capacity and was turning down business deals because it couldn’t localize content quickly enough. “Being able to know that the system will just scale [now] and that they can predict how long it will take to actually process content—that’s a huge opportunity for generating new revenue,” says Eldridge.
“I think there’s two ways to use machine learning, one is to build it yourself and the other is to find a product that uses machine learning,” says Jon Dahl, founder, Mux. Mux is offering the latter. “We have a system in production that basically uses machine learning to replicate per-title encoding in a way that’s much faster and much more cost effective,” he says.