  • October 28, 2020
  • By Ed Hauber, Business Development Manager, Digital Nirvana

3 Ways to Accelerate Content Production and Delivery With AI


Cloud-based AI and machine learning (ML) solutions continue to transform and accelerate virtually every aspect of content creation and distribution. As these technologies grow in power and sophistication, they’re bringing new levels of accuracy, efficiency, compliance, and cost savings to broadcast operations.

In simplest terms, artificial intelligence solutions use computer technology to discover, extract, or generate metadata, that is, data that provides information about other data. Machine learning technology automatically learns over time about metadata content and what it represents.

By automating metadata capture for large volumes of content, these technologies replace burdensome manual processes and save media organizations countless hours of work. Operators can use AI and ML tools to make better decisions faster and to improve the efficiency of broadcast workflows at every stage of the content process, from acquisition and contribution to postproduction, distribution, compliance, and verification. (See Figure 1 below.)

[Figure 1: Digital Nirvana workflow]

In this article, we’ll take a closer look at the primary use cases for AI and ML technologies in the media and entertainment industry, together with some real-world examples of how these technologies are being applied at major media companies.

Use Case #1: Powerful New Efficiencies at Content Acquisition and Contribution

In a typical media operation producing content for broadcast news or streaming services, the content acquisition stage can be a major bottleneck, often slowed by the sheer volume of raw content that comes in from various field newsgathering sources. It then falls to the production team to manually review the content and select the sound bites and segments needed for a particular story.

Until now, this has been largely a manual process: production staff spend valuable time reviewing content, identifying useful segments, and noting time references for the tagged footage so it can be found easily later.

Shooting ratios of 10:1, 20:1, or greater are commonplace in today’s world of media production. A one-hour program with a shooting ratio of 10:1 can easily generate 600 minutes of raw content! It’s easy to understand why outdated, manual methods of asset ingest and processing simply aren’t tenable in today’s fast-paced, deadline-driven media operations.
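To make the arithmetic concrete, here is a minimal Python sketch of the calculation. The function name is ours, chosen for illustration; it simply multiplies the finished program length by the shooting ratio.

```python
def raw_footage_minutes(program_minutes: float, shooting_ratio: float) -> float:
    """Minutes of raw footage implied by a shooting ratio of `shooting_ratio`:1."""
    return program_minutes * shooting_ratio


# A one-hour (60-minute) program at the two ratios mentioned above.
for ratio in (10, 20):
    minutes = raw_footage_minutes(60, ratio)
    print(f"{ratio}:1 -> {minutes:.0f} minutes ({minutes / 60:.0f} hours) of raw content")
```

At 20:1, the same one-hour program implies 1,200 minutes, or 20 hours, of material to review.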

Real-world example: Making content instantly searchable with timecodes and metadata

This media operation is a premier online entertainment site in the U.S., well known for its short-format content centered on Hollywood. Speed and efficiency are tremendously important to this company, which built its reputation on being the go-to source of Hollywood and celebrity news. 

The company recently adopted a sophisticated, cloud-based speech-to-text (STT) process to streamline closed-caption generation and accelerate production of finished content.

While captioning is the company’s primary application for STT, producers saw a compelling opportunity to apply the technology to improve upon the content ingest process. Much of the operation’s incoming raw video is unscripted, long-format interviews with little context or topical insight into the material. Instead of deploying staff to review and log the content manually, the operation now chooses to run it through the same STT process. Almost instantly, this AI technology generates a highly accurate transcript of all spoken words in the video together with metadata and time references. When editing video for a story, producers are able to search the transcript and rapidly find relevant content needed for the finished piece.  
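The idea of searching a time-referenced transcript can be sketched in a few lines of Python. This is an illustration only, not the company’s actual tooling: the `Segment` structure, the function names, and the sample transcript lines are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcript segment with its time references (hypothetical structure)."""
    start: float  # seconds from the start of the clip
    end: float
    text: str


def format_timecode(seconds: float) -> str:
    """Render seconds as HH:MM:SS for display to a producer."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def search_transcript(segments, query):
    """Return the segments whose text contains the query, ignoring case."""
    needle = query.lower()
    return [seg for seg in segments if needle in seg.text.lower()]


# Invented sample data standing in for STT output from a long interview.
transcript = [
    Segment(12.0, 18.5, "Thanks for sitting down with us today."),
    Segment(754.2, 761.0, "The sequel starts filming in the spring."),
    Segment(3605.0, 3611.4, "We wrapped filming on the first movie last year."),
]

for hit in search_transcript(transcript, "filming"):
    print(format_timecode(hit.start), hit.text)
```

Because every hit carries its start time, an editor can jump straight to that point in the source video rather than scrubbing through the full interview.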

Use Case #2: Shaving Time Off Postproduction

Automated captioning is another powerful and increasingly prevalent application of AI processing, which leverages the power and flexibility of cloud-based STT technology. With new OTT platforms entering the market at dizzying speed, a critical requirement is the ability to generate compliant captions that conform to the unique style guidelines of each delivery platform. Closed captions are subject to specific rules dictated both by the streaming platform and by its target audiences and geographic regions.

By building the closed-captioning process into their existing operations, media outlets can decrease costs and substantially increase the productivity of in-house postproduction teams. It is not uncommon to see efficiency improvements of 40% or greater compared with conventional captioning methods. Because today’s STT technologies have been trained on billions of words and thousands of hours of data representing a wide array of languages, dialects, and accents, STT engines generate remarkably accurate transcripts and captions. Speaker detection and spell-check functions add to accuracy and further reduce the time needed later for manual editing and correction. In addition, translation capabilities allow captions and subtitles to be localized quickly and repurposed for other regions and delivery platforms.

In a typical workflow, the STT engine creates a time-indexed transcript that can be viewed side by side with the media, along with tools for editing text and adding visual cues, music tags, and speaker tags. As the operator speeds through the text editing process, the corrected transcript is automatically converted into time-synced captions according to the parameters of a user-defined preset. The corrected and finalized caption file can then be exported to multiple industry-standard formats, each complying with the style guidelines of its delivery platform.
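As a rough sketch of that conversion step, the Python below groups word-level timestamps into caption cues and renders them as SubRip (SRT) text, one widely used caption format. It assumes the STT output is available as (start, end, word) tuples and uses a single hypothetical preset parameter, `max_chars`, to stand in for a platform’s style guidelines; real presets cover many more rules, such as line counts, reading rate, and minimum durations.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_cues(words, max_chars=32):
    """Group (start, end, word) tuples into cues of at most max_chars characters."""
    groups, current = [], []
    for start, end, word in words:
        candidate = " ".join([w for _, _, w in current] + [word])
        # Start a new cue when adding this word would exceed the preset limit.
        if current and len(candidate) > max_chars:
            groups.append(current)
            current = []
        current.append((start, end, word))
    if current:
        groups.append(current)
    # Each cue spans from its first word's start to its last word's end.
    return [(g[0][0], g[-1][1], " ".join(w for _, _, w in g)) for g in groups]


def to_srt(cues):
    """Render (start, end, text) cues as SRT: index, time range, text, blank line."""
    blocks = [
        f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}"
        for i, (start, end, text) in enumerate(cues, start=1)
    ]
    return "\n\n".join(blocks) + "\n"


# Invented word timings standing in for a corrected transcript.
words = [
    (0.0, 0.4, "Welcome"), (0.4, 0.7, "back"), (0.7, 0.9, "to"),
    (0.9, 1.5, "the"), (1.5, 2.1, "show"),
]
cues = words_to_cues(words, max_chars=12)
print(to_srt(cues))
```

Swapping the preset value, or the rendering function, is what lets one corrected transcript feed several delivery formats without re-editing the text.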

Real-world example: Delivering compliant captions across multiple OTT platforms 

The same popular online news source described above needed to deliver high-value content tailored to the specific standards and style guidelines of multiple OTT delivery platforms, including multilanguage transcriptions and accurate, properly formatted closed captions. Moreover, the company needed to keep to a very tight delivery turnaround: no more than 90 minutes from the time of acquisition to the time the content would be published.

Using cloud-based STT technology, this content provider is able to stay within the delivery window and automatically transcribe content with 90% or greater accuracy. By accessing the tools in the transcript edit window, such as an integrated media player with video synced to text, operators are able to edit text quickly and easily. Once editing is complete, the application exports the finalized STT output via API integration to a third-party MAM environment as a sidecar file. The STT file functions as a rich metadata source, showing the OTT platform’s producers and editors where to locate the highest-value content in the shortest time possible.

In addition, the content provider is able to meet different OTT platforms’ requirements for delivery of transcripts and captions in multiple languages. In seconds, the STT engine is able to translate captions into a secondary language and show the translated version in a separate edit window for review and correction. The company is also able to import existing caption files from legacy content for translation and multilingual distribution, preserving the original timecodes for review in sync with the source content for easy editing. 

Use Case #3: Assuring Content Quality, Technical Compliance, and Optimal User Experience at Time of Delivery 

Closely linked to the AI-based transcription/captioning workflow is metadata-driven compliance monitoring and logging, which natively records live video, audio, and all associated metadata from any point in the production chain. By capturing all broadcast and OTT content along with all captions and subtitles in a range of formats and in multiple languages, this system validates the presence and accuracy of necessary metadata at the point of distribution. Recorded and archived content provides a rich source of information for collaboration and assessment across the organization. Cloud-based AI capabilities such as STT transcription and video recognition empower operators to find content quickly and mine for media insights.

Real-world example: Compliance monitoring at a global 24-hour news network

This major network was looking to leverage closed captions to generate transcripts of aired programs and make them available for download on its website. Furthermore, the network wanted to make the captions searchable to allow its legal and engineering teams to locate high-value content quickly, generate clips, and export assets for FCC and ADA proof of compliance.

The logging and monitoring solution captures all metadata associated with the network’s aired content, including loudness data, SCTE messages, Dolby AC-3, as-run traffic and schedule logs, audio watermarks, and ratings data. An annotation feature allows users to generate notes, tags, and phrases, which are time-indexed to the recorded video. All metadata, whether extracted, calculated, or imported, are frame-accurately indexed back to the video. This integration between media and metadata gives operators full visibility into the entire broadcast and OTT experience.
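To illustrate what indexing metadata back to video frames can look like, here is a minimal Python sketch. It is not the network’s system: the 25 fps frame rate, the event records, and the function names are assumptions for the example, and the timecode helper handles only integer, non-drop-frame rates.

```python
from fractions import Fraction


def seconds_to_frame(seconds: float, fps: int) -> int:
    """Index of the frame on screen at `seconds`, for a constant integer frame rate."""
    # Fraction(str(...)) avoids binary floating-point error near frame boundaries.
    return int(Fraction(str(seconds)) * fps)


def frame_to_timecode(frame: int, fps: int) -> str:
    """Non-drop-frame HH:MM:SS:FF timecode for an integer frame rate."""
    total_seconds, ff = divmod(frame, fps)
    hh, rem = divmod(total_seconds, 3600)
    mm, ss = divmod(rem, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


FPS = 25  # assumed frame rate for this example

# Invented metadata events of different kinds, each with a time in seconds.
events = [
    {"seconds": 12.48, "kind": "loudness", "value": "-23.1 LKFS"},
    {"seconds": 3600.04, "kind": "note", "value": "anchor handoff"},
]

# Index every event by the frame it belongs to.
frame_index = {seconds_to_frame(e["seconds"], FPS): e for e in events}

for frame, event in sorted(frame_index.items()):
    print(frame_to_timecode(frame, FPS), event["kind"], event["value"])
```

Keying every record to a frame number, whatever its source, is what allows notes, loudness readings, and captions to be lined up against the same point in the recording.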

In Summary

While STT-based transcript and caption generation is one of the most commonly deployed examples of AI and ML technologies, the sky’s the limit for these applications in terms of content creation and distribution. In addition to STT, microservices such as content classification, video intelligence, ad detection, and automated detection of logos, objects, faces, and scene changes are ushering media enterprises into a new era. Newsrooms, live sports and entertainment productions, post houses, and other media operations are able to expedite critical processes, reduce costs, and improve efficiency. These capabilities enable intelligent and immediate logging and feedback of content quality and compliance, better positioning broadcasters to meet regulatory, compliance, and licensing requirements for closed-captioning, decency, and advertising monitoring.

In the end, success in the competitive media environment of today and tomorrow comes down to the ability to create and deliver the highest-quality, most compliant content in the shortest time possible. The best AI and ML tools are able to integrate, automate, and orchestrate critical functions such as closed-caption generation into the overall broadcast workflow.

[This is a contributed article from Digital Nirvana. Streaming Media accepts vendor bylines based solely on their value to our readers.]
