
Metadata: What You Need to Know (And Why You Need to Know It)

How Is Metadata Created?
There are two primary ways to create metadata (besides buying it): direct manual entry and automated extraction. We’ll look at these two ways, plus two variations: masked and repurposed metadata.

Manual entry
At a simple level, when you fill in your name, address, and other contact details for an online purchase, you are manually creating metadata that will be associated with the particular order. Fortunately, thanks to the consistent underlying HTML, XML, or database tags assigned to each field, autofill technology in most modern browsers has automated much of this field-filling chore.

Figure 3. ScreenPlay uses time-based metadata to catalog, deliver, and analyze promotional video content, including movie trailers, music videos, and video game trailers.

Manual entry of general metadata is easy enough for the casual or social media user: e.g., "Tigger’s birthday party" or "field trip to museum." For the world of commercial media, though, specific metadata is needed for every frame of video, and it’s very hard to create. Ideally, every sound and every image needs to be indexed, tagged, and cataloged. The good news is that there’s plenty of space to handle all the metadata types that one wishes to throw at the system: Current systems are capable of indexing 32,000 metadata fields per frame of video, or almost 1 million metadata fields per second of video.
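The per-second figure follows from the per-frame figure once a frame rate is chosen. The article does not state one, so the sketch below is only an illustration: the frame rates are assumptions, and `fields_per_second` is a hypothetical helper name.

```python
# Figure quoted in the text: metadata fields that can be indexed per frame.
FIELDS_PER_FRAME = 32_000


def fields_per_second(frame_rate: float, fields_per_frame: int = FIELDS_PER_FRAME) -> float:
    """Return how many metadata fields can be indexed per second of video."""
    return frame_rate * fields_per_frame


# Assumed frame rates (film, PAL, and rounded NTSC); not taken from the article.
for fps in (24, 25, 30):
    print(f"{fps} fps -> {fields_per_second(fps):,.0f} fields per second")
```

At an assumed 30 frames per second the result is 960,000 fields, which matches "almost 1 million"; lower frame rates give proportionally smaller totals.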

The bad news is the brute force required to manually enter all that data. The added workload on the employees who must enter the metadata nearly guarantees limited success. A recommendation from a recent U.S. Geological Survey (USGS) posting on the topic probably says it best:

"How do we deal with people who complain that it’s too hard? The solution in most cases is to redesign the work flow rather than to develop new tools or training. People often assume that data producers must generate their own metadata. Certainly they should provide informal, unstructured documentation, but they should not necessarily have to go through the rigors of fully-structured formal metadata. For scientists or GIS specialists who produce one or two data sets per year it simply isn’t worth their time to learn the FGDC [Federal Geographic Data Committee] standard."

The USGS posting suggests that a better idea is for casual users to fill out a less-complicated form or template that is later flowed into the proper format by a "data manager or cataloger who is familiar (not necessarily expert) with the subject and well-versed in the metadata standard."

While this modified workflow relieves the average user of the burden of becoming a metadata expert and learning a complex combination of software tools, it still introduces a bottleneck in the form of the data manager.
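The two-step workflow described in the USGS posting, in which a casual user fills out a simple form and a cataloger later flows it into the formal format, can be sketched roughly as follows. Everything here is an assumption made for illustration: the form labels, the element names, and the sample values are hypothetical and do not reproduce the actual FGDC schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from casual form labels to formal element names.
# These names are illustrative only; they are not the real FGDC elements.
FIELD_MAP = {
    "what is it": "title",
    "who made it": "originator",
    "when": "publication_date",
    "summary": "abstract",
}
REQUIRED = ("title", "originator", "publication_date", "abstract")


def to_formal_record(form: dict) -> tuple[ET.Element, list[str]]:
    """Flow an informal form into a structured record and list what is missing."""
    record = ET.Element("metadata")
    for label, value in form.items():
        element_name = FIELD_MAP.get(label.strip().lower())
        if element_name and value.strip():
            ET.SubElement(record, element_name).text = value.strip()
    # Anything still missing is left for the cataloger to resolve.
    missing = [name for name in REQUIRED if record.find(name) is None]
    return record, missing


# Made-up form contents, as a data producer might supply them.
form = {
    "What is it": "Stream gauge readings",
    "Who made it": "J. Smith",
    "Summary": "Daily discharge values.",
}
record, missing = to_formal_record(form)
print(ET.tostring(record, encoding="unicode"))
print("Still needed from the cataloger:", missing)
```

The point of the design is that the producer never sees the formal element names; the list of missing fields is where the cataloger's work, and the bottleneck, begins.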

Automated extraction
The Holy Grail of metadata entry is accurate, automated creation of video content metadata by a software program that analyzes a clip in real time or faster than real time. The old stalwarts of automated ISR (indexing, search, and retrieval), such as Pictron and Autonomy Virage, have been automating media-based metadata with nominal success for more than a decade.

The newer companies in the space have laid aside the term ISR. Automated indexing is now referred to as "machine vision" or "computer vision," terms that describe the process of extracting nonverbal information from a video or audio file. The business community is beginning to take notice of computer vision as it rethinks ways of both entering and classifying metadata.

Like Pictron and Virage, Digitalsmiths’ VideoSense automated indexing system uses facial recognition, scene classification, and object identification as "visual interpretation tools" to process each frame of video, using the resulting temporal metadata to match visitors with advertising content that meets its clients’ demographic and viewing criteria.

"Our variables go beyond common descriptors like name, title, and date to include criteria related to each frame," says Berry. "We can accurately index people, places, objects, dialogue, and subject matter. For example, automated scene classification provides general location details such as whether a scene took place indoors or outdoors and, if outdoors, whether at the beach or in the mountains."

Yet, even today, the accuracy of automated indexing tools is far from perfect.

"We’ve worked with a variety of content," said Berry during a panel at the recent Open Video Conference, "and found we can get about 80%–90% precision rates for news speech-to-text conversion. Stepping down one level to non-news content, such as premium TV shows, and the precision rates drop down in to [the] 20% range."

To get around these limitations, all of the automated indexing systems also rely on closed-captioning text to augment their accuracy. With automated tools, it is arguably better to have too many false positives than not enough indexing, as more information actually increases the accuracy of text-based searches.
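The trade-off between false positives and missing index terms can be made concrete with the standard information-retrieval definitions of precision and recall. The sketch below assumes those definitions apply to the rates quoted above, which the article does not confirm; the tag sets are invented, and `precision_recall` is a hypothetical helper.

```python
def precision_recall(predicted: set, actual: set) -> tuple[float, float]:
    """Return (precision, recall) for a set of predicted tags."""
    true_positives = len(predicted & actual)
    # Precision: share of predicted tags that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of correct tags that were actually predicted.
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Invented example: what is really in a scene versus two indexing strategies.
actual = {"beach", "outdoors", "dog", "sunset"}
cautious = {"beach", "outdoors"}
generous = {"beach", "outdoors", "dog", "sunset", "mountains", "car", "indoors", "cat"}

print("cautious:", precision_recall(cautious, actual))
print("generous:", precision_recall(generous, actual))
```

In this made-up case the cautious indexer never tags anything wrongly but misses half of what is in the scene, while the generous indexer tags everything in the scene at the cost of four false positives. A search for "dog" finds the clip only under the generous strategy.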

Masked metadata
Masked metadata is not metadata that’s hidden; it’s metadata created as a byproduct of another activity. As such, the act of creation is effectively "masked" from the creator, who often does not think of it as metadata entry at all.

When Gracenote originally began, as an open database, some fans saw it only as a massive, free labor source, happily downloading information about their own album titles and song tracks without providing any metadata entry in return. But other fans just as happily entered their own albums’ metadata into the database, which was then made available to other music fans.

Some contributors even entered metadata for albums they did not own and corrected others’ errors, thereby becoming known for their depth of knowledge about particular music genres. This manual process, which drove later automated retrieval of the same content for the mainstream iTunes user, was necessary to guarantee both accuracy and an initial critical mass. But the work was distributed across so many contributors, and was of such interest to many passionate fans, that it seemed less like dull data entry than a labor of love.

One route that companies are taking to invite such labors of love is to create tools for metadata entry that mimic the interfaces of the relevant software. Digitalsmiths, for instance, makes metadata tools that mimic traditional video editing tools.

"We’ve got the ability to cut the tracks of metadata much like the way an Avid would cut different tracks of video," says Berry. "We also have the ability to create custom metadata tracks, such as genre, since our studio clients have shown us that a genre may change from scene to scene within a single movie."

Chris Jackson, founder of London-based MetaBroadcast, notes that URIplay, an open source project backed by the BBC, started as an interface project but has ended up as a metadata play.

"Our core goal is to help people find moving images," says Jackson, "but we’ve found we had to move well beyond user interfaces to the creation of effective metadata tools and interfaces."
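The custom metadata tracks Berry describes, such as a genre that changes from scene to scene, can be pictured as a timeline of labeled segments that sits alongside the video. The sketch below is one possible data structure under that assumption; `MetadataTrack`, its methods, and the sample times are all hypothetical and are not drawn from any vendor's product.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class MetadataTrack:
    """A named track of non-overlapping, time-coded metadata segments."""
    name: str
    # Each segment is (start_seconds, end_seconds, value), kept sorted by start.
    segments: list = field(default_factory=list)

    def add(self, start: float, end: float, value: str) -> None:
        """Add a segment covering [start, end) and keep the track sorted."""
        self.segments.append((start, end, value))
        self.segments.sort()

    def value_at(self, t: float):
        """Return the value in effect at time t, or None if no segment covers it."""
        starts = [segment[0] for segment in self.segments]
        i = bisect_right(starts, t) - 1
        if i >= 0 and self.segments[i][0] <= t < self.segments[i][1]:
            return self.segments[i][2]
        return None


# Invented example: a movie whose genre shifts partway through.
genre = MetadataTrack("genre")
genre.add(0, 300, "comedy")
genre.add(300, 420, "action")

print(genre.value_at(120))
print(genre.value_at(310))
print(genre.value_at(500))
```

Keeping each kind of metadata on its own track means a genre track can be cut or extended without touching, say, a track of cast appearances, which mirrors how an editor treats separate video and audio tracks.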

Repurposed metadata
This is metadata created for one purpose but used for many others. As previously mentioned, much of the metadata created to index and find content can also be used to restrict access to the same content. That, however, is a simplistic example compared with the real power of repurposed metadata.

Over the past 12 years, I’ve worked in metadata optimization, emphasizing production, postproduction, and delivery workflows (some of which is chronicled at www.workflowed.com). While content creators often enter detailed metadata at the time of production, I’ve found that typical nonlinear postproduction software strips all but six or seven fields of the metadata gathered in acquisition. When we add metadata in the editing process, it too is stripped out when the content is sent to physical media (DVD or tape) or even to a streaming file format. The frustration is that, until recently, there has been no significant movement in addressing these issues, leaving editing systems as metadata "islands" unconnected to either production or distribution.

That situation seems to be changing, slowly. Companies that create editing tools are morphing into providers of end-to-end solutions, encompassing production, editing, and distribution, with the potential to maintain metadata throughout the process. Preproduction and production metadata that is ingested once can then be repurposed to save time in postproduction and delivery.
