Taking Control of Big Data: The Challenge of M&E Data Management
In the world of media and entertainment (M&E), digital, file-based infrastructures are rapidly becoming more prevalent, and consequently the need for storage capacity is growing at a phenomenal rate. In this space, the term “big data” typically describes the vast collections of file-based, high-resolution media that media organizations and content owners must manage. In most cases, these big data sets comprise a few hundred thousand to tens of millions of media assets, each of which could contain hundreds of thousands of elements that are each significant to the integrity and recoverability of the media asset itself. Each of these high-resolution media assets typically occupies anywhere from a couple of hundred megabytes to several terabytes of storage. Given the size of each asset and the number of fundamental elements within it, it’s easy to see why content storage and management is such a challenge in these environments.
A common yet inefficient way to store and manage big data is in digital storage “silos” that serve various parts of the workflow independently. In a broadcast situation, for example, postproduction, on-air playout, graphics, and the newsroom -- and their domain-specific tools -- all have separate storage silos. It’s akin to storing word processing documents on one server, spreadsheets on another, images on a third, etc. -- totally inefficient, yet still pervasive in many media environments and largely considered an acceptable, if aging, practice. If media facilities expect to support next-generation, file-based collaborative workflows in an ever-changing landscape of media distribution and consumption, then they have to unify these distinct digital silos in a way that’s sustainable and scalable.
How can they do it? With a combination of storage abstraction and modern, media-centric cloud and on-premises object storage systems. Individually, these middleware solutions make big data storage and management much more efficient and effective. Together, they form an even more powerful approach called content storage management.
What Is Storage Abstraction?
Storage abstraction can be seen as a software layer that sits between the sources and consumers of files (or content) and the various amounts, types, and generations of physical storage devices. At the most basic level, even a shared network drive is a storage abstraction layer. Regardless of the physical type of storage (spinning disk, flash disk, data tape libraries, cloud, etc.), you can store and restore files seamlessly to and from this location. Via software, the storage abstraction layer takes care of all the complexities related to the actual storage and recovery of files; provides a familiar user interface for access; and handles protection, disaster recovery, etc. Typically, storage abstraction solutions allow the use of commodity IT-centric storage in very application-specific environments.
The storage abstraction layer also takes care of seamless scaling of the underlying storage infrastructure so that you don’t lose files as you add more storage, perform upgrades, or migrate to different technologies. Another benefit of storage abstraction, for M&E in particular, is that these systems also include native interfaces to the devices that produce and consume content, rather than relying on other systems or applications to bridge this chasm. By natively abstracting all assets, regardless of whether they exist on commodity storage or on application-specific devices, asset owners can avoid complex integrations of different application layers to address their big data challenges.
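To make the idea concrete, here is a minimal Python sketch of a storage abstraction layer. It is an illustration under stated assumptions, not how any particular product works: the class and method names (Backend, InMemoryBackend, StorageAbstractionLayer, store, restore, migrate) are hypothetical, and in-memory dictionaries stand in for real disk, tape, or cloud devices.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """One physical storage tier (disk, tape, cloud). Hypothetical interface."""

    @abstractmethod
    def write(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, key: str) -> bytes: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...


class InMemoryBackend(Backend):
    """Stand-in for a real device so the sketch runs anywhere."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def write(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def read(self, key: str) -> bytes:
        return self._blobs[key]

    def delete(self, key: str) -> None:
        del self._blobs[key]


class StorageAbstractionLayer:
    """Callers use logical names only; the layer tracks where the bytes live."""

    def __init__(self, backends: dict[str, Backend], default: str) -> None:
        self._backends = backends
        self._default = default
        self._location: dict[str, str] = {}  # logical name -> backend label

    def store(self, name: str, data: bytes) -> None:
        # New files land on the default tier; existing files stay where they are.
        label = self._location.get(name, self._default)
        self._backends[label].write(name, data)
        self._location[name] = label

    def restore(self, name: str) -> bytes:
        return self._backends[self._location[name]].read(name)

    def migrate(self, name: str, target: str) -> None:
        # Copy first, repoint, then delete, so the file is never unreachable.
        source = self._location[name]
        if source == target:
            return
        self._backends[target].write(name, self._backends[source].read(name))
        self._location[name] = target
        self._backends[source].delete(name)

    def where(self, name: str) -> str:
        return self._location[name]


layer = StorageAbstractionLayer(
    {"disk": InMemoryBackend(), "tape": InMemoryBackend()}, default="disk"
)
layer.store("promo.mxf", b"frames")
layer.migrate("promo.mxf", "tape")      # technology change behind the scenes
print(layer.restore("promo.mxf"))       # same logical name, same bytes
```

The point of the sketch is that the caller never names a device: moving a file from one tier to another changes only the layer's internal location table, which is what lets the underlying storage be scaled, upgraded, or replaced without disturbing the workflows on top of it.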
What Is Object Storage?
Object storage technologies, commonly known as object stores, are typically software middleware or management solutions that can exist independently or as part of a storage abstraction layer solution. One of the main challenges with big data is the unpredictability of the data sets (hence the term “unstructured data”). It is easy to store and restore simple PDF and Word documents (“structured data”), but when an asset can comprise some random combination of file and folder hierarchies, binary data, imagery, metadata, etc., and is subject to change over time, simple file-based storage approaches are just not viable. Furthermore, it is typically the complex combination of all these component elements and their relationships that forms the media asset itself. Relying on a simple file/directory hierarchy to arrange, manage, preserve, and protect complex, unstructured assets is simply not feasible.
Object stores can unify the mix of content types, workflows, interfaces, and other ancillary technologies rampant in media organizations today into a single harmonious, scalable, future-proof asset repository and file-based facility backbone that serves many current and evolving masters.
Storage Abstraction + Object Store + Content Awareness = Content Storage Management
Married with a core object store approach, storage abstraction middleware provides a flexible, infinitely scalable, replicated, distributed, and robust storage and preservation backbone in demanding M&E big data environments.
Content storage management (CSM) middleware solutions represent the culmination of storage abstraction and object storage, along with the concepts, technologies, and approaches behind them, for handling unpredictable and challenging big data-centric environments of any scale. Adding content awareness to the mix allows these systems to handle, interpret, and transform media assets accurately, so that assets can be shared seamlessly across workflow silos while each silo is still served in its native fashion. Grown out of decades of experience in the demanding big data world of M&E, CSM solutions serve any applications and devices that generate or consume metadata, content, or other file-based assets and can adapt over time as new formats and challenges arise.
CSM in the Cloud
M&E operations tend to be abstract, dynamic, and flexible, and CSM solutions are built with that environment in mind. They can exist on-premises as well as in any mix of private, hybrid, and public clouds. Applying CSM solutions in the cloud simply extends their effectiveness and reduces the capital expense and staff expertise required to maintain on-premises infrastructure. They can also serve as a bridge between the facility and the cloud, allowing elastic moves between the two as needs change over time.