Managing Video Technology Stack Costs: Is Your Data Really “Big”?
The explosive growth in demand for electronic media, particularly the increasing preference for cord-cutting alternatives for accessing video assets, suggests that those in the entertainment industry should look to innovative solutions from outside the industry to meet their growing data management needs. Data capture, storage, and analytic support will continue to challenge video providers due to the growing size of the metadata-rich libraries of electronic media assets, the volume of historical data surrounding the diverse population of users who consume those assets, and the trend toward delivering personalized content. Though the tools and solutions leveraged by high-volume e-commerce and social media sites are candidates for meeting growing technology needs, the lessons learned by other industries in recent years might be used to better inform resource allocation decisions surrounding data management in the online video space.
Discussions about big data are often focused on the use of various software and hardware solutions intended to manage massive data sets. The oft-capitalized term "Big Data" has been focused on the tools and mechanics associated with manipulating raw data, rather than the value added by transforming raw data into actionable information to support an enterprise-wide strategy. As an alternative, the technical discussions could shift away from deification of "Big Data" and instead toward the formulation of an enterprise-appropriate data management strategy based on a clear identification of the knowledge one wishes to extract from the raw data. Surely, there is little value in building a robust data management plan capable of handling large data sets in a computationally efficient manner if the costly transformation fails to yield an ability to act upon the results. Any journeyman data analyst can build an infrastructure that can distill tera/petabytes of user-level data into a small set of metrics, but few have been able to transform these results into forward-looking prescriptions that might result in a more favorable outcome during future engagements. In this essay, a stepwise procedure for development and execution of a data management strategy is framed by a simple example: An enterprise goal of increasing site engagement by a target amount, e.g. "increase average engagement time by 10%."
It should be noted that the more detailed procedure presented here is only one of many possible approaches to achieving the goal. Yet, it is important that it begins with a "strategy first" approach rather than an initial focus on the challenges presented by the volume, variety, and velocity of data. Beginning with this proactive method, one may find that the effort and cost associated with the resulting strategy-based data management plan are significantly smaller than those of a "data first" approach. Additionally, our approach considers sustainability, building a data strategy that goes beyond simply achieving the goal and toward preserving it in the long run.
A carefully designed strategy will also have appropriate time horizons in mind, not only in terms of how forward-looking the actionable information needs to be, but also in terms of how much historical data is meaningful to the analysis. In the example goal "increase average engagement time by 10%," this means that one must first determine:
- What is the unambiguous definition of "engagement time,"
- How long until the increase in engagement time must be realized,
- How long the extended engagement times need to be sustained, and
- The perishability of historical data, if appropriate.
The strategic plan to extract actionable information that supports achievement of the target goal requires that these values be determined in advance. This will ensure that the strategic data management plan is appropriate, and that scarce resources will not be unnecessarily expended (e.g. if the historical data is considered obsolete after three months, then the resulting infrastructure should only be scaled to handle the computational loads that the reduced temporal window implies).
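As a minimal sketch of the scoping idea above, the snippet below keeps only the events that fall inside a retention window and estimates the storage that window implies. The 90-day window echoes the three-month example; the function names, the event layout, and the 5 GB/day ingest figure are hypothetical and purely illustrative.

```python
from datetime import datetime, timedelta

# Assumption: historical data loses its value after 90 days, echoing the
# three-month example above. The real value comes from the strategy.
RETENTION_DAYS = 90


def within_window(events, now, retention_days=RETENTION_DAYS):
    """Keep only the events recent enough to matter to the goal."""
    cutoff = now - timedelta(days=retention_days)
    return [e for e in events if e["timestamp"] >= cutoff]


def estimated_storage_gb(daily_gb, retention_days=RETENTION_DAYS):
    """Rough capacity needed if only the retention window is stored."""
    return daily_gb * retention_days


# Hypothetical viewing events: one inside the window, one long expired.
now = datetime(2014, 6, 30)
events = [
    {"user": "a", "timestamp": datetime(2014, 6, 1), "seconds": 120},
    {"user": "b", "timestamp": datetime(2013, 12, 1), "seconds": 300},
]

recent = within_window(events, now)
print(len(recent))                       # 1
print(estimated_storage_gb(daily_gb=5))  # 450
```

Sizing against the window rather than the full archive is what keeps the infrastructure estimate tied to the goal; a longer shelf life would simply change `RETENTION_DAYS`.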
The 12-Step Program in the "Strategy First" Approach
- Plan: Develop a company strategic plan that includes target goals that reflect the foreseeable future lifetime of the company. Example: "Increase engagement time by 10%."
- Schedule: Set target date(s) for achievement of goals. Example: "Increase engagement time by 10% in the next 90 days."
- Define: Clearly define a set of metrics that appropriately measures progress toward the goal. In the ongoing example, this would include defining "engagement" and "engagement time." Example: Does engagement reflect time on the page, or time spent interacting with the video? Additionally, if the metric reflects a change in the value of some observed phenomenon, a baseline must be established against which change is measured.
  - Ensure that the metrics are measurable given available data and are unambiguously defined to prevent variations in their interpretation. This measurability constraint implies that some monitoring of the engagement must be in place and that there is an ability to measure the values of the defined metric during the establishment of the baseline.
  - The baseline is the temporal window in which the metric is monitored; if the goal is meant to be sustained, the window extends beyond the target achievement date to cover the period in which the improvement should persist.
- Scope: Identify the age at which historical data loses its value and should be disregarded for the measurable achievement of the goal.
- Repeat: Execute steps 1-4 for all goals in the enterprise strategy, including those that one foresees will be added at a later date as the demand on data processing scales according to the long-term strategic plan.
- Set Tasks: Identify the tasks required to achieve and measure the set of goals.
- Expect Action: Identify how the tasks will need to be modified to influence the path toward goal achievement if interim assessment indicates a potential shortfall.
- Collect: Identify the plurality of data (past, present, and future) needed to operationalize the strategic tasks, measure progress, and maintain a historical archive that will allow any transformed data to be recreated upon demand.
- Summarize: Identify the data processing and storage needs required to support interim assessment and course correction.
- Evaluate: Conduct a comparison of the various commercial, open-source, and organic data analysis, storage, processing, and measurement solutions needed to support the needs defined earlier, perhaps leaving some room for forecasting errors.
- Implement: Select and operationalize the suite of solutions.
  - Sure, this may include Hadoop, Mahout, Cloudera, or any appropriate toolkit, but only when it supports the enterprise strategy.
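To make the Define and Expect Action steps concrete, here is a small sketch of an interim assessment against a baseline. It assumes, purely for illustration, that "engagement time" means seconds of video watched per session; the function names and the sample numbers are hypothetical, and an enterprise would substitute its own definition.

```python
from statistics import mean

TARGET_LIFT = 0.10  # the "increase engagement time by 10%" goal


def average_engagement(session_seconds):
    """Average engagement time per session.

    Assumption: engagement is seconds of video watched in a session.
    """
    return mean(session_seconds)


def assess(baseline_sessions, current_sessions, target_lift=TARGET_LIFT):
    """Compare current engagement with the baseline and the target lift."""
    baseline = average_engagement(baseline_sessions)
    current = average_engagement(current_sessions)
    lift = (current - baseline) / baseline
    return {
        "baseline": baseline,
        "current": current,
        "lift": lift,
        "on_target": lift >= target_lift,
    }


# Hypothetical session lengths (seconds) from the baseline window
# and from two later interim checks.
baseline_sessions = [100, 200, 300]
shortfall = assess(baseline_sessions, [105, 205, 305])
achieved = assess(baseline_sessions, [160, 230, 300])

print(shortfall["on_target"])  # False
print(achieved["on_target"])   # True
```

A `False` result at an interim check is the trigger for the Expect Action step: the tasks are adjusted before the target date rather than after it.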
Following these 12 steps, the technology stack may be surprisingly parsimonious. The approach may also help avoid unnecessary costs associated with solutions that will not help the enterprise implement and measure its strategic plan or gain the intelligence necessary to make course corrections. It should be noted that this infrastructure decision cycle is the inverse of the "data first" approach, which would begin with step 12 and perhaps be devoid of any consideration of the enterprise strategy (if one exists at all). Finally, use the cost savings to buy the idle hardware myopically selected by others, and then resell it at a markup to the next "Big Data"-fearing enterprise that appears.
[Thomas J. Sullivan is chief data scientist of IRIS.TV. Streaming Media accepts vendor-contributed articles like this one based solely on their value to our readers.]