Google Releases Dataset of 8M YouTube URLs for Video Research
To aid video researchers and students, Google has released a dataset of 8 million YouTube video URLs representing over 500,000 hours of video. It is the largest such video pool by far (the next largest is a dataset of 1 million sports videos, Google notes in its announcement) and should prove a great help to researchers in areas such as video modeling architectures and representation learning.
"We believe this dataset can significantly accelerate research on video understanding as it enables researchers and students without access to big data or big machines to do their research at previously unprecedented scale," software engineers Sudheendra Vijayanarasimhan and Paul Natsev wrote.
Since most researchers won't have the massive storage and CPU capacity needed to handle such a dataset, Google pre-processed the videos, sampling them at one frame per second for a total of 1.9 billion video frames. It then compressed the set so that it fits on a 1.5 TB drive.
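As a rough consistency check, the figures quoted above can be compared with a few lines of arithmetic. The sketch below is a back-of-envelope calculation only: it uses the rounded numbers from this article, treats "over 500,000 hours" as a lower bound, and assumes 1.5 TB means decimal terabytes. The function name is made up for illustration and is not part of any Google tooling.

```python
# Back-of-envelope figures taken from the article (all rounded).
NUM_VIDEOS = 8_000_000          # "8 million YouTube video URLs"
TOTAL_HOURS = 500_000           # "over 500,000 hours", so a lower bound
TOTAL_FRAMES = 1_900_000_000    # "1.9 billion video frames"
DATASET_BYTES = 1.5e12          # "1.5 TB", assuming decimal terabytes


def dataset_estimates():
    """Derive rough per-video and per-frame averages from the quoted totals."""
    total_seconds = TOTAL_HOURS * 3600
    return {
        # One frame per second means frames are roughly equal to seconds.
        "frames_at_1fps_lower_bound": total_seconds,
        "avg_seconds_per_video": total_seconds / NUM_VIDEOS,
        "avg_frames_per_video": TOTAL_FRAMES / NUM_VIDEOS,
        "avg_bytes_per_frame": DATASET_BYTES / TOTAL_FRAMES,
    }


if __name__ == "__main__":
    for name, value in dataset_estimates().items():
        print(f"{name}: {value:,.1f}")
```

Under these assumptions, 500,000 hours at one frame per second gives at least 1.8 billion frames, which is in line with the quoted 1.9 billion, and the average video runs a little under four minutes.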
The dataset consists of public videos with over 1,000 views. Objects identified in the videos were tagged using frequency analysis, automated filtering, and verification by humans. Videos are grouped into 24 top-level verticals (as shown in the image below). For more details on how Google created the set, read its technical report.