VP9 Finally Comes of Age, But Is it Right for Everyone?
The value proposition for VP9 is clear, as stated in Figure 1: “Adaptive HD streaming with 1/2 the data of H.264!” Half the data rate cuts your bandwidth and storage costs and allows you to reach more viewers with better quality video on slower connections. It also cuts your customers’ monthly data costs, a major issue now that many ISPs are instituting monthly bandwidth caps.
Many producers are starting to explore the benefits associated with distributing VP9-encoded video to desktop/notebook browsers and some mobile platforms instead of H.264. While you won’t be able to wean yourself off of H.264 entirely for many years yet, VP9 delivery is definitely a concept whose time has come. In this article, I’ll compare multiple aspects of each, from encoding to delivery to player creation. Beyond this, I’ll briefly touch on the IP situation and conclude with a brief mention of AV1, the codec that will soon replace VP9.
VP9, WebM, and DASH
Let’s start with a brief overview of VP9, an open source codec from Google that traces back to the company’s purchase of On2 Technologies in 2010. The first codec Google open-sourced was VP8, which was paired with the Vorbis audio codec in the WebM file format, itself based on the Matroska media container.
VP9, the next iteration of the codec, was first introduced in mid-2013 and first deployed by YouTube a few months later in September. Also in 2013, the WebM format was expanded to incorporate the Opus audio codec, which is most often paired with VP9. You can deliver a single file containing VP9 and Opus in a WebM file or produce multiple VP9/Opus streams in DASH packaging for adaptive streaming.
Figure 1. JW Player supports VP9 in its eponymous player and online video platform service.
VP9 is the last iteration of VPx, as Google formed the Alliance for Open Media in September 2015 to consolidate open source codec development with Mozilla, Cisco, Microsoft, Intel, and others. The first alliance codec, called AV1, which I discuss later in the article, should be released sometime between December 2016 and March 2017.
With this, let’s start our look at VP9 by examining file quality.
Same Quality at Half the Data Rate? Close!
To test quality, I encoded three video clips at three different resolutions at five different data rates. The first clip is a short segment from Blender Foundation’s Tears of Steel (TOS) movie, representing mostly traditional movie content. The second is a short segment from Blender’s Sintel movie, representing animated content, and the third, which I call the New clip, is composed of multiple clips to simulate real-world video. Working in Adobe Premiere Pro CC, I produced very high data rate H.264 mezzanine clips in three resolutions with the same horizontal pixel count, but a vertical count that varied by clip. For example, the smallest New clip was 1280x720, while the Sintel/TOS clips were 1280x576. These mezzanine clips were the starting points for all encodes.
I used FFmpeg for all test encodes. For the VP9 encodes, I used the VOD recommended settings from the WebM Wiki, changing both the speed and frame parallel settings to 0 at Google’s recommendation. For x264, I used two-pass encoding with the veryslow preset, with maxrate and buffer set to 150 percent of the target data rate, essentially 150 percent constrained VBR with a bit of wiggle room in the buffer. The keyframe interval was set to 3 seconds for all tests.
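As a rough illustration of the x264 settings described above, the sketch below assembles a two-pass FFmpeg command pair with maxrate and bufsize at 150 percent of the target and a 3-second keyframe interval. It is not the actual test script: the function name, the file names, the 24 fps frame rate, the video-only `-an` switch, and the Unix-style `/dev/null` sink are all assumptions made for the example.

```python
def x264_two_pass_commands(source, output, target_kbps, fps=24.0,
                           passlog="x264_2pass"):
    """Return [pass1_args, pass2_args] for a 150 percent constrained VBR encode.

    fps is an assumed frame rate; it is only used to turn the 3-second
    keyframe interval into a frame count for -g.
    """
    cap_kbps = int(target_kbps * 1.5)   # maxrate and bufsize at 150 percent
    gop_frames = int(round(fps * 3))    # keyframe every 3 seconds
    common = [
        "ffmpeg", "-y", "-i", source,
        "-c:v", "libx264", "-preset", "veryslow",
        "-b:v", f"{target_kbps}k",
        "-maxrate", f"{cap_kbps}k",
        "-bufsize", f"{cap_kbps}k",
        "-g", str(gop_frames),
        "-passlogfile", passlog,
        "-an",                          # assumption: video-only test encodes
    ]
    # Pass 1 only gathers statistics, so its output is discarded (Unix path).
    pass1 = common + ["-pass", "1", "-f", "null", "/dev/null"]
    pass2 = common + ["-pass", "2", output]
    return [pass1, pass2]


for args in x264_two_pass_commands("new_720p.mp4", "new_720p_2000.mp4", 2000):
    print(" ".join(args))
```

The lists can be handed to `subprocess.run` one after the other; building them as data first makes it easy to inspect the exact switches before spending encoding time.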
I encoded each clip at five different data rates, which varied by clip. As an example, I encoded the 1920x856 Sintel/TOS clips at 1000Kbps, 1500Kbps, 2000Kbps, 2500Kbps, and 3000Kbps. In all tests, this left two comparison points for each clip in which VP9 was encoded at 50 percent of the data rate of H.264. For example, with the 1920x856 TOS clip, I could compare VP9 at 1000 with H.264 at 2000, and VP9 at 1500 with H.264 at 3000 (Figure 2). Since there were two comparison points for each of nine test clips, that meant there were 18 total comparison points to test the “same quality at 50 percent of the data rate” premise.
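The count of comparison points follows directly from the data rate ladder. The short sketch below, with a hypothetical helper name, finds every rung at which a VP9 encode has an H.264 counterpart at exactly twice the data rate.

```python
def half_rate_pairs(rates_kbps):
    """Return (vp9_rate, h264_rate) pairs where the H.264 rate is double."""
    rates = set(rates_kbps)
    return [(r, 2 * r) for r in sorted(rates) if 2 * r in rates]


ladder = [1000, 1500, 2000, 2500, 3000]   # the 1920x856 Sintel/TOS ladder
pairs = half_rate_pairs(ladder)
print(pairs)            # [(1000, 2000), (1500, 3000)]
print(len(pairs) * 9)   # 18 comparison points across nine test clips
```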
Figure 2. VP9’s quality consistently and substantially exceeded H.264’s over all tested data rates.
To test quality, I used the Peak Signal-to-Noise Ratio (PSNR) metric, calculated using the Moscow State University Video Quality Measurement Tool. In four of the 18 cases, the PSNR score of the VP9 video was higher than the H.264 video at twice the data rate. If I added a 5 percent tolerance by multiplying the H.264 score by 0.95, VP9 won in 14 of 18 test cases. At 10 percent tolerance, all VP9 files exceeded the quality of their H.264 counterparts. As you can see in Figure 2, the quality differential was fairly consistent along the entire data rate continuum.
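The tolerance test is simple to express in code. In this sketch the function names and the PSNR values are hypothetical placeholders, not scores from the tests; only the comparison rule (scale the H.264 score by one minus the tolerance, then compare) comes from the description above.

```python
def vp9_wins(vp9_psnr_db, h264_psnr_db, tolerance=0.0):
    """True if the VP9 score beats the H.264 score scaled by (1 - tolerance)."""
    return vp9_psnr_db > h264_psnr_db * (1.0 - tolerance)


def count_wins(score_pairs, tolerance=0.0):
    """Count the (vp9, h264) pairs that VP9 wins at the given tolerance."""
    return sum(vp9_wins(v, h, tolerance) for v, h in score_pairs)


# Hypothetical scores for one comparison point: VP9 at half the data rate.
vp9_score, h264_score = 38.0, 39.5
for tol in (0.0, 0.05, 0.10):
    print(tol, vp9_wins(vp9_score, h264_score, tol))
```

Running `count_wins` over all 18 pairs at each tolerance would reproduce the kind of tally reported above.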
Codec benchmarking is an inexact science, and all comparisons usually result in more questions than answers. To supplement my trials, I spoke with JW Player’s lead compression engineer, Pooja Madan, who designed and implemented JW’s VP9 encoding facility. She advised that the company was achieving about 50 percent savings overall on its encoding ladder, hence the claim shown in Figure 1. Most other companies I spoke with while writing this article reported similar results.
But You’ll Need Lots of Time
The Achilles’ heel of VPx-based codecs has always been encoding time, and VP9 is no exception. I produced all performance numbers discussed later on an HP Z840 workstation with dual 3.1 GHz E5-2687W processors with 10 cores each, and Hyper-Threading Technology (HTT), for a total of 40 logical cores. All source files were stored on and encoded files delivered to Turbo SSD G2 drives. This is an extremely fast and capable system that you can read about in a series of benchmarking reviews in Streaming Media Producer.
On the Z840, encoding the 96-second 720p New file into H.264 format at 2Mbps took 98 seconds. Encoding with the Google-recommended parameters delivered the same file in VP9 format in 19:10, about 12 times longer. When I interviewed JW Player for this article, Madan reported that the company had invested significant time to achieve the optimal balance between encoding time and quality.
I asked if she would share her FFmpeg scripts to use in my testing, and she (and the company) agreed. The results were impressive. Specifically, the JW Player command line script produced the 720p New file @ 2Mbps in 4:59, about 22 percent of the time of the Google script. I checked the PSNR value for the JW Player-encoded file against the file created using Google recommended parameters, and the JW score was about 5 percent higher.
During encoding, I tracked CPU utilization and noticed that when rendering a single file, VP9 barely moved the needle. I decided to check performance with multiple encoders running. To accomplish this, I created multiple folders on the two SSD drives and ran each of the command line scripts eight and then 12 times simultaneously. As you can see in Table 1, multiple encodes reduced the H.264 to VP9 encoding differential with the JW Player script down to less than 4 seconds per file when encoding 12 files simultaneously.
Table 1. Encoding times for H.264 and two VP9 test scripts.
These results raise two questions. The first asks why Google would produce a codec that performed so poorly on multiple-core workstations. The answer is that this performance model fits perfectly into its current encoding schema. That is, Google doesn’t encode each input file from start to finish in a single encoding instance; it encodes all files in parallel, splitting each source into chunks and then sending them off to different encoding instances.
In the context of Google’s encoding system, VP9 isn’t slow at all. Heck, at 40.5 seconds per file, it’s only about 10 percent slower than H.264. However, this performance schema puts the burden on the developer to create an efficient encoding program or platform. Of course, VP9 isn’t the first codec that fails to efficiently leverage multiple-core systems, and encoding programs such as Telestream Episode have long used a technique called split and stitch to improve performance on multiple-core systems. Much like YouTube’s parallel encoding schema, split and stitch divides a single input file into multiple parts that are encoded separately in parallel and then stitched back together for final output. Essentially, the 12-simultaneous encode test shown in Table 1 simulates this parallel encoding operation, which pushed the Z840 to 98 percent+ usage (Figure 3).
Figure 3. Twelve simultaneous VP9 encodes pushed the Z840 to the max.
VP9 isn’t slow; it’s just highly inefficient in a multiple-core environment, which makes it tougher for developers to design encoding systems that operate efficiently. For this reason, expect to see substantial differences in encoding times (and quality) in programs that support VP9. Developers creating their own encoders should design their architecture from the start knowing that they’re going to have to deploy a system like Google’s to maximize VP9 encoding speed and efficiency. This is especially true in the cloud, where you pay for an instance by the hour, and the ability to spread encoding chores over as few CPUs and CPU-hours as possible translates directly to the bottom line.
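To make the split-and-stitch idea concrete, here is a minimal sketch of the planning and dispatch steps. It is an illustration under stated assumptions, not how Google or Telestream implement it: the function names are hypothetical, chunks are cut on keyframe-interval boundaries so they can be joined cleanly, and the encoder is passed in as a plain callable (a real one would launch an FFmpeg process per chunk and a final step would concatenate the results).

```python
import math
from concurrent.futures import ThreadPoolExecutor


def plan_chunks(duration_s, gop_s, max_chunks):
    """Split [0, duration_s) into (start, length) chunks aligned to GOPs."""
    total_gops = math.ceil(duration_s / gop_s)
    chunks = min(max_chunks, total_gops)
    base, extra = divmod(total_gops, chunks)
    plan, start_gop = [], 0
    for i in range(chunks):
        gops = base + (1 if i < extra else 0)   # spread the remainder evenly
        start = start_gop * gop_s
        end = min((start_gop + gops) * gop_s, duration_s)
        plan.append((start, end - start))
        start_gop += gops
    return plan


def encode_in_parallel(plan, encode_chunk, workers):
    """Run encode_chunk(start, length) for each chunk, keeping chunk order."""
    # Threads are enough here because the heavy lifting would happen in
    # external encoder processes, not in Python itself.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda chunk: encode_chunk(*chunk), plan))


# The 96-second clip with 3-second keyframe intervals, split 12 ways.
plan = plan_chunks(96, 3, 12)
print(plan[0], plan[-1])   # (0, 9) (90, 6)

# Stand-in encoder that only names the chunk it would produce.
names = encode_in_parallel(plan, lambda s, d: f"chunk_{s:03d}_{d}s.webm", 12)
print(names[:2])
```

Because the ordered results map one-to-one onto the plan, the stitch step only needs to join the chunk files in list order.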
And You’ll Have to Experiment
Here’s the second question: If JW Player’s script produced both better quality and a faster encoding time than Google’s recommendations, why didn’t I use it for my quality tests? JW Player’s script uses a technique called capped constant rate factor (CRF), which tells FFmpeg to deliver a file at a certain quality level but capped at a certain bitrate. For example, the JW script used a CRF value of 30 and a data rate of 2Mbps. This tells FFmpeg to encode the file to a quality level of 30, but to cap the data rate at 2Mbps.
If a file is easy to compress, such as a talking head clip, the entire clip will likely be well under 2Mbps. If the clip is hard to compress, the entire clip will likely be capped at 2Mbps. So when encoding with capped CRF, the bit rate, and the resultant file size, will vary from clip to clip depending upon content.
This presents a challenge when comparing codecs, because it’s always necessary to check output file size to ensure that the encoder met the target data rate. Obviously, this can’t be done with capped CRF-encoded files. I didn’t think it was appropriate to use a capped CRF approach for benchmarking, though I certainly would consider it for production. If you’re designing your own system, be sure to incorporate this feature, and if you’re buying an encoder, verify that capped CRF is available before purchase.
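For readers who want to try capped CRF, the sketch below shows one way the idea maps onto FFmpeg’s libvpx-vp9 encoder, where supplying `-crf` together with a nonzero `-b:v` requests a quality target with a bitrate ceiling. This is not JW Player’s script: the function names, file names, and Opus audio choice are assumptions, and only the CRF of 30 and the 2Mbps cap come from the example above. The second helper computes the approximate size ceiling the cap implies, which shows why file size stops being a useful check: any size at or below that figure is a valid result.

```python
def vp9_capped_crf_args(source, output, crf=30, cap_kbps=2000):
    """Build a single-pass capped CRF command for libvpx-vp9."""
    return [
        "ffmpeg", "-y", "-i", source,
        "-c:v", "libvpx-vp9",
        "-crf", str(crf),            # quality target
        "-b:v", f"{cap_kbps}k",      # nonzero bitrate acts as the ceiling
        "-c:a", "libopus",           # assumption: Opus audio in WebM
        output,
    ]


def size_ceiling_bytes(duration_s, cap_kbps):
    """Approximate largest video payload if the cap were hit throughout."""
    return int(duration_s * cap_kbps * 1000 / 8)


print(" ".join(vp9_capped_crf_args("new_720p.mp4", "new_720p.webm")))
print(size_ceiling_bytes(96, 2000))   # 24000000 bytes for the 96-second clip
```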
Finally, for those encoding their own VP9 files, Madan was kind enough to share her top four VP9 encoding takeaways:
- Use two-pass encoding; one pass does not perform well.
- With two-pass encoding, generate the first pass log for the largest resolution and then reuse it for the other resolutions. VP9 handles this gracefully.
- While VP9 allows much larger CRF values, we noticed that CRF < 33 speeds up the encoding process considerably without significant losses in file size savings.
- You must use the “tile-columns” parameter in the second pass. This provides multi-threaded encoding and decoding at minor costs to quality.
I’ll add that you shouldn’t assume that Google’s recommendations are the best practices for your needs. The bottom line is that you may have to invest substantial time to create the optimal mix of encoding time and output quality.
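Madan’s takeaways can be combined into a single encoding plan. The sketch below is one possible reading, not her script: it runs the first pass once at the largest resolution, then points every second pass at the same log through `-passlogfile`, and adds `-tile-columns` only in the second pass. It assumes, per the second tip, that libvpx accepts a first-pass log made at a different resolution. The ladder rungs, the `-tile-columns 2` and `-threads 8` values, the log name, and the function name are all illustrative assumptions to tune for your own content.

```python
def vp9_ladder_commands(source, rungs, crf=30, passlog="vp9_pass1"):
    """Build one first pass plus one second pass per rung.

    rungs: list of (height, cap_kbps, output) tuples, largest height first.
    """
    def base(height, cap_kbps):
        return [
            "ffmpeg", "-y", "-i", source,
            "-c:v", "libvpx-vp9",
            "-vf", f"scale=-2:{height}",      # keep aspect ratio, even width
            "-crf", str(crf),                 # below 33, per the third tip
            "-b:v", f"{cap_kbps}k",
            "-passlogfile", passlog,          # shared first-pass log
        ]

    top_height, top_cap, _ = rungs[0]
    # First pass: statistics only, largest resolution, output discarded.
    commands = [base(top_height, top_cap)
                + ["-pass", "1", "-an", "-f", "null", "/dev/null"]]
    for height, cap_kbps, output in rungs:
        commands.append(base(height, cap_kbps) + [
            "-pass", "2",
            "-tile-columns", "2",             # log2 value, so 4 tile columns
            "-threads", "8",
            "-c:a", "libopus",
            output,
        ])
    return commands


ladder = [(1080, 4000, "out_1080.webm"), (720, 2000, "out_720.webm")]
for args in vp9_ladder_commands("mezzanine.mp4", ladder):
    print(" ".join(args))
```

Check the output of the lower rungs against a conventional per-resolution two-pass encode before relying on the shared log in production.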
You’ll Have a Growing List of Encoding Options
As of NAB 2016, you should have more options when choosing an encoder, both on-premise and in the cloud. On the enterprise-encoding front, Telestream announced that it would incorporate VP9 encoding into the Vantage Transcode Multiscreen program by mid-2016. Exploring what motivated Telestream to integrate VP9 into Vantage 3 years after the codec became available, I asked Paul Turner, VP of enterprise product management, “Why VP9, and why now?” He says, “New codecs show up all the time; as an encoding supplier, we look at all of them but don’t instantly add them to our products. At this point, we believe VP9 has legs and is a viable alternative for customers who have been encoding in H.264 and other UHD codecs.”