Review: Subjectify.us, a Site for Subjective Video Comparisons

If Chrome supports playback of the uploaded content, as it does for VP9 and H.264, Subjectify.us downloads the actual test files to the viewer’s computer before the tests start. With unsupported content like HEVC, the service converts these files to H.264 using a constant rate factor (CRF) of 16, which should deliver sufficient quality to avoid prejudicing the results.
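To give a sense of what a CRF 16 conversion looks like, here is a minimal Python sketch that assembles an FFmpeg command line for it. This is an assumption about how such a conversion could be done, not a description of what the service actually runs; the function name and file names are hypothetical.

```python
from typing import List


def build_h264_crf_command(source: str, target: str, crf: int = 16) -> List[str]:
    """Build an FFmpeg argument list that re-encodes `source` to H.264.

    Lower CRF values mean higher quality and larger files; x264's 8-bit
    range runs from 0 to 51, so 16 sits well toward the high-quality end.
    """
    if not 0 <= crf <= 51:
        raise ValueError("CRF must be between 0 and 51 for 8-bit x264")
    return [
        "ffmpeg",
        "-i", source,        # input file (for example, an HEVC encode)
        "-c:v", "libx264",   # encode video with the x264 H.264 encoder
        "-crf", str(crf),    # constant rate factor quality target
        target,
    ]


# Hypothetical file names, for illustration only.
command = build_h264_crf_command("clip_hevc.mp4", "clip_h264.mp4")
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine with FFmpeg installed.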

Participants download the files, view the instructions, and start watching the videos. The videos are presented without playback controls, so there’s no way to pause a video to inspect it. There’s also no timer or countdown clock. After each video pair, participants are presented with the screen in Figure 4.

Figure 4. Here’s where the participants make their selections.

Depending on the test, you should start to see results within a few hours, and most tests should be finished within three days.

The Results, Please

In the project selection screen, you can see how many participants rated your videos, how many failed, and how many selections they made. In the CBR vs. VBR test, there were 149 total participants, 3 of whom failed, with 1,752 rated comparisons.

The presentation of results varies according to the test parameters. In the CBR vs. VBR comparison, which was our simplest test, we could sort the results by test clip or overall, display the results using the Bradley-Terry or Crowd Bradley-Terry models, and display a confidence interval from 85 percent to 99 percent relative to either parameter or ground truth.

The concept of “ground truth” relates to studies that compare the various alternatives to a super-high-quality option, which we didn’t use. The Bradley-Terry model is a probability model that attempts to predict the outcome of a comparison. In this particular test case, the result is clear (Figure 5); the results predict that most viewers will prefer the 200 percent constrained VBR files (the top line extending beyond 5) over CBR, but the difference isn’t overwhelming.
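For readers unfamiliar with the model, Bradley-Terry assigns each alternative a positive strength and predicts that alternative i is preferred over alternative j with probability p_i / (p_i + p_j). The Python sketch below fits those strengths from a list of (winner, loser) selections using the standard iterative update. It is an illustration only, not the service's implementation: the function names and vote counts are hypothetical, and the strengths it returns are on an arbitrary scale that won't match the scores shown in the service's charts.

```python
from typing import Dict, List, Tuple


def fit_bradley_terry(
    comparisons: List[Tuple[str, str]], iterations: int = 200
) -> Dict[str, float]:
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Repeats the update p_i <- wins_i / sum_j(n_ij / (p_i + p_j)), where
    n_ij is how often i and j were compared, then rescales the strengths
    so that they average 1.
    """
    items = sorted({name for pair in comparisons for name in pair})
    if not items:
        return {}

    wins = {item: 0 for item in items}
    pair_counts: Dict[Tuple[str, str], int] = {}
    for winner, loser in comparisons:
        wins[winner] += 1
        key = (winner, loser) if winner < loser else (loser, winner)
        pair_counts[key] = pair_counts.get(key, 0) + 1

    strengths = {item: 1.0 for item in items}
    for _ in range(iterations):
        updated = {}
        for item in items:
            denominator = 0.0
            for (a, b), count in pair_counts.items():
                if item == a or item == b:
                    other = b if item == a else a
                    denominator += count / (strengths[item] + strengths[other])
            updated[item] = wins[item] / denominator if denominator else strengths[item]
        total = sum(updated.values())
        strengths = {k: v * len(items) / total for k, v in updated.items()}
    return strengths


def win_probability(strengths: Dict[str, float], a: str, b: str) -> float:
    """Predicted probability that a viewer prefers `a` over `b`."""
    return strengths[a] / (strengths[a] + strengths[b])


# Hypothetical votes: 7 of 10 viewers prefer the VBR encode.
votes = [("VBR", "CBR")] * 7 + [("CBR", "VBR")] * 3
model = fit_bradley_terry(votes)
```

With only two alternatives, the fitted preference probability simply reproduces the observed vote share (0.7 here); the model earns its keep when many alternatives are compared in overlapping pairs.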

Figure 5. Overall, 200 percent constrained VBR (the top line) won out over CBR.

Digging into the details, the overall comparison is greatly impacted by the results from Big Buck Bunny, which scored 200 percent constrained VBR at about 6.2 and CBR at 4.70. The results for the real-world video files were neck and neck, with the participants rating CBR higher quality than VBR for Zoolander. Note that you can download all reports as JPEG or PNG images, SVG vector images, and PDF files.

Codec Comparison

We had to run our codec comparisons twice because during our initial run, we noticed that the poor quality of the initial frames of the HEVC-encoded videos might be influencing the score. We reprocessed all the files to eliminate these frames and ultimately collected data from 923 participants who rated 11,076 video pairs.
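The participation figures from both tests can be cross-checked with a little arithmetic. The Python sketch below assumes that the three failed participants in the CBR vs. VBR test contributed no rated comparisons and that all 923 codec-test participants passed; neither is stated outright, so treat the result as a sanity check rather than a documented property of the service. The function name is hypothetical.

```python
def comparisons_per_participant(rated_pairs: int, participants: int) -> float:
    """Average number of rated video pairs per participant."""
    return rated_pairs / participants


# CBR vs. VBR test: 149 participants, 3 failed, 1,752 rated comparisons.
cbr_vbr_average = comparisons_per_participant(1752, 149 - 3)

# Codec comparison: 923 participants, 11,076 rated video pairs.
codec_average = comparisons_per_participant(11076, 923)

print(cbr_vbr_average, codec_average)  # prints "12.0 12.0"
```

Under those assumptions, both tests work out to exactly 12 rated pairs per participant.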

The results in Figure 6 are grouped by data rate to show what looks like the standard rate-distortion curve for both test videos. The results show VP9 slightly ahead of HEVC through 4Mbps, a reversal at 5Mbps, and then near parity at 6Mbps, where all three technologies were tightly grouped, with H.264 enjoying a slight advantage.

Figure 6. VP9 edged HEVC at most data rates in this comparison.

In addition to presenting overall results, you can also present results for each clip. As you can see on the right in Figure 6, beyond the graphic export options available for all tests, when the data set includes variables like data rate, you can also export the results as CSV or XLS files, view the data table, or open the chart in Highcharts, a cloud service for advanced charts and graphs.

The Value of Subjective Evaluations

We all have preconceived notions regarding video quality that dictate many of our encoding decisions, some grounded in objective metrics, some in our own subjective evaluations. For years, I’ve argued that the quality difference between CBR and VBR was much less than most compressionists believe, and the Subjectify.us results bear this out. As mentioned above, for the three real-world video files, the quality difference was minimal; only the animated file showed a significant advantage for VBR. Given that CBR-encoded files can be delivered more reliably under limited-bandwidth conditions, this was a useful data point.

Conversely, my VMAF-fueled notions regarding switching points were proven wrong by the results. Intuitively, at the same data rate, a 1080p video will deliver more detail but potentially more compression artifacts, while a 720p video will deliver less detail and fewer artifacts. My credo has been that higher resolutions almost always win at the relevant switch points.

While likely true for animated files (which we didn’t test), this was soundly refuted by the results for American football and a concert video. Armed with these results, I can experiment with different metrics to identify which does the best job predicting subjective results. In particular, SSIM seems to favor low-resolution videos over high-resolution videos; it will be interesting to replot the switch points with SSIM and see if the results match the predictions.

As I said at the top, while objective metrics are always useful, subjective evaluations are the gold standard. Subjectify.us provides an affordable and useful way to access them.

[This article appears in the October 2018 issue of Streaming Media magazine as "Review:"]
