Review: Subjectify.us, a Site for Subjective Video Comparisons
While objective quality metrics like peak signal-to-noise ratio (PSNR), Video Multimethod Assessment Fusion (VMAF), and SSIMPLUS provide helpful data for encoding professionals, the gold standard for video quality will always be subjective comparisons made by human eyes. The problem has been that subjective comparisons are time-consuming and expensive to produce.
To help simplify subjective comparisons of still images and video, Moscow State University (MSU) has launched a website/comparison service at www.subjectify.us. By way of background, MSU is the highly respected publisher of H.264 and HEVC comparisons and the developer of the Video Quality Measurement Tool (VQMT).
To use Subjectify.us, you create your files in your lab or studio, convert them to data sets, and upload them to the website along with configuration information about how to perform the comparisons (more on this later). The service recruits participants who watch and rank your videos, collects the data, and provides you with detailed and configurable plots for analysis and presentation.
Note that the platform can be used for both still image and video comparisons, and that valid uses extend beyond the analysis of codecs and compression techniques. For example, MSU reports that Subjectify.us has been used to compare video matting (i.e., foreground object extraction) techniques, background reconstruction methods, color correction methods, saliency-aware video compression, image inpainting methods, and image scaling methods.
The service costs $1 per study participant for academic/noncommercial usage and $2 per study participant for commercial usage, with discounts available for high-volume users. That cost includes comparison of up to 10 video pairs per participant.
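As a rough budgeting aid, the pricing above can be turned into a few lines of Python. This is only a sketch: the function names are hypothetical, the prices are the per-participant figures quoted in this review, and high-volume discounts are not modeled.

```python
# Per-participant prices quoted in the review, in US dollars.
# Volume discounts are mentioned in the review but not modeled here.
PRICE_PER_PARTICIPANT = {"academic": 1, "commercial": 2}

# Each participant's fee covers up to this many video pairs.
MAX_PAIRS_PER_PARTICIPANT = 10


def study_cost(participants: int, usage: str = "academic") -> int:
    """Return the undiscounted cost of a study in dollars."""
    return participants * PRICE_PER_PARTICIPANT[usage]


def max_pair_views(participants: int) -> int:
    """Return the most pair comparisons the given participants can cover."""
    return participants * MAX_PAIRS_PER_PARTICIPANT


if __name__ == "__main__":
    print(study_cost(100, "commercial"))
    print(max_pair_views(100))
```

At the quoted prices, a commercial study therefore costs at least 20 cents per pair comparison, and an academic one at least 10 cents.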
The service launched in beta form in January 2018 and is now open for business. To try Subjectify.us, we ran three sets of tests. The first compared H.264, HEVC, and VP9 using two sets of files at five different data rates. The second compared four sets of files encoded using 200 percent constrained VBR and CBR at the same data rate.
The third set of tests was more complicated. Here we tried to identify the “switch points” in an encoding ladder, or at which data rates the ladder should switch from one resolution to another. For example, in Figure 1, which computes switch points using VMAF, the cells with the green background have the highest score at each data rate. This shows that you should switch from 1080p to 720p at 1400Kbps, and from 720p to 540p at 600Kbps. To test if VMAF accurately predicted these switching points, we bracketed each switch point as suggested by VMAF—for example, testing both 1080p and 720p at 1400Kbps, 1600Kbps, and 1800Kbps to identify the highest quality file.
Our third test case involved finding the switch points between different video resolutions in an encoding ladder.
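The switch-point logic can be sketched in a few lines of Python: pick the highest-scoring resolution at each data rate, then report each rate at which the winner changes as you move down the ladder. The scores below are made-up placeholders chosen only to illustrate the mechanics; they are not the VMAF values from Figure 1, and the function names are hypothetical.

```python
def best_resolution_per_rate(scores):
    """Map each data rate (Kbps) to the resolution with the highest score.

    scores: {rate_kbps: {resolution: score}}
    """
    return {rate: max(by_res, key=by_res.get) for rate, by_res in scores.items()}


def switch_points(scores):
    """Walk the ladder from the highest rate down and list each change of winner.

    Returns a list of (rate_kbps, from_resolution, to_resolution) tuples.
    """
    best = best_resolution_per_rate(scores)
    points = []
    previous = None
    for rate in sorted(best, reverse=True):
        current = best[rate]
        if previous is not None and current != previous:
            points.append((rate, previous, current))
        previous = current
    return points


# Placeholder scores for illustration only (NOT the data behind Figure 1).
PLACEHOLDER_SCORES = {
    1800: {"1080p": 90.0, "720p": 88.0, "540p": 80.0},
    1400: {"1080p": 84.0, "720p": 85.0, "540p": 79.0},
    1000: {"1080p": 76.0, "720p": 80.0, "540p": 77.0},
    600: {"1080p": 60.0, "720p": 68.0, "540p": 70.0},
}

if __name__ == "__main__":
    for rate, old, new in switch_points(PLACEHOLDER_SCORES):
        print(f"switch from {old} to {new} at {rate} Kbps")
```

The same functions work with subjective scores in place of VMAF values, which is the comparison the third test set is meant to enable.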
The initial project setup is straightforward. When you create a project, you name it and specify whether you’ll be comparing images or videos. If video, the service recommends keeping the files short, and our test videos were between 15 and 20 seconds long.
Next, you drag your test files into a project data window (Figure 2), which is where things get complicated. Folder structure and naming convention are critical here to achieve the target results with some projects. For example, in our simplest test (CBR vs. VBR), we could name the file filename_technique, or BBB_CBR for Big Buck Bunny encoded using CBR. This would allow us to present the data for all files, or for each file separately.
Dragging files into Subjectify for uploading
Setup for the other tests was more complex. For the codec comparisons, we needed to differentiate data rates so we could present the charts in the standard rate-control style. To accomplish this, we uploaded the files in separate folders for each test file and used a label like H264@datarate=5MB.mp4 to designate both the codec and the data rate.
The ladder test was the most complex. Here we needed to present the results for each switch point by data rate and resolution. So we created folders for each switch point for each file and used this naming convention: 540p@datarate=600.mp4. This allowed us to present the data as shown in Figure 3, which reveals that according to our viewers, 720p looked better than 1080p at all tested data rates for this clip.
We needed precise folder- and file-naming conventions to plot results like these. In our 1080p > 720p switching test, participants preferred 720p at all data rates.
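Because so much depends on consistent naming, it can help to check file names locally before uploading. The sketch below parses names of the label@key=value.ext form used in our tests. It is not Subjectify.us's own parser, whose exact rules we did not see; the pattern, class, and function names are assumptions made for illustration.

```python
import re
from typing import NamedTuple

# Assumed shape of a test file name: <label>@<key>=<value>.<extension>
# This mirrors the names used in the review; it is not an official grammar.
NAME_PATTERN = re.compile(
    r"^(?P<label>[^@]+)@(?P<key>\w+)=(?P<value>[^.]+)\.(?P<ext>\w+)$"
)


class ParsedName(NamedTuple):
    label: str  # codec or resolution, e.g. "H264" or "540p"
    key: str    # plotted parameter, e.g. "datarate"
    value: str  # parameter value, kept as text, e.g. "5MB" or "600"


def parse_name(filename: str) -> ParsedName:
    """Split a test file name into its label, key, and value."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"name does not follow label@key=value.ext: {filename!r}")
    return ParsedName(match["label"], match["key"], match["value"])


if __name__ == "__main__":
    for name in ("H264@datarate=5MB.mp4", "540p@datarate=600.mp4"):
        print(parse_name(name))
```

Running every file name in a folder through a check like this before uploading catches stray spaces or missing values that would otherwise surface only when the charts come back wrong.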
Next, you specify project settings, which include the task description sent to the people whom Subjectify.us recruits to perform your tests. Our description stated, “From each pair of videos choose the video with the best visual quality.” You also get to choose whether the participants compare all videos in a folder against each other, or compare all against a reference video. We chose the former.
Here again, folder structure controls a lot. For example, if we only wanted to compare the codecs at each specific data rate, we would have uploaded the files in separate folders for each data rate. If we wanted to compare all codecs at all data rates, as we did, we uploaded all five data rates for each codec in a separate folder for each test file.
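The folder layout also determines how many distinct pairs a study contains. The sketch below counts them for the two layouts just described, assuming every video in a folder can be paired with every other video in that folder; whether the service actually shows all of those pairs is an assumption. The rate labels are placeholders, not the data rates we tested.

```python
from itertools import combinations

CODECS = ["H264", "HEVC", "VP9"]
RATES = ["r1", "r2", "r3", "r4", "r5"]  # placeholder labels for five data rates


def pairs_in_folder(files):
    """Return every unordered pair of files within a single folder."""
    return list(combinations(files, 2))


# Layout A: one folder per data rate, so codecs meet only at the same rate.
folders_by_rate = {
    rate: [f"{codec}@datarate={rate}.mp4" for codec in CODECS] for rate in RATES
}
pairs_by_rate = sum(len(pairs_in_folder(files)) for files in folders_by_rate.values())

# Layout B: all codecs and all rates for a test file in a single folder.
single_folder = [f"{codec}@datarate={rate}.mp4" for codec in CODECS for rate in RATES]
pairs_single_folder = len(pairs_in_folder(single_folder))

if __name__ == "__main__":
    print(pairs_by_rate, pairs_single_folder)
```

Under these assumptions, the all-in-one-folder layout yields seven times as many possible pairs per test file as the per-rate layout (105 versus 15), which matters when each participant's fee covers at most 10 pairs.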
Other configuration options include whether testers see the videos side-by-side or in sequence, how many videos a participant can compare in a session, and the number of verification questions the participant sees each session. To explain, the verification question ensures that each participant is actually carefully comparing the test cases and not just randomly responding. It’s a slam-dunk test case where the participant should always choose one video over another. Our verification question compared H.264-encoded files at 2Mbps and 6Mbps; if any participants designated the 2Mbps file as the higher-quality option, their responses were disregarded and they could no longer participate in Subjectify.us testing.
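The effect of the verification question can be sketched as a simple filter: any participant who gives the wrong answer on a verification pair has all of his or her responses dropped. The data structure and names below are hypothetical, since the review does not describe how Subjectify.us stores responses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Response:
    participant: str
    chosen: str                      # file the participant preferred
    is_verification: bool = False    # True for the slam-dunk check pair
    expected: Optional[str] = None   # correct choice for a verification pair


def failed_participants(responses):
    """Return the participants who answered any verification pair incorrectly."""
    return {
        r.participant
        for r in responses
        if r.is_verification and r.chosen != r.expected
    }


def valid_responses(responses):
    """Keep only real comparisons from participants who passed verification."""
    failed = failed_participants(responses)
    return [
        r for r in responses
        if not r.is_verification and r.participant not in failed
    ]


if __name__ == "__main__":
    sample = [
        Response("p1", "H264_6Mbps", is_verification=True, expected="H264_6Mbps"),
        Response("p1", "HEVC"),
        Response("p2", "H264_2Mbps", is_verification=True, expected="H264_6Mbps"),
        Response("p2", "VP9"),
    ]
    print(valid_responses(sample))
```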
The Participant Interface
Once you finish your configuration, Subjectify.us recruits the participants, which at this point can only be desktop viewers; they cannot use mobile or other connected devices. To ensure a consistent experience, Subjectify asks participants to use Chromium-based web browsers in full-screen mode, and won’t display the video if Chrome is not in full screen. I tried running the tests in Firefox, and it didn’t work, so Subjectify does appear to enforce both requirements.