Review: iSize BitSave Video Preprocessing
iSize BitSave is a video pre-processing technology designed to improve the quality of the video that you produce. I'll start with a pithy summary, and then circle back for the deep dive.
iSize's website claims that it delivers a 40% to 60% bitrate reduction, so I tested BitSave at 60% of the data rate used for the other approaches. Since it's simple to "hack" VMAF into delivering higher scores that may not correlate with subjective improvements, I also created my own FFmpeg-based hacking algorithm to compare with BitSave.
I produced three sets of files from 17 20-second test clips, including movies, animation, sports, and gaming. The first set was the source encoded with no processing (Baseline), the second was processed using my FFmpeg approach (FFmpeg Filters), and the third was processed via BitSave (BitSave). I encoded all files using the same FFmpeg parameters and measured VMAF, SSIM, and PSNR. I did not tune for PSNR or SSIM because I performed all tests with the same codec and encoder.
Figure 1 shows the resulting metric scores, with PSNR averages multiplied by 2.5 and SSIM averages multiplied by 100 to display them in the same graph as VMAF. To be clear, the Baseline and FFmpeg Filters results are at 100% of the target data rate, while BitSave is at 60% of that target.
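The review doesn't say which tool produced its SSIM and PSNR averages. As one hedged option, FFmpeg's own ssim and psnr filters print a summary line when an encode is compared against its source; the sketch below assumes those summary-line formats and extracts the overall values. The function names are hypothetical.

```shell
# Assumed measurement commands (not necessarily the review's method):
#   ffmpeg -i encoded.mp4 -i source.mp4 -lavfi "[0:v][1:v]ssim" -f null - 2> ssim.log
#   ffmpeg -i encoded.mp4 -i source.mp4 -lavfi "[0:v][1:v]psnr" -f null - 2> psnr.log
# Then: ssim_all < ssim.log   and   psnr_avg < psnr.log

# Print the overall "All:" SSIM value from the ssim filter's summary line.
ssim_all() {
  sed -n 's/.*SSIM .* All:\([0-9.][0-9.]*\).*/\1/p'
}

# Print the "average:" PSNR value (dB) from the psnr filter's summary line.
# An "inf" average (identical inputs) does not match and produces no output.
psnr_avg() {
  sed -n 's/.*PSNR .* average:\([0-9.][0-9.]*\).*/\1/p'
}
```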
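The Figure 1 scaling is plain multiplication: a PSNR average near 38 dB times 2.5 and an SSIM average near 0.95 times 100 both land around 95, on VMAF's 0-100 scale. A hypothetical one-liner that applies it:

```shell
# Hypothetical helper: scale PSNR (dB) by 2.5 and SSIM (0-1) by 100 for charting.
scale_for_chart() {
  awk -v psnr="$1" -v ssim="$2" 'BEGIN { printf "%.2f %.2f\n", psnr * 2.5, ssim * 100 }'
}

# scale_for_chart 38 0.95   prints: 95.00 95.00
```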
Figure 1. BitSave compared to my own FFmpeg filters and the Baseline results.
As you can see, the FFmpeg filters (at 100% of the data rate) delivered the best VMAF score by a smidge but the worst PSNR and SSIM scores. The unprocessed Baseline files delivered the best overall PSNR and SSIM scores, with BitSave second and the FFmpeg approach third. iSize (and, indeed, every video preprocessing company) attributes this drop in SSIM and PSNR scores to "the fact that we are altering the input to the encoder, so that our video signal is tuned towards human perception." Put differently, we've known for years that PSNR and SSIM scores don't correlate well with subjective scores, so you shouldn't expect a preprocessing algorithm that improves perceived quality to improve these scores.
Figure 2 shows the same analysis using 11 5-second test clips, with BitSave encoded at the same data rate as the other two approaches. Here, BitSave won the VMAF comparison and posted higher SSIM and PSNR scores than the FFmpeg filtering approach.
Figure 2. BitSave compared to my own FFmpeg filters and the Baseline results at 100%.
I share subjective observations below. My conclusions are as follows:
- VMAF is, indeed, very easy to hack, and you should take VMAF scores with a grain of salt when comparing pre-processing approaches.
- BitSave is a legitimate processing technology and not a hack, though it clearly benefits from VMAF's "hackability" in that it improves the contrast of most videos.
- The only way to legitimately compare pre-processing algorithms is with subjective testing.
- Applying the FFmpeg filters I formulated boosts VMAF scores and improves the contrast of most videos, but probably shouldn't be considered for universal deployment, particularly with premium content, because it does darken some videos. If you're working with low-quality input videos, though, it's worth a look.
Now let's look at how I tested.
The key performance indicator claimed on the company's website is "When iSize Technology is used with MPEG H.264/AVC, it allows for 40-60% reduction in bitrate* over a range of encoders (AVC/H.264, HEVC, VP9, AV1, VVC), versus when using the same baseline encoder and encoding recipe – all with the power of our proprietary machine learning IP."
I searched the website for the footnote behind the asterisk and found no caveats. After I completed my testing, I shared my results with iSize, who responded, "just to clarify, we state the 40%-60% saving over a range of encoding standards (AVC/H.264, HEVC, AV1, VVC) and over a range of encoding recipes, versus when using the *same encoding recipe without any additional perceptual preprocessing* for the codec (like the choice of unsharp/contrast for your 'FFmpeg Filter' tests)." A week later, iSize still hadn't clarified this on its website, so I'm assuming that others reading the claim there would reach the same conclusion I did, and I'm reporting what I found.
How BitSave Works
iSize's website explains how BitSave works. From a workflow perspective, I accessed the technology by uploading test files to the company's website and then downloading a high-bitrate HEVC file to use in place of the original source. This is cumbersome for high-volume use, of course, so iSize offers a number of licensing and integration options.
I started testing by encoding both the original source and the BitSave pre-processed files with the same encoding parameters. This showed significant improvement in VMAF scores, though PSNR scores were generally down and SSIM scores about even. When I studied the videos directly, I noticed that some frames encoded from the BitSave source exhibited improved contrast with a touch less haze than video encoded from the original source, as if they had been through a separate round of color grading.
This raised several questions, such as whether I could improve VMAF scores simply by boosting the contrast of encoded video files, and, more broadly, whether VMAF could be hacked. A quick Google search for "Hacking VMAF" revealed a white paper called "Hacking VMAF with Video Color and Contrast Distortion," co-authored by several researchers who work with Moscow State University, the developer of the Video Quality Measurement Tool and the producer of multiple codec comparisons.
The paper states, "In this paper, we describe video color and contrast transformations which increase the VMAF score while keeping the SSIM score the same or better. The possibility to improve a full-reference metric score after adding any transformations to the distorted image means that the metric can be cheated in some cases." In the white paper, the researchers tested how different values of unsharp mask and histogram equalization impacted VMAF and SSIM scores. As the title suggests, they concluded that VMAF could be "hacked."
I took this as a challenge and started experimenting with FFmpeg to learn whether I could boost VMAF scores, and perhaps even the actual quality of the video. Specifically, I applied FFmpeg's contrast adjustment and unsharp mask filter. The only two numbers I adjusted were contrast and the third unsharp mask parameter (luma amount), both of which default to 1.0.
Table 1 shows the contrast values I tested and the resulting metric scores. Bumping contrast improved VMAF scores by as much as 14 points, though SSIM suffered as a consequence. Clearly, VMAF can be hacked.
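The final filter values aren't listed in this section, so here is a hedged sketch of how the two adjusted numbers could be wired into an FFmpeg filter chain. It assumes contrast comes from FFmpeg's eq filter and that the unsharp matrix stays at its 5x5 default; the helper name and the example values are hypothetical placeholders, not the settings used in the tests. Note that unsharp's default luma amount of 1.0 already sharpens; a value of 0 disables the effect.

```shell
# Hypothetical helper: build a -vf chain from the two values the review adjusted.
# Assumption: contrast via the eq filter; unsharp keeps its 5x5 luma matrix and
# only luma_amount (the third unsharp parameter) changes.
contrast_unsharp_vf() {
  local contrast="${1:-1.0}"     # eq contrast, FFmpeg default 1.0
  local luma_amount="${2:-1.0}"  # unsharp luma_amount, FFmpeg default 1.0
  printf 'eq=contrast=%s,unsharp=5:5:%s\n' "$contrast" "$luma_amount"
}

# Example with placeholder values:
#   ffmpeg -i input.mp4 -vf "$(contrast_unsharp_vf 1.1 1.0)" -c:v libx264 -b:v 6000k output.mp4
```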
Table 1. Hacking VMAF with contrast adjustments.
At the extreme values, the VMAF scores were great, but the darkening effect on the video was more than any producer would actually accept, and went far beyond what I saw in the BitSave videos. Since my goal was a set of values that any producer could confidently use at any time, I backed off the contrast values and settled on the more moderate settings used for the tests below.
As any FFmpeg aficionado (or compressionist, for that matter) knows, these types of optimization experiments are exceptionally time consuming, and I can't claim that these settings (or even these filters) are the optimal approach. But, as you'll see in the results section below, they did boost VMAF scores and, in most cases, appeared to boost actual video quality as well.
To test the BitSave technology, I used 17 clips, each about 20 seconds long: four animations, four games, four sporting clips, and a mix of movies and other content for the rest. I processed these through the BitSave website and encoded them using a simple two-pass FFmpeg command string.
# Example two-pass x264 encode (bash syntax; on Windows, use NUL instead of /dev/null and ^ instead of \)
ffmpeg -y -i input.mp4 -c:v libx264 -b:v 6000k -maxrate 12000k -bufsize 12000k -g 50 -keyint_min 50 -sc_threshold 0 -an -pass 1 -f null /dev/null && \
ffmpeg -y -i input.mp4 -c:v libx264 -b:v 6000k -maxrate 12000k -bufsize 12000k -g 50 -keyint_min 50 -sc_threshold 0 -an -pass 2 input_x264_baseline.mp4
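Because the data rate and keyframe interval changed from clip to clip, it can help to generate the two passes rather than edit them by hand. This hypothetical dry-run helper prints both commands; it assumes the 2x relationship between the target bitrate and maxrate/bufsize seen in the example (6000k vs. 12000k) holds for every clip, which the review doesn't state.

```shell
# Hypothetical dry-run helper: print the two x264 passes for one clip.
# Assumes maxrate and bufsize are 2x the target, as in the 6000k/12000k example.
# Arguments: input, output, target kbps, GOP length in frames, optional -vf chain.
two_pass_cmds() {
  local in="$1" out="$2" kbps="$3" gop="$4" vf="${5:-}"
  local filt=""
  if [ -n "$vf" ]; then
    filt="-vf $vf "
  fi
  local peak=$((kbps * 2))
  local common="-c:v libx264 ${filt}-b:v ${kbps}k -maxrate ${peak}k -bufsize ${peak}k -g $gop -keyint_min $gop -sc_threshold 0 -an"
  echo "ffmpeg -y -i $in $common -pass 1 -f null /dev/null"
  echo "ffmpeg -y -i $in $common -pass 2 $out"
}
```

Printing first makes it easy to check each clip's numbers; piping the output to `sh` would then run the encodes, which works for comma-separated filter chains that contain no spaces.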
I produced three files for each test:
- The _baseline file: the script applied to the original source file.
- The _BS file: the script applied to the BitSave-processed intermediate file.
- The _both file: the script plus the contrast and unsharp filters described above, applied to the original source file. This file corresponds to the FFmpeg Filters output.
I changed the keyframe interval to match the frame rate of each file (a one-second GOP), with frame rates ranging from 24 to 60 fps. I customized the data rate for each clip by running a CRF 27 encode on each source file to identify a data rate that would produce a VMAF score of about 93-95, then encoded the _baseline and _both files at that rate. I encoded the BitSave files at a 40% lower bitrate, which is a very demanding test, though a reasonable way to test iSize's marketing claims.
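The per-clip arithmetic above is simple but easy to get wrong across 17 clips. These hypothetical helpers compute a one-second GOP from a clip's frame rate and the BitSave data rate at 60% of the baseline; the ffprobe line in the comments is one assumed way (not necessarily the review's method) to read the bitrate that the CRF 27 probe encode produced.

```shell
# Assumed probe step: encode at CRF 27, then read the resulting bitrate (bits/s).
#   ffmpeg -y -i input.mp4 -c:v libx264 -crf 27 -an probe_crf27.mp4
#   ffprobe -v error -show_entries format=bit_rate \
#     -of default=noprint_wrappers=1:nokey=1 probe_crf27.mp4

# One-second GOP: round the frame rate to the nearest whole frame (29.97 -> 30).
gop_for_fps() {
  awk -v fps="$1" 'BEGIN { printf "%d\n", fps + 0.5 }'
}

# BitSave target: 60% of the baseline kbps (a 40% reduction), integer result.
bitsave_kbps() {
  echo $(( $1 * 60 / 100 ))
}
```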
Note that I didn't tune for PSNR or SSIM in my encodes because I was using the same codec for all encodes, so I figured the effect on the metric scores would be even. I also had a strong feeling that the visual comparisons would be the most important and tuning might have thrown off the subjective results.
Table 2 shows the benchmark results with separate columns for VMAF, PSNR, and SSIM (total average scores from Table 2 are the basis for Figure 1 above). The orange background indicates the lowest score of the three, while the green indicates the highest score. With VMAF, the FFmpeg Filters approach was the clear winner. With PSNR and SSIM, the Baseline files (original source, no filters) performed the best, with BitSave the winner over FFmpeg Filters in PSNR and the loser by one in SSIM tests. Again, this is BitSave at 60% the bitrate of the other two files.
Table 2. Comparing the Baseline and FFmpeg files to BitSave at 60% the bitrate of the other two files.
To put this in perspective, Table 3 shows preliminary tests that I ran on a separate set of 5-second files using the same encoding string but at a single data rate, 1.75 Mbps, for all technologies (total average scores from Table 3 are the basis for Figure 2 above). Here, BitSave clearly bested my filter-based FFmpeg approach on all three metrics.
Table 3. Comparing the Baseline and FFmpeg files to BitSave at the same data rate.
Subjective evaluation is always tough, particularly since I had "skin in the game" with my filter-based approach. In my view, BitSave didn't deliver the promised 40% bandwidth savings for H.264 encoding, at least compared to the FFmpeg Filters approach and possibly even to the Baseline videos.
You can download three PDF documents with screen grabs to help you draw your own conclusions. The first is entitled BitSave1_Contrast and Overview and is available here. This presentation includes the clips that showed the greatest contrast differential, some with circled regions to direct your focus. The best approach is to open the PDF in full screen with the lights down low and page back and forth between the images, which show the FFmpeg Filters frame first, then the Baseline, then BitSave, making it simple to compare the Baseline to both approaches. All images except the final Football clip show BitSave at 60% of the data rate of the other two.
Digging into the details of the comparisons with BitSave at 60% revealed some dramatic differences, including the Grand Theft Auto frame in Figure 4. We asked iSize about game-related results, and the company stated, "We are the first to acknowledge that we need to improve our models further for some use cases like gaming, and we are actually planning for a new release more or less in time with your article."
You can download a PDF with other frame grabs from this set of clips here. This presentation shows each clip with a plot of per-frame VMAF scores from the Moscow State University Video Quality Measurement Tool, followed by the source frame, then Baseline, then FFmpeg Filters, then BitSave.
Figure 4. Comparing the FFmpeg filtered approach with BitSave
At the same data rate, BitSave was clearly better than my filter-based approach and you can download a similar PDF here.
The bottom line is that VMAF's hackability complicates evaluating pre-processing technologies. As much as we like to trust the simplicity of an objective metric, and the nice graphs and BD-Rate statistics it can generate, VMAF simply is not an accurate measure of pre-processing technologies.
In addition, my physician wife likes to say that correlation isn't causation. The analogy here is that just because we discovered that VMAF was hackable while reviewing BitSave doesn't mean that BitSave is hacking VMAF. After many hours of testing, I found that BitSave's technology is valid and valuable, though the proof of the pudding will be how it performs in subjective testing with your own test clips.
Finally, contrast and clarity make a huge difference in the subjective appearance of encoded files, and I was surprised by how many of my test files were improved by both pre-processing approaches. Anyone who has adjusted the color, brightness, or contrast of a clip in Adobe Premiere Pro or another editor knows that most clips that haven't been professionally color graded can be substantially improved in just a few moments. If you're in charge of editing, it's definitely worth taking the extra time. If you're the compression person charged with encoding whatever production sends you, pass this advice along, or experiment with minimal adjustments in your encoder (if available). Your viewers will thank you.