Good News: AV1 Encoding Times Drop to Near-Reasonable Levels
That said, in my book, Video Encoding by the Numbers, I created similar curves for x264, x265, and LibVPx using eight clips averaging about two minutes in duration. Before starting serious encoding with a new codec or encoder, particularly AV1, I would run tests on a similar or larger number of samples.
Running Multiple Threads
During the recent project, I asked Google whether there were any other ways to accelerate encoding. One engineer advised:
If you can use multiple threads while running the encoder, it would help with the encoder speed. For HD and above resolutions, we suggest using tiles. Using tiles causes quality loss, and my old testing showed ~0.6% loss while using 2 tiles and ~1.3% loss while using 4 tiles.
I didn't test 4k clips myself, so here I just give some suggestions.
For 1080p, use 2 tiles and 8 threads: "--tile-columns=1 --tile-rows=0 --threads=8"
For 4k, use 4 tiles and 16 threads: "--tile-columns=1 --tile-rows=1 --threads=16" (or even try: 8 tiles/32 threads: "--tile-columns=2 --tile-rows=1 --threads=32")
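The tile counts in these suggestions follow from how libaom reads the tile switches: --tile-columns and --tile-rows (and FFmpeg's -tile-columns and -tile-rows) take base-2 logarithms, so a value of 1 means two columns. This small Python sketch just does that arithmetic for the three suggested settings. tile_layout is a hypothetical helper, not part of any encoder API.

```python
def tile_layout(tile_columns_log2, tile_rows_log2):
    """Return (columns, rows, total tiles) for log2-encoded tile settings.

    Assumes libaom semantics: 0 -> 1, 1 -> 2, 2 -> 4 columns (or rows).
    """
    cols = 2 ** tile_columns_log2
    rows = 2 ** tile_rows_log2
    return cols, rows, cols * rows


# The three configurations suggested by the Google engineer
suggestions = [("1080p", 1, 0, 8), ("4K", 1, 1, 16), ("4K alt", 2, 1, 32)]
for label, c_log2, r_log2, threads in suggestions:
    cols, rows, tiles = tile_layout(c_log2, r_log2)
    print(f"{label}: {cols}x{rows} = {tiles} tiles, {threads} threads")
```

Each suggestion works out to four threads per tile; that's an arithmetic observation about these three settings, not a documented rule.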
Before implementing tiles and threads in that project, I tested a 1080p file and a 4K file, this time on my HP Z840 40-core workstation, which had plenty of threads to spare. I used the suggested settings for 1080p and the alternative 8-tile/32-thread settings (--tile-columns=2 --tile-rows=1 --threads=32) for 4K. Table 5 shows the results. At 1080p, encoding time dropped by 41.66%, while for 4K it dropped by 70.56%, in both cases with negligible quality differences.
Table 5. Deploying multiple threads in other test encodes
Applied to our test clip on the ZBook testbed, the --tile-columns=1 --tile-rows=0 --threads=8 switches dropped encoding time at cpu-used 5 from 20:06 to 12:16, the number shown in Table 2. This was accompanied by a whopping quality drop of 0.01 VMAF points (95.56 to 95.55).
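Percentage reductions like those in Table 5 are easy to compute from raw timings, and it can help to restate them as speedup factors. This sketch assumes the timings are minutes:seconds (so 20:06 is 20 minutes, 6 seconds); the function names are hypothetical.

```python
def to_seconds(timing):
    """Convert 'MM:SS' (or 'H:MM:SS') to seconds; assumes colon-separated fields."""
    total = 0
    for part in timing.split(":"):
        total = total * 60 + int(part)
    return total


def time_savings(before, after):
    """Return (percent reduction in encoding time, speedup factor)."""
    b, a = to_seconds(before), to_seconds(after)
    return (b - a) / b * 100, b / a


def speedup_from_reduction(percent):
    """Convert a percentage time reduction into the equivalent speedup factor."""
    return 100 / (100 - percent)


reduction, speedup = time_savings("20:06", "12:16")
print(f"ZBook, cpu-used 5: {reduction:.2f}% less time, {speedup:.2f}x faster")
# ZBook, cpu-used 5: 38.97% less time, 1.64x faster

for label, pct in [("1080p", 41.66), ("4K", 70.56)]:
    print(f"{label}: {pct}% reduction = {speedup_from_reduction(pct):.2f}x faster")
# 1080p: 41.66% reduction = 1.71x faster
# 4K: 70.56% reduction = 3.40x faster
```

Note how nonlinear the conversion is: a 70% cut in encoding time is more than a 3x speedup.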
Actually, to be perfectly clear, the switches added to our FFmpeg command string were as follows:
-tile-columns 1 -tile-rows 0 -threads 8
The switches shown by the Google engineer were likely for aomenc, the standalone AOM encoder that works independently of FFmpeg. These switches weren't documented in the FFmpeg help file for the AV1 codec in an earlier version of FFmpeg that I checked while researching this article, but tiles, tile-columns, tile-rows, and row-mt are documented in the current version. Give them a shot and see if you get the same result.
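If you're moving settings between aomenc and FFmpeg, the difference for these three switches is purely syntactic: --name=value versus -name value. This hypothetical helper performs that rewrite and nothing more. It validates nothing, and not every aomenc option has a same-named FFmpeg option, so confirm any translated switch against the output of ffmpeg -h encoder=libaom-av1.

```python
import shlex


def aomenc_to_ffmpeg(aomenc_switches):
    """Rewrite aomenc-style '--name=value' switches as FFmpeg-style
    ['-name', 'value'] arguments (hypothetical helper; syntax only)."""
    args = []
    for token in shlex.split(aomenc_switches):
        if not token.startswith("--") or "=" not in token:
            raise ValueError(f"expected --name=value, got {token!r}")
        name, value = token[2:].split("=", 1)
        args += [f"-{name}", value]
    return args


print(" ".join(aomenc_to_ffmpeg("--tile-columns=1 --tile-rows=0 --threads=8")))
# -tile-columns 1 -tile-rows 0 -threads 8
```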
Not to get all wonky on you, but while these settings should increase the speed of any individual encoding run, they may not increase encoding throughput on a given system. That’s because they don’t appear to increase encoding efficiency per se; rather, they seem to let each individual encode consume more CPU resources, and those resources are a zero-sum quantity on any given system.
Though the numbers don’t map perfectly, in essence, rather than running two simultaneous encodes on the same system that each produce five minutes of encoded footage per hour, we’re running a single encode that runs twice as fast and produces ten minutes of encoded footage per hour. Overall system throughput is ten minutes per hour in both cases, but the multi-threaded encode finishes twice as fast. If you’re building an encoding system that processes multiple encodes in parallel, you may not want to use these settings. If you’re running a single instance of FFmpeg on a system, you almost certainly do.
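The throughput argument reduces to simple arithmetic. This sketch uses the idealized numbers from the paragraph (two encodes at five minutes of footage per hour each versus one encode at ten minutes per hour). Real systems won't scale this cleanly, and both helper names are hypothetical.

```python
def system_throughput(encodes, minutes_per_hour_each):
    """Minutes of encoded footage the whole system produces per hour."""
    return encodes * minutes_per_hour_each


def hours_to_finish(clip_minutes, minutes_per_hour):
    """Wall-clock hours for one encode to finish a clip of the given length."""
    return clip_minutes / minutes_per_hour


# Idealized case: threading doubles one encode's speed but uses the CPU
# that would otherwise have run a second, independent encode.
parallel = system_throughput(encodes=2, minutes_per_hour_each=5)
threaded = system_throughput(encodes=1, minutes_per_hour_each=10)
print(parallel, threaded)  # 10 10

# Time to finish a single 10-minute clip in each setup
print(hours_to_finish(10, 5), hours_to_finish(10, 10))  # 2.0 1.0
```

Both setups deliver the same ten minutes of footage per hour. What changes is latency: any individual clip finishes in half the time with the multi-threaded encode, which is why the choice depends on whether you care about single-job turnaround or total system output.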
Where Are We?
So, I started off saying that I wasn’t comparing apples to apples, which readers will now understand, but also that I wasn’t being fair to the other codecs. How’s that? Well, x264, x265, and LibVPx have their own quality/speed curves, and if we’re applying the “practical” setting for AV1, we should do the same for these three codecs.
Specifically, if we use speed 2 for LibVPx (rather than the top-quality speed 0) and the slow preset for x264 and x265 (rather than very slow), we get the timings shown in Table 6. This makes AV1 roughly 20 times more expensive to produce than either x265 or LibVPx, which makes it appropriate only for videos with audiences in the high six or seven figures. That’s fine, since to date it’s typically Alliance for Open Media members like Netflix, Facebook, and YouTube that have produced video with the new codec. The speed gains to date are impressive, and I’m sure there are more to come.
I’m showing the VMAF scores in Table 6 for informational purposes only; a single five-second 1080p encode of a relatively easy-to-encode clip at 3 Mbps is insufficient for drawing any quality-related conclusions. For that, you need to review rate-distortion curves and BD-Rate comparisons from multiple clips. I’ll update the results from the AV1 review in the next few weeks to create relevant comparative data.
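To show roughly what a BD-Rate comparison computes, here is a minimal pure-Python sketch of the classic Bjøntegaard delta-rate calculation. It fits a cubic polynomial of log10(bitrate) against quality for each encoder, integrates both fits over the overlapping quality range, and converts the average gap back to a percentage. The function names and the (kbps, score) input format are assumptions. Published BD-Rate tools differ in details such as the interpolation method, so treat this as an illustration, not a reference implementation.

```python
import math


def _fit_cubic(xs, ys):
    """Least-squares fit of y = c0 + c1*x + c2*x**2 + c3*x**3; returns [c0..c3]."""
    n = 4
    # Normal equations: a[i][j] = sum(x**(i+j)), b[i] = sum(y * x**i)
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for k in range(col, n):
                a[r][k] -= f * a[col][k]
            b[r] -= f * b[col]
    # Back substitution
    c = [0.0] * n
    for i in reversed(range(n)):
        c[i] = (b[i] - sum(a[i][k] * c[k] for k in range(i + 1, n))) / a[i][i]
    return c


def _integrate(c, lo, hi):
    """Definite integral of sum(c[i] * x**i) over [lo, hi]."""
    def antiderivative(x):
        return sum(ci * x ** (i + 1) / (i + 1) for i, ci in enumerate(c))
    return antiderivative(hi) - antiderivative(lo)


def bd_rate(anchor, test):
    """Average bitrate difference (%) of `test` vs `anchor` at equal quality.

    Each argument is a list of at least four (kbps, score) points, e.g. VMAF.
    Negative results mean `test` needs less bitrate for the same quality.
    """
    if len(anchor) < 4 or len(test) < 4:
        raise ValueError("need at least four rate/quality points per curve")
    # Rescale quality to [-1, 1] to keep the normal equations well conditioned;
    # this changes the integration variable but not the averaged result.
    qs = [q for _, q in list(anchor) + list(test)]
    mid = (max(qs) + min(qs)) / 2
    half = (max(qs) - min(qs)) / 2 or 1.0

    def curve(points):
        return ([(q - mid) / half for _, q in points],
                [math.log10(rate) for rate, _ in points])

    xa, ya = curve(anchor)
    xt, yt = curve(test)
    lo, hi = max(min(xa), min(xt)), min(max(xa), max(xt))
    if hi <= lo:
        raise ValueError("quality ranges do not overlap")
    gap = _integrate(_fit_cubic(xt, yt), lo, hi) - _integrate(_fit_cubic(xa, ya), lo, hi)
    return (10 ** (gap / (hi - lo)) - 1) * 100
```

For example, if one encoder reached exactly the same VMAF scores as another at 90% of its bitrate at every test point, bd_rate would report about -10%. Producing the multiple rate/quality points per clip that this needs is exactly why a single 3 Mbps encode isn't enough.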
Table 6. Speed comparisons using the most “practical” settings.
In the meantime, if you’re encoding AV1, try the different cpu-used settings and tiles and threads, and see if your results are similar. If you read any AV1 comparative reviews that reference glacial encoding times, check and see which cpu-used setting the researcher used. If it’s not specified, the default is 1, which is likely a setting that no real producer would ever use. If it’s cpu-used 0, while arguably appropriate for academic research, the encoding times bear absolutely no relation to how real producers will actually use the codec.
To help those who want to try these new switches, here’s the final version of the FFmpeg command string. The first pass writes to NUL, the Windows null device; on Linux or macOS, use /dev/null instead.
ffmpeg -y -i input.mp4 -c:v libaom-av1 -strict -2 -b:v 3000K -g 48 -keyint_min 48 -sc_threshold 0 -tile-columns 1 -tile-rows 0 -threads 8 -cpu-used 8 -pass 1 -f matroska NUL
ffmpeg -i input.mp4 -c:v libaom-av1 -strict -2 -b:v 3000K -maxrate 6000K -bufsize 3000k -g 48 -keyint_min 48 -sc_threshold 0 -tile-columns 1 -tile-rows 0 -threads 8 -cpu-used 5 -pass 2 output.mkv
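If you want to run the cpu-used and tiles/threads comparisons suggested earlier without retyping the commands, one option is to build them as argument lists in a script. This Python sketch reproduces the two commands above, with two assumptions flagged in the code. It uses os.devnull for the first-pass output (nul on Windows, /dev/null on Linux and macOS), and it adds -y to the second pass so repeated runs overwrite without prompting. The function names are hypothetical.

```python
import os
import subprocess
import time


def two_pass_commands(src, dst, cpu_used=5, first_pass_cpu_used=8):
    """Build the two FFmpeg argument lists for the two-pass libaom-av1 encode
    shown above (hypothetical helper; parameter names are illustrative)."""
    rate = ["-c:v", "libaom-av1", "-strict", "-2", "-b:v", "3000K"]
    rest = ["-g", "48", "-keyint_min", "48", "-sc_threshold", "0",
            "-tile-columns", "1", "-tile-rows", "0", "-threads", "8"]
    first = ["ffmpeg", "-y", "-i", src, *rate, *rest,
             "-cpu-used", str(first_pass_cpu_used), "-pass", "1",
             "-f", "matroska", os.devnull]  # assumption: os.devnull in place of NUL
    second = ["ffmpeg", "-y", "-i", src, *rate,  # assumption: -y added here
              "-maxrate", "6000K", "-bufsize", "3000k", *rest,
              "-cpu-used", str(cpu_used), "-pass", "2", dst]
    return [first, second]


def timed_encode(commands):
    """Run each command in order, stopping on failure; return elapsed seconds."""
    start = time.perf_counter()
    for cmd in commands:
        subprocess.run(cmd, check=True)
    return time.perf_counter() - start


# Example sweep over second-pass cpu-used values (needs ffmpeg built with libaom):
# for cpu in (3, 4, 5, 6):
#     secs = timed_encode(two_pass_commands("input.mp4", f"output_cpu{cpu}.mkv", cpu))
#     print(cpu, round(secs, 1))
```

Run such sweeps one at a time. By default FFmpeg writes its two-pass statistics to the same log file in the working directory, so parallel runs would overwrite each other's first-pass data, and, as discussed above, they'd also compete for the same CPU.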