How to Encode with FFmpeg 5.0
On January 17, 2022, FFmpeg released FFmpeg 5.0, called Lorentz. To celebrate and to help introduce new users to the power and ease of FFmpeg, I created this entry-level tutorial for single and two-pass encoding with FFmpeg.
For the record, many of the major changes to FFmpeg were at the application programming interface level, so if you're driving the program from the command line, you'll see little difference. Thankfully, Lorentz didn't "break" any of the command strings that I tested from my FFmpeg book, Learn to Produce Video With FFmpeg: In Thirty Minutes or Less, or courses, so hopefully, it won't break existing command strings for other users. Most of the new additions fall into the advanced category and won't affect what's shown in this tutorial.
Let's start with a brief look at FFmpeg. Note that you can download the scripts shown in this article.
According to FFmpeg's About page, "FFmpeg is the leading multimedia framework, able to decode, encode, transcode, mux, demux, stream, filter and play pretty much anything that humans and machines have created." FFmpeg runs on Windows, Mac OS X, Linux, and a wide variety of other build environments and is incredibly useful for a broad range of activities.
For example, I learned FFmpeg to support the experimentation that went into my book, Video Encoding by the Numbers: Eliminate the Guesswork From Your Streaming Video, which involved thousands of encodes to identify the optimal encoding parameters for H.264, HEVC, and VP9. Trying to perform this testing, as well as the associated quality measurements, with one or more applications would have been impossibly cumbersome.
Beyond encoding, FFmpeg also provides a useful range of functionality, like changing container formats without re-encoding, extracting file sections without re-encoding, and scaling files to different resolutions, all of which made my article, "Discover the Six FFmpeg Commands You Can't Live Without," one of the most popular articles on Streaming Media. Rather than focusing on random tasks, this tutorial will walk you through the fundamentals of encoding with FFmpeg, essentially retracing my learning journey for Encoding by the Numbers.
There are plenty of tutorials available for getting FFmpeg installed on the platform of your choice. Here are links to installing for Windows, Mac, and Linux.
All of these tutorials were written for previous versions of FFmpeg, but they should still work if you download FFmpeg 5.0. So, install FFmpeg on your operating system of choice, and let's get started.
About Command-Line Processing
When you first start using FFmpeg, you'll probably be working via the command line. You'll build a command-line "argument" or "script" that comprises different configuration options or "switches" that tell FFmpeg how to encode. You can feed these arguments to FFmpeg directly in the Command or Terminal window or via batch files, which are text-based files that contain one or more command-line arguments that you can run from the Command window or Windows Explorer.
When you're experimenting with your configuration options, you'll typically feed commands directly into the Command window. Once the configuration is set, you'll likely create a batch file to encode multiple files.
In most, but not all, instances, FFmpeg command-line arguments work the same for all operating systems. One notable exception is the command syntax for two-pass encoding, which differs for Windows and Mac/Linux. I'll cover that later.
However, there are multiple minor differences between Windows and Mac/Linux batch files that you'll need to learn. I'll cover the command strings in this article, but not how to create batch files for any operating system.
Figure 1. The most basic FFmpeg encoding script
Getting Started with FFmpeg
Figure 1 shows the most basic FFmpeg encoding script, where you specify the input file, name the output file, and choose the video codec.
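For readers who can't see the figure, a command of this kind typically looks like the sketch below. The file names input.mp4 and output.mp4 are placeholders, and libx264 is assumed as the H.264 codec; the exact string in Figure 1 may differ. The script only prints the command so you can inspect it before pasting it into the Command or Terminal window.

```shell
# Placeholder file names (assumptions); substitute your own source and output.
input="input.mp4"
output="output.mp4"

# -i names the input file, -c:v chooses the video codec (x264 here),
# and the final argument is the output file name.
cmd="ffmpeg -i $input -c:v libx264 $output"

# Print the command rather than running it.
echo "$cmd"
```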
When you deploy this script, FFmpeg applies many default settings to the encode, as partially shown in Figure 2. I say "partially" because there are a lot more defaults applied than those shown, although Figure 2 contains the configuration options we'll consider in this tutorial.
Figure 2. The output
When you don't specify a data rate, as I did not in Figure 1, FFmpeg employs the constant rate factor (CRF) encoding technique using a value of 23, which should deliver very good quality at the minimum possible bitrate. In the absence of other direction, FFmpeg sets the keyframe interval at 250, or about 10 seconds for this 24 fps file, and keeps the resolution and frame rate the same as the source. Unless you specify otherwise in the command string, FFmpeg encodes using H.264's High profile and the Medium preset.
By way of background, presets control a number of encoding parameters and enable a producer to choose the desired balance of encoding speed and encoding quality. In Figure 2, you can see that the Medium preset uses three B-frames and three reference frames, but there are many more parameters not shown that are controlled by the preset. Check out go2sm.com/x264 to see all of the configuration options controlled with x264 presets.
If you change presets, you'll likely change some of these options. Or, you can override the preset-controlled value by specifying a different value for that configuration option in the command string. So, if you added refs=1 to your command string, FFmpeg would use the Medium preset with one reference frame rather than three. I'm not suggesting that you need to adjust these settings; in my own encodes, I typically accept most, if not all, of the preset values.
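As a rough illustration of overriding a single preset value, the sketch below writes the reference-frame override using FFmpeg's -refs switch. The file names are placeholders, and -preset medium is spelled out only for clarity, since it is the default. Treat the exact spelling as an assumption and check the output to confirm the value was applied.

```shell
# Placeholder file names (assumptions).
input="input.mp4"
output="output_refs1.mp4"

# -preset medium is the default and is shown only for clarity.
# -refs 1 asks for one reference frame instead of the preset's three.
cmd="ffmpeg -i $input -c:v libx264 -preset medium -refs 1 $output"

# Print the command rather than running it.
echo "$cmd"
```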
Moving on to audio, when you select the x264 codec, FFmpeg uses AAC audio compression, which is the default for x264 video encoding. FFmpeg maintains the same audio channels and sample rate as the source file and applies a bitrate of 128Kbps.
The key thing to remember is that these defaults simplify encoding. So, if you want to encode at the source resolution and frame rate, and you're OK with the High profile and Medium preset, you don't need to include switches for any of these configuration options in the command line. If the default audio parameters are OK as is, you don't have to include any audio switches in the command line, and most of my command-line arguments don't.
In most instances, you'll insert a switch into the command string only when you don't want to use the default. This is the case when you're encoding for adaptive bitrate (ABR) streaming because a keyframe (or I-frame) interval of 250 just won't work. To change this and to achieve the desired settings, you'll need to insert switches into every command string created for a file that's produced for ABR delivery.
Setting the I-Frame Interval in FFmpeg
The x264 codec uses three different frame types during encoding: I-frame, B-frame, and P-frame. I-frames are self-contained frames that must appear at the start of a file or file segment; otherwise, the file or segment might not play correctly. During ABR playback, the player retrieves multiple file segments and plays them back in sequence. So, one key requirement during encoding is to ensure that the first frame in every segment is an I-frame.
To accomplish this, you must have regular I-frames, and the I-frame interval must divide evenly into the segment size. So, if your segments are 6 seconds long, your I-frame interval must be either 1, 2, 3, or 6 seconds, with 2 seconds being the most frequently used.
To insert I-frames at 2-second intervals, use the switches shown in Figure 3. The first switch, -g, sets the I-frame interval to 48 frames, or 2 seconds for this 24 fps source file.
Figure 3. Choosing the I-frame interval
However, by default, FFmpeg will insert I-frames at scene changes, which can interrupt the placement of I-frames every 48 frames. To prevent this, I set -keyint_min to 48 and -sc_threshold (scene change threshold) to 0.
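The arithmetic behind these switches can be sketched as follows. The 24 fps frame rate, 2-second interval, and 6-second segments come from the discussion above; the file names and variable names are placeholders of my own. The script prints the resulting command rather than running it.

```shell
fps=24            # source frame rate
interval_sec=2    # desired I-frame interval in seconds
segment_sec=6     # ABR segment length in seconds

# I-frame interval in frames: 24 fps x 2 seconds = 48.
gop=$((fps * interval_sec))

# The interval must divide evenly into the segment length.
if [ $((segment_sec % interval_sec)) -ne 0 ]; then
  echo "I-frame interval does not divide evenly into the segment length" >&2
fi

# -g sets the interval, -keyint_min sets the minimum interval, and
# -sc_threshold 0 turns off I-frame insertion at scene changes.
cmd="ffmpeg -i input.mp4 -c:v libx264 -g $gop -keyint_min $gop -sc_threshold 0 output.mp4"
echo "$cmd"
```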
Figure 4 shows the GOP view of the file I just encoded in Telestream's Switch, with I-frames in white, P-frames in purple, and B-frames in blue. You can see that the I-frame interval is 48 with no intermediate I-frames, so we've accomplished our goal.
Figure 4. Here’s the encoded file with an I-frame interval of 48 shown in Telestream’s Switch
Note that FFmpeg almost always presents multiple ways to accomplish any task or goal. For example, there are command sequences that insert I-frames at scene changes and maintain an I-frame at the desired interval. I just never found the qualitative difference worth the trouble and believe that in most instances, simple is better.
Let's tackle a few switches at once as you might do for lower rungs on the encoding ladder. If, for example, you were targeting very old iPhones connecting via 3G, you might want to reduce the resolution to 270 and the frame rate to 12 fps (half the 24 fps source rate) and use the Baseline profile for playback compatibility. To do all of this, you would add the switches shown in Figure 5 to the command string shown on the bottom.
Figure 5. Creating lower rungs on the encoding ladder
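One plausible way to write these three changes is sketched below. It assumes a 16:9 source, so a height of 270 pairs with a width of 480; the file names are placeholders, and the exact switches in Figure 5 may differ. The script prints the command rather than running it.

```shell
# Assumed 16:9 source, so 270 lines pairs with a width of 480 pixels.
size="480x270"
fps_out=12          # half of the 24 fps source rate

# -s sets the output resolution, -r sets the output frame rate, and
# -profile:v baseline selects the H.264 Baseline profile.
cmd="ffmpeg -i input.mp4 -c:v libx264 -s $size -r $fps_out -profile:v baseline output_270p.mp4"
echo "$cmd"
```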
You can see in MediaInfo on the right that we've achieved the targeted parameters, although the data rate is a bit too high. It's time to tackle bitrate control and two-pass encoding.
FFmpeg Bitrate Control
So far, the bitrate for each file has been set by the default CRF=23 switch used when there's no data-rate configuration in the command line. Let's fix that. Figure 6 shows the three switches that impact the video bitrate.
Figure 6. Setting the video bitrate
You can see that -b:v sets the video bitrate. If you don't care how the data is distributed in the file, you can just use this switch; you don't need the other two.
As the name suggests, -maxrate sets the maximum video bitrate for the file. As shown in Figure 6, if you're attempting to produce a constant bitrate (CBR)-encoded file, you set the maximum at the same rate as the target. For 200% constrained variable bitrate (VBR) encoding, you set the maximum at twice the target.
The -bufsize switch sets the size of the video buffering verifier (VBV), which you can read about at go2sm.com/vbv. As a general rule, larger VBV sizes deliver slightly higher quality, but also greater bitrate variability. If you care about controlling the bitrate, as you would for a CBR encode, you should keep the VBV buffer small, with 1 second a good rule of thumb. If bitrate variability isn't a concern, and it usually isn't for 200% constrained VBR, you can double that number.
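Here is a small worked example of how the three values relate. The 1000Kbps target is an illustrative number, not one taken from the figures, and expressing the buffer as seconds multiplied by the target bitrate is my reading of the rule of thumb above.

```shell
target=1000   # target video bitrate in Kbps (illustrative value)

# CBR: maximum equals the target, with a 1-second buffer.
cbr_flags="-b:v ${target}k -maxrate ${target}k -bufsize ${target}k"

# 200% constrained VBR: maximum is twice the target, with a 2-second buffer.
vbr_max=$((target * 2))
vbr_buf=$((target * 2))
vbr_flags="-b:v ${target}k -maxrate ${vbr_max}k -bufsize ${vbr_buf}k"

echo "CBR: $cbr_flags"
echo "VBR: $vbr_flags"
```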
Two-Pass Encoding in FFmpeg
VBR encoding works best when you use two passes: one to scan and gauge the encoding complexity of different sections of the file and the other to perform the actual encoding. Figure 7 shows the command sequence for implementing this and the difference in the first pass between Windows and Mac/Linux.
Figure 7. Implementing two-pass encoding in Windows
You can see the explanations for the new switches in the slide. In the first line, you add a -y to tell FFmpeg to overwrite existing files without asking. Otherwise, if a file with the same name already exists, as it can when you've performed a two-pass encode in that folder before or when you're encoding multiple files in batch mode, FFmpeg will pause and ask whether you want to overwrite it. If you've left the office thinking you had scheduled a productive night of encoding, you'll be awfully disappointed in the morning.
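Then you add -pass 1 to tell FFmpeg that it's the first pass, -f mp4 to identify the output format you'll be producing in the second pass, and NUL && \ to finish the line. NUL is the Windows null device (/dev/null on Mac/Linux), so the video encoded during the first pass is discarded, while FFmpeg writes the first-pass information to a log file in the current folder. The two ampersands tell the command shell, not FFmpeg itself, to run the second pass only if the first pass is successful, and the trailing backslash continues the command on the next line in Mac/Linux shells.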
The second pass contains -pass 2, which tells FFmpeg to read the log file created in the first pass and use that information, followed by the output file name.
Sharp-eyed readers will note that the first pass command string contains only the -b:v control, while the second pass includes all three data-rate-related switches, raising the question, which switches need to go in each pass? This is a complicated issue beyond the scope of this tutorial. For now, assume that it's safe to deploy the data-rate parameters as shown, but for other configuration options, it's safest to include them in both passes.
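The two passes described above can be sketched for Mac/Linux as follows; on Windows, replace /dev/null with NUL. The file names and the 1000Kbps target are placeholders, and the data-rate switches follow the pattern described in the text (only -b:v in the first pass, all three in the second). The script prints the chained command rather than running it.

```shell
null_out="/dev/null"   # on Windows, use NUL instead
target=1000            # target video bitrate in Kbps (illustrative value)
max=$((target * 2))    # 200% constrained VBR
buf=$((target * 2))

# Pass 1: analyze the file, discard the video, and write the log file.
pass1="ffmpeg -y -i input.mp4 -c:v libx264 -b:v ${target}k -pass 1 -f mp4 $null_out"

# Pass 2: read the log file and produce the real output.
pass2="ffmpeg -i input.mp4 -c:v libx264 -b:v ${target}k -maxrate ${max}k -bufsize ${buf}k -pass 2 output.mp4"

# && runs the second pass only if the first succeeds.
printf '%s && %s\n' "$pass1" "$pass2"
```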
FFmpeg Audio Parameters
If you need to specify the audio parameters in a file, say to change stereo to mono or to set the data rate below 128Kbps, you can use the controls shown in Figure 8. Use these to choose the codec, bitrate, channels, and sample rate.
Figure 8. Audio controls
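As a sketch of the four audio controls, the command below converts the audio to 64Kbps mono AAC at 44.1 kHz. The values and file names are illustrative assumptions; the settings in Figure 8 may differ. The script prints the command rather than running it.

```shell
# Illustrative audio settings (assumptions, not the values in Figure 8).
audio_flags="-c:a aac -b:a 64k -ac 1 -ar 44100"

# -c:a chooses the audio codec, -b:a the audio bitrate,
# -ac the number of channels, and -ar the sample rate in Hz.
cmd="ffmpeg -i input.mp4 -c:v libx264 $audio_flags output.mp4"
echo "$cmd"
```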
Going Forward With FFmpeg
Here are a couple of points about FFmpeg. First, the configuration options previously shown work well for most codecs, but each codec also has its own options, and not every switch works with every codec. Also, FFmpeg is very resilient, and if a configuration option is incorrect, it will typically finish the process even if the process doesn't accomplish what you want it to. If you check the Command window, however, you may notice errors in yellow or red.
This is what you see in Figure 9, where I changed the codec to VP9, which doesn't recognize the -sc_threshold command. FFmpeg completed the first pass and would complete the second, but it may insert I-frames at scene changes unless you find the right command for VP9 (there is none). In this case, there's no harm done, but that may not always be the case. So pay attention to the Command or Terminal window while you're experimenting.
Figure 9. Watch the Command window for errors that may mean you’re not achieving your goals
The second point is to check your results after each encode. Telestream's Switch, shown in Figure 4, is a fabulous tool for viewing the frame structure; Zond 265 is another.
Still another useful tool is MediaInfo, which can display the compression-related metadata that's stored in the FFmpeg-encoded file (see Figure 10). You can see this data in all MediaInfo views, although the HTML view is the best. This information lets you verify the parameters that you've applied, or tried to apply, and allows you to explore video files from other developers to check their encoding parameters.
Figure 10. Checking your encode in MediaInfo
For example, in Figure 10, you can see that keyint_min is set to 25 even though the command string requested 48. While this doesn't have any practical impact because the scene change threshold is set to 0, it shows that you always need to verify your encoding parameters before you take them into production.
The bottom line with FFmpeg (and all encoders, really) is that you can't assume your command string is correct simply because the encoder creates a file. You have to verify that your encoding parameters are as you expected.
[Editor's note: This article first appeared in the 2022 Streaming Media Industry Sourcebook.]