How to Encode with LCEVC
Low Complexity Enhancement Video Coding (LCEVC) is one of the three MPEG codecs that will be released in 2020. Of the three, LCEVC is the only one that can have an immediate impact on the streaming landscape, because it can be implemented completely in software without degrading system performance or device battery life. In fact, LCEVC has already been implemented by many streaming producers as V-Nova's P+ codec. In this tutorial, I'll detail how to encode LCEVC using FFmpeg 4.0.1-17 and the version 2.9 build 539246 of V-Nova's P+ encoder. I'll also take a quick look at encoding speed and comparative quality.
LCEVC is a hybrid codec that incorporates two layers and two technologies into each encoded stream (see Figure 1). The base layer is a low-resolution encode from any existing codec, such as H.264, HEVC, VP9, or AV1, which provides backward compatibility on platforms without an LCEVC decoder. The enhancement layer, which provides additional detail and resolution, is encoded using the LCEVC codec.
Figure 1. LCEVC compressed files have two streams: a base layer and an enhancement layer.
As an example, Figure 2 shows the VLC player playing a 1080p LCEVC encoded file that used a 960x540 H.264 encode for the base layer. At the time I wrote this tutorial, in January 2020, the VLC player wasn't LCEVC-compatible, so it detected and played the base layer stream, as you can see in the Codec information window on the right, and simply ignored the LCEVC data. This backward compatibility is one of LCEVC's greatest strengths.
Figure 2. The VLC player wasn't LCEVC-compatible when I wrote this, so it saw only the 960x540 H.264 base layer in this 1080p LCEVC encoded file.
With this as background, it's simple to understand what's required to encode LCEVC. First, you have to choose the codec for the base layer and configure that encode. Then you have to configure the enhancement layer. There are two use cases for LCEVC—video on demand (VOD) and live—and each requires a different type of encode.
Briefly, for VOD, V-Nova recommends using constant rate factor (CRF) mode, which provides a measure of per-title encoding. Since CRF is not the default setting, you'll have to configure both the base and enhancement layers. For live, V-Nova recommends constant bitrate (CBR) encoding, which is the default LCEVC mode. As you'll see, if you don't need to implement any non-default settings in the enhancement layer, the CBR configuration is much simpler. I'll detail each in turn.
Encoding for VOD
The FFmpeg encoder I tested could input MXF, YUV/Y4M, MP4, and ProRes files and output transport streams (.ts) and .mp4 files. It supported 8-bit, 4:2:0 encoding with 10-bit, 4:2:2 encoding on the short-term road map. The FFmpeg executable is available for Windows and Linux; I tested the Windows version.
As mentioned, to produce LCEVC, you have to identify and configure the base layer codec and configure the enhancement layer. All of my tests used the x264 codec as the base layer, and you use traditional x264/FFmpeg controls to configure this layer.
As is typical for FFmpeg, enhancement layer configurations come after a designated switch, in this case -eil_params, with individual options separated by semicolons and the entire string enclosed in quotes. For the record, "eil" stands for "encoding integration layer." As with other FFmpeg options, if you want the default value, simply omit that option; the encoder applies the default automatically.
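The separator and quoting rules are easy to get wrong when commands are generated by a script. Here is a minimal Python sketch, assuming you build commands programmatically. The helper name eil_params is hypothetical, and the only option names it uses (rc_pcrf, bitrate, rc_pcrf_gop_length) come from Batch 1.

```python
def eil_params(options: dict) -> str:
    """Join enhancement-layer options into the string passed after -eil_params.

    Options set to None are skipped so the encoder applies its default.
    An empty string is kept, because CRF mode needs an empty bitrate= entry.
    """
    parts = []
    for key, value in options.items():
        if value is None:
            continue  # omitted option -> encoder default applies
        parts.append(f"{key}={value}")
    return ";".join(parts)


params = eil_params({"rc_pcrf": 23, "bitrate": "", "rc_pcrf_gop_length": 120})
print(params)                     # rc_pcrf=23;bitrate=;rc_pcrf_gop_length=120
# At a shell prompt the string needs quotes so a POSIX shell doesn't treat ';' as a command separator
print(f'-eil_params "{params}"')  # -eil_params "rc_pcrf=23;bitrate=;rc_pcrf_gop_length=120"
```

If you pass the arguments to subprocess as a list instead of a single shell string, no quotes are needed around the parameter string.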
Here is a simple command line containing both base layer and enhancement layer configurations:
ffmpeg -i Meridian_30.mp4 -c:v pplusenc_x264 -g 120 -keyint_min 120 -sc_threshold 0 -eil_params "rc_pcrf=23;bitrate=;rc_pcrf_gop_length=120" Meridian_pplus.mp4
Batch 1. A simple command line for LCEVC showing the -eil_params settings
If you're familiar with FFmpeg, this is simple stuff.
ffmpeg calls FFmpeg.
-i Meridian_30.mp4 names the input file.
-c:v pplusenc_x264 identifies P+ (V-Nova's implementation of LCEVC) as the video codec and x264 as the base encoder. This syntax will obviously change when other codecs are used for the base layer.
-g 120 -keyint_min 120 -sc_threshold 0 sets the GOP structure of the base layer: a fixed 120-frame GOP with scene-change keyframes disabled.
-eil_params "rc_pcrf=23;bitrate=;rc_pcrf_gop_length=120" configures the enhancement layer; I cover each of these options below. As you'll learn, I didn't have to include the GOP length control since the default GOP size is 2 seconds, or 120 frames for this 60 fps test file. I included it to show how multiple options are presented in the -eil_params string.
Meridian_pplus.mp4 names the output file.
Note that when you use the P+ encoder with the x264 codec, the default preset is veryslow, not medium, which is the default preset for native x264. This isn't important for production, but if you're comparing P+ encoding time and/or quality to native x264 encodes, make sure you choose the same preset for both. If you leave the preset unspecified in both command strings, which I normally do for x264 to default to medium, you'll be comparing apples and oranges, or more specifically LCEVC using the veryslow preset to x264 using the medium preset.
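One way to avoid that mismatch in scripted comparisons is to pass the same preset explicitly to both encoders. The sketch below is a hypothetical Python helper (comparison_commands and the output file names are not part of FFmpeg or the P+ SDK). It builds argument lists that mirror the CRF command strings used later in this article.

```python
def comparison_commands(src: str, preset: str = "medium", crf: int = 23) -> dict:
    """Build P+ and native x264 CRF commands that share an explicit preset.

    pplusenc_x264 defaults to veryslow while libx264 defaults to medium,
    so the preset is always passed to both encoders.
    """
    base_opts = ["-preset", preset, "-g", "120", "-keyint_min", "120",
                 "-sc_threshold", "0", "-threads", "1"]
    pplus = (["ffmpeg", "-i", src, "-c:v", "pplusenc_x264"] + base_opts +
             ["-eil_params", f"rc_pcrf={crf};bitrate=", "out_pplus.mp4"])
    x264 = (["ffmpeg", "-i", src, "-c:v", "libx264"] + base_opts +
            ["-crf", str(crf), "out_x264.mp4"])
    return {"pplus": pplus, "x264": x264}


cmds = comparison_commands("Meridian_30.mp4", preset="veryslow")
for name, argv in cmds.items():
    print(name, " ".join(argv))
# Each list can be run with subprocess.run(argv, check=True); no shell quoting is needed.
```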
With this as background, let's explore the enhancement layer options and then circle back to the base layer. I won't cover all the options, only those that you're most likely to use or consider using.
As previously mentioned, LCEVC has two existing bitrate control techniques, CRF and CBR, with the former recommended for VOD and the latter for live. I'll cover CBR in the live section to follow.
CBR control is the default option. To deploy CRF encoding, you have to specify it in the -eil_params string and set the CRF value. As with the other CRF implementations that I've seen, the available values range from 1 to 51, with lower values producing higher-quality results. In this case, the selection controls the overall quality of the base and enhancement layers. Typical values are in the 20–36 range.
When specifying CRF mode, you also have to include an empty bitrate parameter (bitrate= with no value). Accordingly, as shown in Batch 1, to set the CRF value at 23, you would add the following to the encoding integration layer section after -eil_params:
rc_pcrf=23;bitrate=
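A small validation step can catch an out-of-range value before a long encode starts. This Python sketch assumes only what's stated above: CRF values from 1 to 51, a typical range of 20–36, and a required empty bitrate= entry. The function name is illustrative.

```python
import warnings


def crf_params(crf: int) -> str:
    """Return the -eil_params fragment that selects CRF mode at the given value."""
    if not 1 <= crf <= 51:
        raise ValueError(f"CRF must be between 1 and 51, got {crf}")
    if not 20 <= crf <= 36:
        # Legal, but outside the typical range cited in this article
        warnings.warn(f"CRF {crf} is outside the typical 20-36 range")
    return f"rc_pcrf={crf};bitrate="  # bitrate= must be present but empty


print(crf_params(23))  # rc_pcrf=23;bitrate=
```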
According to V-Nova, the P+ implementation of LCEVC has a capped CRF mode, but it wasn't available for testing when I wrote this. With x264 and other traditional codecs, uncapped CRF is a concern for all longer-form content producers because of the potential for data spikes. I asked about this, and my contact responded that "we typically observe that spikes for LCEVC are lower than for native because of the multi-layered approach, so uncapped CRF may become acceptable."
The scaling mode determines the resolution of the base layer as it relates to the final output resolution. As shown in Table 1, 2D, which is the default, dictates a 2:1 scaling, producing a base layer of 960x540 for 1080p output and a 640x360 base layer for 720p output.
Table 1. Setting the scaling mode and size of the base layer
1D scaling is horizontal only, producing a base layer of 960x1080 for 1080p video and 640x720 for 720p video. This option would be worth exploring when trying to optimize encoding parameters for lower rungs on the encoding ladder. The final option, 0D, or no scaling, is a testing mode that would never be used in production, which is why I don't document the Native mode listed in Table 1.
The LCEVC standard specifies additional scaling modes (3D and 4D) that scale the base layer further. These weren't yet available in the version of FFmpeg I tested, but according to V-Nova, they're on the road map.
In the -eil_params string, you specify the scaling mode after the = sign of the scaling_mode_level0 option. So to select 1D mode, you would add
scaling_mode_level0=1D to the other options within the quotes.
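The relationship between the output size and the base layer size in Table 1 reduces to simple arithmetic. This Python sketch covers only the 0D, 1D, and 2D modes discussed here. How the encoder handles odd dimensions is an assumption (integer division) that I haven't verified.

```python
def base_layer_resolution(width: int, height: int, mode: str = "2D") -> tuple:
    """Base-layer frame size for an output size and scaling_mode_level0 value.

    Assumes integer division for odd dimensions (not verified against the encoder).
    """
    if mode == "2D":  # default: halve both dimensions
        return width // 2, height // 2
    if mode == "1D":  # horizontal scaling only
        return width // 2, height
    if mode == "0D":  # no scaling (test mode)
        return width, height
    raise ValueError(f"unsupported scaling mode: {mode!r}")


for w, h in [(1920, 1080), (1280, 720)]:
    print(f"{w}x{h}:", base_layer_resolution(w, h, "2D"), base_layer_resolution(w, h, "1D"))
# 1920x1080: (960, 540) (960, 1080)
# 1280x720: (640, 360) (640, 720)
```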
Enhancement Layer GOP Size
This switch sets the GOP size of the enhancement layer in frames and must equal the GOP size of the base layer. Batch 1 uses the following switch to set the GOP size to 120 frames, or 2 seconds, for this 60 fps test file:
rc_pcrf_gop_length=120
As mentioned, the default value for this setting is 2 seconds, or, more specifically, a number of frames equal to twice the frame rate, so if your GOP size is 2 seconds, you don't have to include this switch.
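Because the two GOP sizes must match, it can help to derive both from a single frame rate. In this hypothetical Python helper, the base layer flags mirror Batch 1, and rc_pcrf_gop_length (the option name from Batch 1) is emitted only when the GOP differs from the 2-second default. Rounding for fractional frame rates such as 29.97 is an assumption.

```python
def gop_frames(fps: float, seconds: float = 2.0) -> int:
    """GOP length in frames; rounding for fractional rates is an assumption."""
    return round(fps * seconds)


def gop_options(fps: float, seconds: float = 2.0):
    """Return (base-layer flags, enhancement-layer options) for one GOP size."""
    frames = gop_frames(fps, seconds)
    base = ["-g", str(frames), "-keyint_min", str(frames), "-sc_threshold", "0"]
    enhancement = {}
    if frames != gop_frames(fps):  # differs from the 2-second default
        enhancement["rc_pcrf_gop_length"] = frames
    return base, enhancement


print(gop_options(60))       # (['-g', '120', '-keyint_min', '120', '-sc_threshold', '0'], {})
print(gop_options(60, 4.0))  # (['-g', '240', '-keyint_min', '240', '-sc_threshold', '0'], {'rc_pcrf_gop_length': 240})
```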
Dithering is noise intentionally injected into the stream to minimize visual impairments such as color banding or blocking artifacts that can appear when encoding at low bitrates. Dithering is disabled by default and enabled with this switch:
Once you enable dithering, you can also set dithering strength. According to the V-Nova documentation, "the default value is 5, a value of 7-8 shows a more visible dither, while a value of 2-3 should be used for an almost unperceivable dither." You set the strength with this switch, which I set to 4, a value that V-Nova included in one of its recommended command strings:
I experimented with dithering using Harmonic's football test clip, which contains lots of high motion and fine detail like uniform numbers and grass, testing the 60 fps clip with and without dithering at both 2Mbps and 4Mbps. At both data rates, dithering didn't so much reduce artifacts as create a different kind of artifact, making the dither versus non-dither decision highly subjective.
In Figure 3, dithering on the left appears to clarify the numbers on several of the offensive linemen. However, the perceived benefits of dithering changed from frame to frame, and I would definitely test multiple clips in front of multiple eyeballs before making the decision to dither. I did not use dithering in any of my comparisons.
Figure 3. Dithering appears to create slightly clearer numbers in this frame from the 2Mbps encode.
If you're considering dithering, note that according to V-Nova's documentation, dithering is applied adaptively:
Dithering is applied dynamically and content-adaptively by the encoder, depending on the quality of the base layer (base qp). Irrespective of the specified strength, it will automatically disappear in static/very easy scenes and its intensity will be automatically modulated on a frame-by-frame basis according to the base qp, starting above a certain threshold (-dc_dithering_qp_start) and maxing out above a second threshold (…).
I didn't experiment with either of the two designated switches.
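V-Nova doesn't document the exact shape of this modulation, so the following Python sketch models it as a linear ramp purely for illustration. The ramp, the parameter names qp_start and qp_saturate, the static_scene flag, and the threshold values in the example are all assumptions. Only the overall behavior (no dither below the first threshold or in static scenes, full strength above the second) comes from the documentation quoted above.

```python
def dither_intensity(base_qp: float, strength: float, qp_start: float,
                     qp_saturate: float, static_scene: bool = False) -> float:
    """Hypothetical model of content-adaptive dithering (linear ramp assumed).

    No dither in static scenes or below qp_start; full strength at or above
    qp_saturate; linear in between.
    """
    if static_scene or base_qp <= qp_start:
        return 0.0
    if base_qp >= qp_saturate:
        return float(strength)
    return strength * (base_qp - qp_start) / (qp_saturate - qp_start)


for qp in (20, 30, 40):
    print(qp, dither_intensity(qp, strength=4, qp_start=24, qp_saturate=36))
# 20 0.0
# 30 2.0
# 40 4.0
```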
Operationally, dithering is computed during encoding but applied during decoding. If you're decoding for metric calculation, V-Nova directs you to disable dithering in the decoder via this switch:
-disable dithering 1
If you decide to dither, however, don't expect disabling dithering in the decoder to make a huge difference in metric scores. In the 2Mbps encodes, the Video Multimethod Assessment Fusion (VMAF) score for the dithered clip was 79.38 as compared to 79.58 for the non-dithered clip. At 4Mbps, the difference was even smaller, with the dithered clip at 90.36 and the non-dithered clip at 90.51. If VMAF accurately predicts how humans will rate these videos, differences this small show just how close the dither versus non-dither decision is.
Resolutions with Height Not Divisible by 8 (e.g., 540p)
If the height of your output resolution isn't divisible by 8, you should insert this switch into the command string:
OK, this covers the most frequently used switches for VOD encoding; let's switch over to live.
Encoding for Live
Most live producers use CBR control. With LCEVC, you can choose CBR for the total stream, but there are two bitrate control methods for the base stream, CBR or CRF, with the latter the default. Using the default setting, the CRF value is calculated automatically by the LCEVC codec, which is essential to the optimal allocation of data rate between the base and enhancement layers. You should only use CBR for the base layer if you have specific operating constraints for the H.264 stream.
To choose CBR control over the base layer, add the following to the -eil_params command string:
When you do deploy CBR for the base layer, you don't set its data rate yourself. Rather, it's set automatically via LCEVC's internal rate-control mechanism.
Note that you don't set the data rate in the -eil_params string; the encoder takes it from the bitrate set with the base layer controls (-b:v). If you use a 2-second keyframe interval for your base layer, which is the LCEVC default, you don't need to specify that either, so your final command string needs no -eil_params at all. Here's the command string that V-Nova used in a series of CBR tests that I reviewed:
ffmpeg -i input.mp4 -c:v pplusenc_x264 -aq-mode 1 -aq-strength 0.8 -deblock -2:-2 -threads 1 -g 120 -keyint_min 120 -sc_threshold 0 -b:v 4M output.mp4
Batch 2. The CBR command string V-Nova used for large scale testing
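Because the total bitrate comes from -b:v and the default GOP is 2 seconds, a script only needs to add -eil_params when it departs from the defaults. This Python sketch (live_command is a hypothetical helper) builds a command equivalent to Batch 2 for a 60 fps source and appends -eil_params only when you pass an explicit parameter string.

```python
def live_command(src: str, dst: str, bitrate: str = "4M", fps: float = 60,
                 tuning=None, eil=None) -> list:
    """Build a P+ CBR command like Batch 2; -eil_params only if eil is given."""
    gop = str(round(fps * 2))                 # 2-second GOP, the LCEVC default
    argv = ["ffmpeg", "-i", src, "-c:v", "pplusenc_x264"]
    argv += list(tuning or [])                # optional x264 tuning flags
    argv += ["-threads", "1", "-g", gop, "-keyint_min", gop,
             "-sc_threshold", "0", "-b:v", bitrate]
    if eil:                                   # only for non-default layer settings
        argv += ["-eil_params", eil]
    argv.append(dst)
    return argv


metric_tuning = ["-aq-mode", "1", "-aq-strength", "0.8", "-deblock", "-2:-2"]
print(" ".join(live_command("input.mp4", "output.mp4", tuning=metric_tuning)))
```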
Nice and simple, but there are some base layer codec optimizations that we haven't yet discussed, and we'll finish our configuration discussion with those.
Recommended x264 Parameters
When encoding for optimal visual quality, V-Nova recommends the following settings:
-aq-mode 3 -aq-strength 1.3 -deblock -2:-2
Briefly, AQ is adaptive quantization, which allows the codec to reallocate bits around the frame to optimize visual quality. AQ is a psychovisual configuration that improves visual quality but can degrade the scores of older objective quality metrics that are often based on how the compressed frame differs from the original. More on that in a moment.
In default mode (i.e., no aq-mode setting in the command string), x264 uses mode 1, which, according to the documentation found at bit.ly/2RfRf71, enables the codec to "redistribute bits within each frame." Mode 3, which V-Nova recommends, is "auto-variance AQ with bias to dark scenes."
V-Nova also recommends raising aq-strength to 1.3 from the default of 1, which, according to the same documentation, "sets the strength of AQ bias towards low detail ('flat') macroblocks." Finally, V-Nova recommends changing the deblock setting from the default of 0:0 to -2:-2, which weakens the H.264 in-loop deblocking filter.
These are very obscure controls that most encoding professionals happily ignore, but given LCEVC's unique hybrid structure, I assume they do improve visual quality. However, that's an assumption I would confirm before actually implementing LCEVC.
Again, the settings previously mentioned are for maximum visual quality. When encoding for measurement via objective quality metrics, V-Nova recommends these settings (with no change to the deblock setting):
-aq-mode 1 -aq-strength 0.8 -deblock -2:-2
You see these in the command shown in Batch 2.
So if you're producing for visual comparisons, you should use the first set of settings, but for metrics, the second set. Obviously, if you're using objective metrics to compare LCEVC to a different codec, like x264 or x265, you should use that codec's internal tuning mechanisms to disable any psychovisual adjustments. For example, if you tune for peak signal-to-noise ratio (PSNR) with the x264 codec (-tune psnr), you disable AQ and all other psychovisual optimizations. This should slightly degrade visual appearance but will improve metric scores, as will the second set of options recommended by V-Nova.
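Keeping the two V-Nova tuning sets and the -tune psnr setting for native x264 in one table helps prevent visual and metric runs from getting mixed up. Here is a minimal Python sketch. The dictionary and function names are hypothetical; the values are the ones given in this section and in Batch 2.

```python
# V-Nova's recommended base-layer (x264) settings for LCEVC, per the text above
LCEVC_X264_TUNING = {
    "visual":  ["-aq-mode", "3", "-aq-strength", "1.3", "-deblock", "-2:-2"],
    "metrics": ["-aq-mode", "1", "-aq-strength", "0.8", "-deblock", "-2:-2"],
}


def tuning_flags(goal: str, lcevc: bool = True) -> list:
    """Extra x264 flags for a 'visual' or 'metrics' comparison."""
    if goal not in ("visual", "metrics"):
        raise ValueError(f"unknown goal: {goal!r}")
    if lcevc:
        return list(LCEVC_X264_TUNING[goal])  # copy so callers can't mutate the table
    # Native x264: default psychovisual tuning for visual tests,
    # -tune psnr (which disables AQ and psy optimizations) for metric tests
    return ["-tune", "psnr"] if goal == "metrics" else []


print(tuning_flags("metrics"), tuning_flags("metrics", lcevc=False))
```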
What I Learned in Testing
I created two command strings for testing: one for CRF for VOD testing, the other for CBR for live. I tested only performance with the CRF settings and both performance and quality with the CBR settings.
Here's the CRF command string that I used for performance testing, encoding a 30-second segment of the 60 fps Netflix Meridian (shown) and football test clips:
ffmpeg -i Meridian_30.mp4 -c:v pplusenc_x264 -aq-mode 3 -aq-strength 1.3 -deblock -2:-2 -threads 1 -preset veryslow -g 120 -keyint_min 120 -sc_threshold 0 -eil_params "rc_pcrf=23; bitrate=" Meridian_pplus_CRF.mp4
Batch 3. The Meridian CRF LCEVC encoding string for VOD
Here's the x264 command string used for comparison purposes:
ffmpeg -i Meridian_30.mp4 -c:v libx264 -preset veryslow -crf 23 -g 120 -threads 1 -keyint_min 120 -sc_threshold 0 Meridian_x264_CRF.mp4
Batch 4. The equivalent string for x264 CRF
Table 2 shows the results. To be clear, LCEVC is encoding the base H.264 layer at 960x540 and the enhancement layer to 1080p, while x264 is encoding the complete stream to 1080p. Overall, LCEVC shaved about 60% off x264's encoding time.
Table 2. Encoding time in seconds. LCEVC cuts about 60% off the encoding time of native H.264 at the same output resolution.
Note that I used -threads 1 for both encodes, which limits each encoder to a single thread. I did this because I noticed some severe transient quality drops in the x264 clips when encoding with 8 threads (for more on this subject, see my article, "FFmpeg Threads Command: How It Affects Quality and Performance," at bit.ly/ffmpeg_threads). When using 8 threads, LCEVC was still about 50% faster than x264. I performed all tests on my HP ZBook notebook powered by a 2.8 GHz Intel Xeon Processor E3-1505M v5 CPU and running Windows 10 Pro with 32GB of RAM.
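Percentages like "60% less time" and "50% faster" describe different ratios, so it's worth being explicit about which one you compute. The Python sketch below shows both. The 100-second and 40-second timings are made-up illustrative values, not the measurements in Table 2.

```python
def time_saved(t_lcevc: float, t_x264: float) -> float:
    """Fraction of the x264 encoding time that LCEVC saves (0.6 = 60% less time)."""
    return 1 - t_lcevc / t_x264


def speedup(t_lcevc: float, t_x264: float) -> float:
    """How many times faster LCEVC is (2.5 = 2.5x the throughput)."""
    return t_x264 / t_lcevc


# Illustrative timings only, not the values in Table 2
t_x264, t_lcevc = 100.0, 40.0
print(f"{time_saved(t_lcevc, t_x264):.0%} less time, {speedup(t_lcevc, t_x264):.1f}x faster")
# 60% less time, 2.5x faster
```

By the same arithmetic, an encoder that is 50% faster in the throughput sense (1.5x) saves only about 33% of the encoding time.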
Here's the CBR command string that I used for performance testing:
ffmpeg -i meridian_30.mp4 -c:v pplusenc_x264 -preset medium -aq-mode 3 -aq-strength 1.3 -deblock -2:-2 -threads 1 -g 120 -keyint_min 120 -sc_threshold 0 -b:v 4M meridian_pplus_CBR.mp4
Batch 5. The Meridian CBR LCEVC encoding string for live
Here's the x264 command string:
ffmpeg -i meridian_30.mp4 -c:v libx264 -preset medium -g 120 -keyint_min 120 -sc_threshold 0 -threads 1 -b:v 4000k -maxrate 4000k -bufsize 8000k meridian_x264_CBR.mp4
Batch 6. The equivalent string for x264 CBR
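The x264 string constrains the rate by setting -maxrate equal to the target bitrate and -bufsize to twice that, which works out to a 2-second buffer. Here is a hypothetical Python helper that derives those flags from a single kbps value. The 2-second default simply matches Batch 6.

```python
def x264_rate_flags(kbps: int, buffer_seconds: float = 2.0) -> list:
    """-b:v/-maxrate/-bufsize flags with maxrate = target and an N-second buffer."""
    bufsize = round(kbps * buffer_seconds)
    return ["-b:v", f"{kbps}k", "-maxrate", f"{kbps}k", "-bufsize", f"{bufsize}k"]


print(" ".join(x264_rate_flags(4000)))  # -b:v 4000k -maxrate 4000k -bufsize 8000k
```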
I tested using the medium preset rather than veryslow to simulate live operation. As previously mentioned, I included the -preset medium switch in both command strings because although medium is the default preset for x264 (and isn't needed in that string), veryslow is the default for LCEVC, so medium does need to be in the LCEVC string.
You can see the results of the performance trials in Table 3, again showing encodes using a single thread. In this case, when I encoded with 8 threads, LCEVC was still faster than x264, but only by about 25%, and x264 exhibited the aforementioned transient quality issues in nearly every video.
Table 3. LCEVC cuts about 52% off the encoding time of native H.264 at the same output resolution.
I compared the quality of LCEVC with an H.264 base against full-resolution H.264 using metrics and visually, working with the two files mentioned herein, Netflix's Meridian clip and Harmonic's football clip. For the metrics, I created four encodes of each technology at 1–4Mbps, using the settings V-Nova recommended for metric comparisons (-aq-mode 1 -aq-strength 0.8 -deblock -2:-2) and tuning the x264 clips for PSNR (-tune psnr). Clearly, two samples are inadequate to draw any firm conclusions, but quality assessment wasn't the purpose of this article, and having done all the other work, I wanted to quickly gauge the results.
In terms of metrics, LCEVC handily won the VMAF trials, with a BD-Rate advantage of about 32% over H.264, meaning that LCEVC could produce the same quality as native H.264 at 68% of the bitrate. This should allow LCEVC to meet its primary quality goal, which (roughly) is to match HEVC quality at the same bitrate as HEVC while using H.264 as the base layer. Figure 4 shows the VMAF rate distortion graph for the Meridian clip, using the encodes tuned for metric computations.
Figure 4. The rate distortion graph for LCEVC compared to H.264 using a 30-second segment of the Meridian clip
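To reproduce a BD-Rate figure like the one above from your own rate-distortion points, you can use the classic Bjøntegaard calculation. Fit log10(bitrate) as a cubic function of quality for each codec, average both fits over the overlapping quality range, and convert the difference back to a percentage. The following is a minimal pure-Python sketch of that calculation; it isn't necessarily how the figures in this article were computed, and the sample numbers are constructed, not measured. Each curve needs at least four points with distinct quality values.

```python
import math


def _cubic_fit(xs, ys):
    """Least-squares cubic fit of ys against xs.

    xs are centred and scaled to [-1, 1] to keep the 4x4 normal equations
    well conditioned. Returns (coeffs ascending in t, centre, half_range).
    """
    centre = (max(xs) + min(xs)) / 2
    half = (max(xs) - min(xs)) / 2
    ts = [(x - centre) / half for x in xs]
    n = 4
    a = [[sum(t ** (i + j) for t in ts) for j in range(n)] for i in range(n)]
    b = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(n)]
    for col in range(n):  # Gaussian elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        coef[i] = (b[i] - sum(a[i][j] * coef[j] for j in range(i + 1, n))) / a[i][i]
    return coef, centre, half


def _mean_value(fit, lo, hi):
    """Average of the fitted polynomial over [lo, hi] in original quality units."""
    coef, centre, half = fit

    def antiderivative(t):
        return sum(c * t ** (k + 1) / (k + 1) for k, c in enumerate(coef))

    t_lo, t_hi = (lo - centre) / half, (hi - centre) / half
    return (antiderivative(t_hi) - antiderivative(t_lo)) / (t_hi - t_lo)


def bd_rate(rates_ref, q_ref, rates_test, q_test):
    """Bjontegaard delta rate in percent (negative = test needs less bitrate)."""
    fit_ref = _cubic_fit(q_ref, [math.log10(r) for r in rates_ref])
    fit_test = _cubic_fit(q_test, [math.log10(r) for r in rates_test])
    lo = max(min(q_ref), min(q_test))
    hi = min(max(q_ref), max(q_test))
    if hi <= lo:
        raise ValueError("quality ranges do not overlap")
    diff = _mean_value(fit_test, lo, hi) - _mean_value(fit_ref, lo, hi)
    return (10 ** diff - 1) * 100


ref_rates = [1000, 2000, 3000, 4000]        # kbps, made-up example
quality = [70.0, 80.0, 86.0, 90.0]          # e.g., VMAF, made-up example
test_rates = [0.68 * r for r in ref_rates]  # same quality at 68% of the bitrate
print(f"BD-rate: {bd_rate(ref_rates, quality, test_rates, quality):.1f}%")  # BD-rate: -32.0%
```

A curve that sits at 68% of the reference bitrate everywhere yields -32%, which matches the reading of the 32% advantage given above.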
However, LCEVC with an H.264 base and native H.264 were about even in SSIM testing, and LCEVC trailed in PSNR. My findings were generally similar to results that V-Nova provided from tests on a much larger number of Netflix test clips. I'm being specific about VMAF and vague about SSIM and PSNR because my subjective evaluations were much more consistent with VMAF and because I trust VMAF to predict subjective ratings much more than PSNR.
That said, as much as I respect VMAF, when comparing codecs, formal subjective comparisons are a must, and I think that's particularly true for LCEVC. Any early evaluation of LCEVC that doesn't include significant subjective test input shouldn't be given a lot of weight. Any that rely solely or primarily on PSNR are simply misguided.