Review: NETINT Quadra T1U Video Processing Unit
This review will highlight the NETINT Quadra T1U and explore its capabilities as a video processing unit (VPU) for high-volume encoding and transcoding of single files, encoding ladders, and live streams.
The Quadra T1U VPU uses the Codensity G5 ASIC (application-specific integrated circuit) chip (Figure 1) and comes in a wallet-sized U.2 form factor. NETINT refers to the product as a VPU because, in addition to transcoding, it performs scaling and overlay onboard and has AI rendering capabilities, which I didn’t test for this review. The cost is around $1,500, and 10–20 units can fit in a single server with the necessary U.2 slots. U.2 slots connect over the same high-speed PCIe interface that graphics cards use. Each Quadra T1U draws only 17 watts of power and delivers more throughput than a computer that draws 400-plus watts.
Figure 1. The Codensity G5 ASIC chip
The Quadra T1U offers the following:
- AV1/H.264/HEVC/YUV encoding
- VP9/H.264/HEVC/YUV decoding
- Onboard scaling
- Onboard overlay
- Two AI deep neural network engines
As mentioned, the key component of the Quadra T1U is the ASIC chip. Transcoders with ASIC chips hold significant advantages over CPU-based and GPU-based encoders since they can be designed for a specific purpose—in this case, transcoding. Some other key benefits of ASIC chips are that they allow for smaller devices, perform specialized tasks, and improve efficiency by reducing power consumption.
Quadra T1U Setup
The Quadra T1U hardware setup is straightforward. In addition to the U.2 form factor, the Quadra T1U comes in a PCIe form factor that the user can install like a network card or GPU. Anyone with previous experience working on computer hardware should be able to set up this device.
For software installation, NETINT works with FFmpeg and GStreamer and has an SDK with an API. The Quadra T1U ships with scripts that automate the software installation process. A LinkedIn post covers installing the product’s hardware and software.
For testing in this review, NETINT installed the Quadra T1U on a remote server and configured it for me. The company also provided scripts for my testing. So, all I had to do was connect through the Bitvise SSH Client to run my tests.
Using the Quadra T1U
As mentioned, when running the Quadra T1U, you can use FFmpeg or GStreamer scripts or run it directly via the API. I tested by connecting to the remote computer and running scripts in the terminal. You’ll also need a tool for generating reports for video quality metrics if you want to generate VMAF, SSIM, and PSNR scores.
The reviewer’s guide I used was written for Windows computers, and the open source programs for connecting and measuring quality were all Windows-based. For this reason, I tested using a Windows computer.
The Bitvise SSH Client and FFMetrics tools recommended by NETINT for my testing both run on Windows. The Bitvise SSH Client is used to connect to the server running the Quadra T1U and perform various tasks. What’s helpful about this tool is that it makes it easy to connect to the server and open multiple terminal windows to run commands. You need multiple terminal windows for functions like reviewing encoding status and CPU usage. The Bitvise SSH Client is available for download at go2sm.com/bitvise.
FFMetrics is used to generate VMAF, SSIM, and PSNR scores. For reviewing and testing the Quadra T1U, I used Windows Server 2019 on an Amazon Web Services EC2 instance to connect to the Quadra T1U and run testing scripts. I installed the Bitvise SSH Client and FFMetrics on the Windows Server. The FFMetrics beta can be found here.
It’s important to note that there is no GUI for working with the Quadra T1U. You’ll need some Bash scripting experience to run scripts from the terminal once you’ve connected with the Bitvise SSH Client.
Once the Quadra T1U is installed and set up, you can begin using the VPU to run encodes. Before you can start encoding, though, you’ll need to connect your Bitvise SSH Client to the server and run some basic commands using terminal windows to get the Quadra T1U ready for use.
First, start Bitvise or your SSH Client. Enter your IP information in the Host section along with your port number, as shown in Figure 2 (below). Next, add your username and password. Then, click the Log in button.
Figure 2. Logging into the Bitvise SSH client
Next, it’s time to run some commands to start using the Quadra T1U. Once logged in using Bitvise, you can open terminal windows to run commands or navigate to folders where you may want to run tests.
To open terminal windows, click the New terminal console button shown on the left in Figure 3 (below). You can open up to 10 terminal windows, which was a limit I never came close to reaching in my testing.
Figure 3. Opening Terminal Windows and SFTP Window
To navigate to directories on the Quadra T1U, click New SFTP window, and choose the directory. First, open a terminal window, and run the following command to initialize the Quadra T1U:
Second, open another terminal window to monitor throughput during testing, then run this command:
The command runs the monitoring utility and refreshes every 5 seconds. It monitors decoder/encoder/scaler utilization.
Next, open another terminal window. In this window, you’ll track how many instances of FFmpeg are running simultaneously, and you’ll also monitor the overall system load. Run this command:
The terminal window shows the number of FFmpeg instances running and the overall system load, including when encoding is not taking place. Overall system utilization is low during Quadra T1U operation and quite high when encoding using CPU-only codecs like x265 and x264.
Finally, open a fourth window to run scripts. To run a single script, navigate to the folder where your script is, and run chmod +x on the script. It will appear as follows in the terminal window:
chmod +x scriptname.sh
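To make that step concrete, here is a minimal sketch of marking a script executable and then running it. It is not one of NETINT's test scripts: the script body is a one-line placeholder, and the temporary directory and variable names are assumptions made only so the example runs on its own.

```shell
#!/bin/sh
# Create a throwaway directory and a placeholder script (hypothetical content).
workdir=$(mktemp -d)
script="$workdir/scriptname.sh"
printf '#!/bin/sh\necho "script ran"\n' > "$script"

# Mark the script as executable, as described in the review.
chmod +x "$script"

# Run it by path and capture what it prints.
output=$("$script")
echo "$output"

# Clean up the throwaway directory.
rm -rf "$workdir"
```

On the test server, you would skip the file creation and run chmod +x on the existing script in its own folder, then start it with ./scriptname.sh.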
Testing the Quadra T1U
The specs of the Ubuntu server used during testing for this review are as follows:
- AMD Ryzen 5 5600X 6-core CPU running at 2200 MHz
- Two threads per core
- 12 CPU threads
- 16GB RAM
This server has six CPU cores and 12 threads, and top reports each thread as 100%, so the total available system CPU is 1,200%.
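The 1,200% figure follows from the specs listed above. A short sketch of the arithmetic, with variable names chosen only for illustration:

```shell
#!/bin/sh
# Values from the server specs in this review.
cores=6
threads_per_core=2

# Each hardware thread counts as one logical CPU, reported as up to 100%.
threads=$((cores * threads_per_core))
cpu_max=$((threads * 100))

echo "logical CPUs: $threads"
echo "total CPU capacity: ${cpu_max}%"
```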
For this review, I was interested in learning whether the Quadra T1U could benefit colleges and universities like mine, The Ohio State University. Based on our video needs, university-wide, there are thousands of encodes for on-demand videos weekly. Many of these encodes use encoding ladders. There are not as many weekly live streams.
These are the questions I hoped to answer in this review:
- Could encoding with the Quadra T1U provide a significant reduction in CPU usage for single-file encodes and encoding with encoding ladders?
- Could significantly more encodes be performed using the Quadra T1U compared to CPU-based encoding?
- Would the quality of encoding using the Quadra T1U be the same as FFmpeg encodes or better?
For my testing, NETINT provided guidance on the best approaches to testing the Quadra T1U in ways that would benefit video engineers and reflect how its customers use the product.
Here’s what I tested:
Throughput: single-file encoding
- H.264, HEVC, AV1 (using Quadra T1U)
- x264/x265 (using FFmpeg)
Throughput: ladder encoding
- H.264, HEVC, AV1 (using Quadra T1U)
- x264/x265 (using FFmpeg)
- Throughput optimized (Quadra T1U and FFmpeg comparisons)
- Quality optimized (Quadra T1U and FFmpeg comparisons)
Throughput Testing: Single-File
First, I’ll discuss single-file throughput testing. On the Quadra T1U, I ran a master script to perform 32 simultaneous FFmpeg encodes with one of the selected Quadra hardware codecs. Each encode read a 1080p file from a RAM drive to simulate live operation. The details about the “Football” source are shown in Figure 4 (below). This source was used in testing throughout this review.
Figure 4. Details of the source file used in this review, as seen in MediaInfo
To run each test, I navigated the server and selected and ran a test command similar to this:
The 32 in the name shows the number of simultaneous encodes, achieved by calling and running 32 separate encoding scripts. When transcoding a single file to a single output with each of the three codecs, Quadra produced 32 simultaneous 30 fps transcodes, which I verified in the encoding logs. In contrast, when encoding with FFmpeg on the CPU only, the server sustained only five FFmpeg x264 encodes and three FFmpeg x265 encodes.
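The master-script pattern described above, where one script starts many encodes at once, can be sketched in a few lines of shell. This is an illustration under stated assumptions, not the script NETINT provided: run_one_encode is a hypothetical stand-in that only writes a log file where a real worker would call FFmpeg, and the function names and temporary directory are made up for the example.

```shell
#!/bin/sh
# Hypothetical stand-in for one encoding script.
# $1 = job index, $2 = directory for log files.
run_one_encode() {
    echo "encode $1 finished" > "$2/job_$1.log"
}

# Start N workers in the background, then wait for all of them to exit.
run_parallel() {
    n=$1
    outdir=$2
    i=1
    while [ "$i" -le "$n" ]; do
        run_one_encode "$i" "$outdir" &
        i=$((i + 1))
    done
    wait
}

workdir=$(mktemp -d)
run_parallel 32 "$workdir"

# Count the log files to confirm that every worker completed.
completed=$(ls "$workdir" | wc -l | tr -d ' ')
echo "completed encodes: $completed"
rm -rf "$workdir"
```

Starting every job with & and then calling wait is what makes the encodes simultaneous rather than sequential, which is the condition the throughput test depends on.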
Figure 5 (below) shows what the script looks like for single-file throughput testing using the Quadra T1U.
Figure 5. Command string for Quadra with explanations
Figure 6 (below) shows the command string for x264 FFmpeg encodes.
Figure 6. Command string for x264 with explanations
And finally, Figure 7 (below) shows the command string for x265 encodes.
Figure 7. Command string for x265 with explanations
Once you submit the script to Quadra for encoding, the monitoring windows show decoder, encoder, and scaler utilization and the number of FFmpeg instances. ModelLoad 100 means you’ve maxed out on encoding. That’s why each codec encode peaked at 32, the Quadra T1U’s maximum capacity.
During a Quadra T1U encode, the Ubuntu top utility shows CPU utilization and the number of FFmpeg encodes running. CPU usage is extremely low when performing encodes on the Quadra T1U but significantly higher when running FFmpeg CPU encodes. This occurred consistently during my tests.
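If you want the FFmpeg instance count as a single number rather than reading it off the top display, one possible approach is to count matching process names. This sketch is an assumption on my part, not a command from NETINT's guide. The count_ffmpeg helper is hypothetical, and the sample process list stands in for live output so that the example runs anywhere.

```shell
#!/bin/sh
# Count processes whose command name is exactly "ffmpeg".
# Reads one command name per line on stdin, as printed by `ps -e -o comm=`.
count_ffmpeg() {
    grep -c '^ffmpeg$'
}

# Live usage on the server (the result depends on what is running):
#   ps -e -o comm= | count_ffmpeg

# Self-contained demonstration with sample process names.
sample_count=$(printf 'bash\nffmpeg\nffmpeg\ntop\nffmpeg\n' | count_ffmpeg)
echo "ffmpeg instances: $sample_count"
```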
In my tests, I was able to maintain only five simultaneous FFmpeg x264 encodes. Since the CPU max is 1,200% and my testing showed 963% CPU usage, a sixth encode looked possible on paper. But when the system tried six encodes, it could not maintain the frame rate of 30 fps.
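The headroom can be worked out from the figures above. This is a back-of-the-envelope sketch that assumes the five encodes share the CPU load evenly, which the review did not measure.

```shell
#!/bin/sh
# Figures from the review: 1,200% total CPU, 963% used by five x264 encodes.
cpu_max=1200
cpu_used=963
encodes=5

# Average CPU per encode, assuming the load is split evenly (an assumption).
per_encode=$(awk -v u="$cpu_used" -v n="$encodes" 'BEGIN { printf "%.1f", u / n }')

# CPU capacity left over after the five encodes.
headroom=$((cpu_max - cpu_used))

echo "average CPU per encode: ${per_encode}%"
echo "remaining headroom: ${headroom}%"
```

By this estimate, the remaining capacity exceeds the average cost of one encode, yet the sixth encode still dropped below 30 fps in testing. The sustained frame rate, rather than the CPU percentage alone, is therefore the figure to check.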
Figure 8 (below) shows a summary of my results for single-file throughput testing. The Expected Value column shows the number of encodes included in each master script, and the Actual Value column shows what happened during my tests.
Figure 8. Results for Throughput Testing – Single File
Controlling the Quadra T1U with FFmpeg, I achieved 32 simultaneous encodes for each Quadra hardware codec using one Quadra T1U module. For CPU-based encodes with FFmpeg, the maximum was five for x264 and three for x265. This illustrates the advantage that ASIC-based encoding has over CPU-based encoding and the potential cost savings per stream.
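To put those results in proportion, the following sketch computes the throughput ratios and the power draw per stream from numbers reported in this review. The variable names are illustrative, and the per-stream power figure assumes the 17-watt draw is spread evenly across all 32 encodes.

```shell
#!/bin/sh
# Results from this review.
quadra_encodes=32   # simultaneous encodes on one Quadra T1U
x264_encodes=5      # simultaneous CPU-only x264 encodes
x265_encodes=3      # simultaneous CPU-only x265 encodes
quadra_watts=17     # power draw of one Quadra T1U

# Throughput of the Quadra T1U relative to each CPU-only codec.
ratio_x264=$(awk -v a="$quadra_encodes" -v b="$x264_encodes" 'BEGIN { printf "%.1f", a / b }')
ratio_x265=$(awk -v a="$quadra_encodes" -v b="$x265_encodes" 'BEGIN { printf "%.1f", a / b }')

# Power per stream, assuming the draw is shared evenly (an assumption).
watts_per_stream=$(awk -v w="$quadra_watts" -v n="$quadra_encodes" 'BEGIN { printf "%.2f", w / n }')

echo "Quadra vs. x264: ${ratio_x264}x"
echo "Quadra vs. x265: ${ratio_x265}x"
echo "watts per stream: $watts_per_stream"
```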