Building a Streaming Workstation? Here's What You Need to Know
A lot goes into creating a professional and efficient streaming workstation. Choosing the right hardware to fit the task can be daunting for those not familiar with what goes on beneath the surface of a PC or Mac-based laptop or workstation. That's one of the reasons Telestream created Wirecast Gear, a professional, turnkey streaming system available to fit a range of budgets and performance requirements. But there will still be situations in which it makes sense to build a custom setup, so here's a look at key hardware decisions and other factors you'll want to consider if you go that route.
While live streaming production quality can vary considerably, shows with higher production value do a much better job of attracting and retaining viewers. The advent of powerful streaming software applications has made it possible for anyone to create broadcast-quality productions. Running a powerful, real-time software application demands a certain amount of horsepower from the computer it is being run on. While recommended system requirements are published to guide users wishing to "roll their own" workstation for the purpose, it helps to have some basic knowledge about the unique demands of real-time streaming.
If you have basic workflow requirements, you can, in most cases, buy any reasonably equipped modern machine and you'll probably be fine. If your workflow requirements are more professional—having four simultaneous instant replays available from recorded ISO files, three monitor outputs (one for a multi-viewer, one for the audio mixer, and one for the user interface plus program out)—that's where correctly configuring a capable machine can get really difficult for some users. What about source video playback? Do you need the ability to handle 4K? 3D titles? The more you want to do and the better you want to do it, the more sophisticated the machine has to be, and that takes knowledge about precisely what your computer hardware can do.
First Things First
Since we are discussing a video streaming workflow, it probably goes without saying that knowing your target output bitrate versus how much upload bandwidth you have available from your ISP is critical to your overall success. It's recommended that your upload speed be at least double your selected output video bitrate to allow for any additional headroom you might need. This is especially important for target bitrates of 10Mbps or less. Once that limiting factor is out of the way, we can talk about the computer hardware necessary to run streaming software to its full potential. For a typical workflow, we need to not only encode a live stream so that viewers can see it, but also create layers and add titles, music, effects, and other elements that give your show the high production value it deserves.
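The bandwidth rule of thumb above is easy to sanity-check. Here's a minimal sketch; the example figures are hypothetical, not recommendations.

```python
# Rule of thumb from the article: upload bandwidth should be at least
# double the selected output video bitrate.

def has_enough_upload(upload_mbps: float, output_bitrate_mbps: float) -> bool:
    """True if the uplink leaves the recommended 2x headroom."""
    return upload_mbps >= 2 * output_bitrate_mbps

# A 6 Mbps stream on a 10 Mbps uplink falls short of the 12 Mbps target:
print(has_enough_upload(10.0, 6.0))   # False
print(has_enough_upload(20.0, 6.0))   # True
```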
Doing all that in real time takes horsepower, so let's look at the four main areas of hardware components that will affect your streaming capacity and performance:
- CPU and RAM
- Motherboard (chipset and bus architecture)
- GPU (graphics and the potential for encoding streams)
- Storage (for both operating system and media)
CPU and RAM
How can an underpowered CPU (central processing unit) affect your production and streaming capabilities? How powerful a CPU do you need? It all depends on what you need to do. CPUs are typically differentiated by their clock speed and the number of processing cores they have. For most use cases, you want to aim for a middle-of-the-road CPU in terms of cores versus clock frequency. All the latest mid-range Intel CPUs support eight cores, and an Intel Core i7 is generally recommended, as it will provide enough headroom for the majority of workflows. With eight cores plus hyperthreading on each, you get 16 total threads, and that's close to the optimal thread count for a single x264 encode. Encoding with x264 directly impacts your CPU, since that is where the work is done, but a sufficiently powerful CPU should be more than capable of creating your stream while also running your streaming application, your computer's operating system, and other ancillary tasks. However, if you're also doing a lot of network-based ingest for your source videos, or a lot of MP4 decodes for playback, you'll really want to focus on your CPU, possibly getting the highest-end CPU you can, such as a high-clock-rate, multi-core Intel Xeon.
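The thread-count guideline above can be checked programmatically. This is a hedged sketch: the 16-thread threshold is the article's rule of thumb for a comfortable single x264 encode, not a hard requirement.

```python
import os

RECOMMENDED_THREADS = 16  # article's guideline, not a hard limit

def encode_headroom_ok(logical_threads: int,
                       recommended: int = RECOMMENDED_THREADS) -> bool:
    """True if the machine meets the suggested logical-thread count."""
    return logical_threads >= recommended

# Logical threads = physical cores x hyperthreading on Intel parts.
threads = os.cpu_count() or 1
print(f"{threads} logical threads, headroom ok: {encode_headroom_ok(threads)}")
```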
All else being equal, a higher clock speed results in a faster encode. When playing back video files, things can vary drastically based on the codec and format of the source. For example, playing out a 720p ProRes file impacts the CPU far less than a more compressed version of the same material. ProRes is "data heavy" but not as "computationally expensive," because it's an I-frame codec, meaning there are no predictive frames that need to be reconstituted on the fly. The inverse is H.264 and other MP4-style codecs, which use predictive frames to increase compression. These frames must be reconstructed from reference frames, which requires CPU resources. If you have five or six simultaneous MP4s playing, depending on the resolution, they could easily take up 70% or more of the CPU.
Motherboard and Chipset
Since everything flows into and out of a computer via different busses on the motherboard, it's understandable that an improperly configured motherboard, or one from a previous generation of technology, may not be able to deliver the speeds and density of data required to move video and audio between all its components in a live workflow. A motherboard's capabilities are primarily defined by the type of CPU it supports and its chipset. Without going too deep into the weeds, the chipset is the traffic cop that controls information flow along the various busses (front side, back side, memory) between all of the components connected to the motherboard. PCIe is a protocol that identifies your devices and negotiates the transfer of information. From a performance standpoint, the chipset directly determines the data throughput and the bus speed possible between components. If you are wondering whether that older PC in the corner has what it takes, take a look at its chipset. An older style chipset that predates the Z170, X99, and Z87 class of technology isn't going to support the bus speeds required to move data around for modern streaming production workflows.
A lot of people focus on the raw speed of the bus, but that's not the most important metric. It's much more important to understand which components are sharing the bus, through which lanes, and how your chipset's PCIe lanes are mapped out. Depending on how resources are shared, where you connect things like SATA drives and other storage subsystems can make or break the performance of the entire system. Being able to determine how much extra traffic is on your bus, and whether you're going to have increased latency on one memory controller over another, is critical when building a high-performance workstation for video. Any of these motherboard gotchas has the potential to delay data moving back and forth between your CPU, GPU, and RAM.
GPU
The GPU (graphics processing unit) can be critically important to a professional streaming system. If your first thought is that it's all about generating high-quality graphics and titles, that's only a small part of the story for professional streaming. That's not to say that titling can't get complicated, because it definitely can. For example, if you're playing back a rendered video for titles, which many titling engines support, it's no different than playing back a video from a system resources standpoint. However, if you're using something like an advanced titling application from NewBlue and you choose to run it on the same machine, the title is often getting encoded live at the same time that you're encoding your streams. Doing that means you're doubling down on the encodes simultaneously and you can get some big CPU spikes leading to dropped frames unless you have a very powerful machine with a separate GPU. For typical productions, complex 3D titling is done on a different machine and then run in over an HDMI, SDI, or NDI connection as a separate source.
Encoding Streams with the GPU
Using the GPU to encode video streams instead of the CPU is a popular option. As discussed, CPU resources are precious. We need those CPU resources for the operation of the software, for loading things in the UI, to decode H.264 videos, and generally to run all the processes that can't be accelerated through the GPU. Using the CPU for the encoding workflow significantly reduces your headroom if things get backed up, and sometimes the smallest background task can throw a wrench into things. With GPU encoding, the work happens on a completely different processing unit, so you're not adding it to your CPU's to-do list. You end up with much more headroom and are far less likely to have an issue.
However, it's not quite that simple, as there are different implementations of GPUs. GPU technology is integrated into Intel's i5 and i7 CPUs, and Intel calls encoding on these integrated GPUs Quick Sync. Although it can dramatically accelerate encoding, it does come with some caveats. Since the GPU for Quick Sync is inside the CPU package, you are using the same bandwidth, or traffic lanes, to the CPU for everything, rather than separating the encoding traffic out to a discrete GPU like an NVIDIA card. So using Quick Sync can result in increased latencies and added load on the CPU package in ways you won't see with a stand-alone GPU. Quick Sync only works on Intel's integrated GPUs, but in many cases it's still a better option than a higher-quality x264 encode if you're hitting your CPU usage limit and dropping frames.
If you truly want to maximize your performance, it's possible to do all your encoding on a standalone GPU card, utilizing NVIDIA's NVENC technology. NVENC refers to the dedicated encoding (and NVDEC decoding) chips on an NVIDIA GPU. That means, even when using it as an encoder, it's not using any of the actual GPU rendering resources. While it's true that the overall GPU bandwidth will be affected somewhat, it only becomes an issue in very complex, GPU-heavy workflows or 4K productions.
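The three encode paths discussed above (software x264, Quick Sync, NVENC) map onto ffmpeg's real encoder names of libx264, h264_qsv, and h264_nvenc, though availability depends on your ffmpeg build and drivers. Here's a sketch that assembles an illustrative argument list; the RTMP URL and bitrate are placeholder values.

```python
ENCODERS = {
    "cpu":       "libx264",     # software x264 on the CPU
    "quicksync": "h264_qsv",    # Intel Quick Sync on the integrated GPU
    "nvenc":     "h264_nvenc",  # NVIDIA's dedicated NVENC chip
}

def ffmpeg_encode_args(source: str, bitrate: str, engine: str = "cpu") -> list:
    """Assemble an illustrative ffmpeg argument list for one live encode."""
    return ["ffmpeg", "-i", source, "-c:v", ENCODERS[engine],
            "-b:v", bitrate, "-f", "flv", "rtmp://example.com/live"]

print(ffmpeg_encode_args("input.mp4", "6M", "nvenc"))
```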
Storage
From old-school spinning HDDs to solid state drives (SSDs), and more recently NVMe solid state drives, there are a lot of storage choices out there. Spinning hard disks give you a lot of storage for a low price, but the technology tends to be very slow. You might only be able to write two simultaneous ProRes or MJPEG files to disk from your ISOs or program out feed before you saturate the disk I/O. And if the same HDD also holds content you're relying on for your production, such as a source video file, playback will stutter or fail if the drive can't keep up with the combined I/O. That could easily ruin your production.
If your workflow requires a lot of ProRes ISOs and replays, then you'll need fast storage, and possibly more than one drive, whether that's multiple SSDs, a RAID 0 array, or multiple NVMe or PCIe solutions, because the system is reading and writing so much data. You could use one drive per encode, but you might run out of SATA ports on your motherboard; this is particularly problematic on laptops, which typically don't have many ports. SSDs have a fairly good price-to-storage ratio these days (1 TB for $110) and offer a tremendous increase in speed, allowing you to do more things simultaneously without slowing down: more ISOs, more program recording.
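A quick back-of-the-envelope calculation shows why HDDs saturate so quickly with ISO recording. The ProRes bitrate below is an approximate 1080p figure for ProRes 422 (~147 Mbps); the real number depends on frame rate and ProRes flavor.

```python
PRORES_422_1080P_MBPS = 147  # approximate bitrate, for illustration only

def write_load_mb_per_sec(num_streams: int,
                          bitrate_mbps: float = PRORES_422_1080P_MBPS) -> float:
    """Sustained write load in megabytes per second for N simultaneous streams."""
    return num_streams * bitrate_mbps / 8  # 8 bits per byte

# Four ISOs plus the program feed means five simultaneous writes:
print(round(write_load_mb_per_sec(5), 1))  # 91.9 (MB/s, sustained)
```

At roughly 92 MB/s of sustained, concurrent writes, plus any source-file reads, a single spinning disk has little margin left, which is exactly why fast solid-state storage pays off here.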
NVMe is the next step beyond SATA-based SSDs, giving you further increased speeds while still being solid state. The difference is that you are not going over a SATA connection, which has its own overhead; instead, data travels directly over PCIe for even more parallel access. It makes most software exceptionally snappy, as it can read data and load textures into RAM incredibly fast. NVMe storage commands only a small premium over SATA-based SSDs, so you might be tempted to migrate to all NVMe drives. However, NVMe drives are best used as an OS drive, since most motherboards have only one or two NVMe M.2 slots, as opposed to four or more SATA ports. Additionally, using more than one of your NVMe M.2 slots, should your motherboard have multiple, often drops your GPU PCIe slot from 16 lanes to 8 lanes, or disables a non-GPU PCIe slot entirely. As you can see, it can get quite complicated, so it's important to read the fine print in the motherboard manual. Any modern application should run on some type of solid-state drive for best performance; use additional SSD storage if you're dealing with lots of inputs and outputs.