Buyers' Guide to Streaming Playback Testing Tools 2019

If you’re streaming live, you can do automatic polling, which will default to the segment size. You can show some metrics and QoS logs. This tool actually works with DASH as well.

Figure 6 shows my master-level manifest alongside video playback. You can see overall stats on what’s playing, like the current bitrate. You can switch to a specific bitrate or run it on auto. You can either control-click or just click to open up a specific variant in the playback window, then watch and monitor.
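To make the idea of a master-level manifest with switchable variants concrete, here is a minimal sketch that lists the variants in an HLS master playlist. It is not how MOE:Viewer is implemented; the sample playlist, its URIs, and the `list_variants` helper are all made up for illustration.

```python
import re

# Hypothetical master playlist; URIs and bitrates are invented for this example.
SAMPLE_MASTER = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360,CODECS="avc1.4d401e,mp4a.40.2"
low/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720,CODECS="avc1.4d401f,mp4a.40.2"
mid/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080,CODECS="avc1.640028,mp4a.40.2"
high/index.m3u8
"""

# KEY=VALUE pairs; VALUE is either a quoted string or runs up to the next comma.
ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def list_variants(master_text):
    """Return one dict per variant: its URI, bandwidth (bits/s), and resolution."""
    variants = []
    pending = None  # attributes of the most recent EXT-X-STREAM-INF tag
    for raw in master_text.splitlines():
        line = raw.strip()
        if line.startswith("#EXT-X-STREAM-INF:"):
            attr_text = line.split(":", 1)[1]
            pending = {k: v.strip('"') for k, v in ATTR_RE.findall(attr_text)}
        elif line and not line.startswith("#") and pending is not None:
            # The first URI line after the tag belongs to that variant.
            variants.append({
                "uri": line,
                "bandwidth": int(pending["BANDWIDTH"]),
                "resolution": pending.get("RESOLUTION"),
            })
            pending = None
    return variants


variants = list_variants(SAMPLE_MASTER)
for v in variants:
    print(v["bandwidth"], v["resolution"], v["uri"])
```

Each entry corresponds to one rendition you could pin the player to instead of leaving it on auto.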

Figure 6: A master-level manifest alongside video playback in MOE:Viewer

In addition to what you see in the VOD video shown in Figure 6, with live video you can see updates, set up automatic polling, and so forth.

MOE:Viewer has a stall detector as well. If we’re on a live event and run into problems (for example, the encoders aren’t updating or the content isn’t getting to origin on time), we’ll end up getting a Manifest Not Updating alert. We need to be able to identify that immediately and report it, so we put in the stall detector. Turn it on, and if it sees that the manifest hasn’t updated within the time period you specified and it misses more than two refreshes, it blares an alarm at you. Then you can identify the problem. The stall detector monitors all of your renditions.
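The stall rule described above (alarm once the manifest has gone unchanged for more than two refreshes) can be sketched in a few lines. This is an illustrative guess at the logic, not MOE:Viewer’s code: the `StallDetector` class, its `observe` method, and the choice to compare a hash of the manifest text are all assumptions. The caller is expected to poll on whatever interval it has chosen and pass in each fetched manifest.

```python
import hashlib


class StallDetector:
    """Tracks consecutive polls in which a manifest did not change."""

    def __init__(self, max_missed_refreshes=2):
        # Assumed threshold: alarm when MORE than this many refreshes are missed.
        self.max_missed_refreshes = max_missed_refreshes
        self.last_fingerprint = None
        self.missed = 0

    def observe(self, manifest_text):
        """Record one poll result; return True when the alarm should fire."""
        fingerprint = hashlib.sha256(manifest_text.encode("utf-8")).hexdigest()
        if fingerprint == self.last_fingerprint:
            self.missed += 1  # same manifest as last poll: a missed refresh
        else:
            self.last_fingerprint = fingerprint
            self.missed = 0   # manifest changed: reset the counter
        return self.missed > self.max_missed_refreshes


detector = StallDetector(max_missed_refreshes=2)
# Simulate four polls that all return the same (hypothetical) manifest text.
results = [detector.observe("#EXTM3U\n#EXT-X-MEDIA-SEQUENCE:41\n") for _ in range(4)]
print(results)
```

To watch every rendition, you could keep one detector per variant URI and poll each playlist separately.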

MOE:Viewer automatically breaks out multiple rendition sets if you have them. You can have A and B, and you can test each one of those if you have different origins set up. This is also used in another tool we have that lets us rebroadcast a VOD as live. It takes the manifest and incrementally adds its entries back in, on the timing defined in the EXT tags.
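As a rough illustration of the VOD-as-live idea, the sketch below reads the `#EXTINF` durations from a VOD media playlist and works out when each segment would become visible in a simulated live playlist. The rebroadcast tool itself is not described in detail here, so the `release_schedule` and `segments_available` helpers, the sample playlist, and the rule that a segment appears once its full duration has elapsed are all assumptions.

```python
# Hypothetical VOD media playlist; segment names and durations are invented.
SAMPLE_VOD = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-PLAYLIST-TYPE:VOD
#EXTINF:6.0,
seg_000.ts
#EXTINF:6.0,
seg_001.ts
#EXTINF:4.5,
seg_002.ts
#EXT-X-ENDLIST
"""


def release_schedule(vod_playlist_text):
    """Return (available_at_seconds, uri) pairs, one per segment, in order."""
    schedule = []
    elapsed = 0.0
    duration = None  # duration from the most recent EXTINF tag
    for raw in vod_playlist_text.splitlines():
        line = raw.strip()
        if line.startswith("#EXTINF:"):
            # EXTINF:<duration>,[<title>]
            duration = float(line[len("#EXTINF:"):].split(",")[0])
        elif line and not line.startswith("#") and duration is not None:
            # Assumption: a segment is published once it has fully "aired".
            elapsed += duration
            schedule.append((elapsed, line))
            duration = None
    return schedule


def segments_available(schedule, seconds_since_start):
    """URIs that the simulated live playlist would list at the given time."""
    return [uri for available_at, uri in schedule if available_at <= seconds_since_start]


schedule = release_schedule(SAMPLE_VOD)
print(schedule)
print(segments_available(schedule, 12.0))
```

A real rebroadcaster would also need to rewrite the playlist headers (media sequence, no `EXT-X-ENDLIST` until the end), which this sketch leaves out.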

You can copy and save manifests right from the main UI shown in Figure 6, and you can set the stall threshold you want. Manifest Viewer will automatically indicate if you have any discontinuities within your metric stats. You can also open multiple windows for different renditions, simply by holding Control when you click, and then monitor all of those independently.
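For readers who want to see what a discontinuity looks like at the manifest level, here is a small sketch that finds the segments immediately following an `#EXT-X-DISCONTINUITY` tag in an HLS media playlist. It is only one plausible way to surface them, not the viewer’s actual logic; the `find_discontinuities` helper and the sample playlist are hypothetical.

```python
# Hypothetical live media playlist with an ad spliced between content segments.
SAMPLE_MEDIA = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-DISCONTINUITY-SEQUENCE:0
#EXTINF:6.0,
content_000.ts
#EXT-X-DISCONTINUITY
#EXTINF:6.0,
ad_000.ts
#EXT-X-DISCONTINUITY
#EXTINF:6.0,
content_001.ts
"""


def find_discontinuities(media_playlist_text):
    """Return (segment_index, uri) for each segment preceded by a discontinuity tag."""
    hits = []
    flagged = False  # True after a discontinuity tag until the next segment URI
    index = 0        # zero-based position of the segment within this playlist
    for raw in media_playlist_text.splitlines():
        line = raw.strip()
        # Exact match so EXT-X-DISCONTINUITY-SEQUENCE is not counted.
        if line == "#EXT-X-DISCONTINUITY":
            flagged = True
        elif line and not line.startswith("#"):
            if flagged:
                hits.append((index, line))
                flagged = False
            index += 1
    return hits


discontinuities = find_discontinuities(SAMPLE_MEDIA)
print(discontinuities)
```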

We’ve been working on different versions of MOE:Viewer off and on for quite some time. But it’s something that now we use almost daily to test our streams. We also use it a lot when we bring on new hires to give them some good visual training around how HLS works, how manifests actually work, and what they do in the end.

If you’re interested in trying out MOE:Viewer, we are offering all Streaming Media readers access to the private beta, and you can sign up for it.

Stream Troubleshooting

Now I’d like to go a little bit deeper into stream troubleshooting, which can be very tiring, but is super-valuable. Figure 7 shows an acronym I’ve come up with that at least brings some humor into the process: TIRED stands for translate, isolate, react, engage, and document.

Figure 7: Remember this when you’re tired of troubleshooting your streams.

The first step is you need to understand and translate what the actual problem is. Oftentimes, the first report of problems doesn’t carry a lot of technical information, so we might hear something like, “Well, the video thingy is all square, looking like a bunch of Legos put together.”

What does that actually mean? It might mean we are getting a lot of blocking issues and they could stem from an issue with the decode level or, conversely, an issue on the encoder. The challenge is that the issues, although serious, can seem very vague. Let’s say the video goes black. Well, is it playing and it’s black? Or did it stop playing? We need to be able to translate the actual message we’re getting into something that’s actionable and be able to define that from there.

Oftentimes, it’s a matter of asking questions and trying to replicate the problem. Of course, the most challenging situation is when you have an issue being reported and you can’t replicate it, in which case you’re now relying on the people reporting it. If you’re lucky enough, they’ll be able to actually help you walk through and understand what’s going on so you can figure out what you can do about it.

Step 2 is isolate. Once you actually understand what the problem is—“The video is black but it’s playing; we can see the playhead advancing and we’re seeing network traffic coming in”—then you can begin to isolate the issue. If the video is actually playing—everything’s rolling through but we’re seeing black—then we know in most cases it’s probably a source or an encode issue.

You need to be able to isolate where this is in the stack: source, encoder, origin, edge, CDN, last mile, or the content in general. Questions arise even when the issue is ads: Was it on the decisioning side, the delivery side, or the ad content side? Is the issue in the last mile, with the client, or in their service provider’s environment? What plugins do they have installed? If the viewer is using an ad blocker, we can deal with that appropriately.

Once you’re able to understand what the problem is, and can isolate where it is within the stack, then you need to be able to react. During large events—especially major live events—often you can’t really do anything because the risk factor is so high that it’s not worth it. Troubleshooting live video is always a balancing act: What could we do to fix this, and is it worthwhile to try?

In many instances, we’ll find that we had bad ad content at the encode or package level. This is good, because we can do something about that for any new users, anyone else who comes in after we get that ad yanked from the delivery scheme. It won’t show up again in the current event.

However, if you find a problem in the player, you’re not going to release a new player build in the next 20 minutes and feel at all confident with that. You’re going to have to make sure you can validate the fix thoroughly and retest the player later, so you know that by the next event this problem won’t happen again.

Problems at the CDN level or in delivery are one reason why multi-CDN is so popular right now. It gives you options so you can adjust on the fly. In the past, when content providers typically engaged only one CDN, there wasn’t a clean option to switch traffic over to, let alone do so seamlessly. Nowadays, thanks to multi-CDN, clean switchovers are becoming more and more of a viable option.

No matter what you determine the issue is, it’s critical to react appropriately, and to engage the people that you need to, being mindful of risk versus reward, as always. If you know where the issue is and you want to be able to react, most of the time, with larger-scale events, there are going to be multiple parties involved. It doesn’t mean you can do it all. If you can, then great, you can make those calls. Otherwise, you get the parties involved, whether it’s Akamai or Limelight or Level 3 or whoever you’re working with, and then you tell them what you need. Just remember that this can be a time-consuming process.

In many cases, you need to have your CDN involved to be able to identify the problem, and the first thing those staffers will ask for is a network or console log and steps to replicate. The second you start seeing issues, capture everything you can, and document it. Begin by capturing network traffic in Charles or HAR format to make sure you have that good, solid base. Then try to replicate it. But don’t try to replicate it immediately, because you might lose the ability to pass along any data that was actually valuable.
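Once you have a HAR capture saved, a short script can pull out the requests worth showing your CDN contact. The sketch below reads the standard HAR fields (`log.entries`, `request.url`, `response.status`, `startedDateTime`, `time`) and lists failed requests. The `failed_requests` helper, the sample capture, and the choice to treat status 0 (which browsers commonly record for aborted or blocked requests) as a failure are assumptions for illustration.

```python
import json

# Hypothetical HAR capture, trimmed to the fields this sketch reads.
SAMPLE_HAR = json.dumps({
    "log": {
        "version": "1.2",
        "entries": [
            {"startedDateTime": "2019-03-01T18:00:00.000Z", "time": 42.0,
             "request": {"method": "GET", "url": "https://cdn.example.com/live/master.m3u8"},
             "response": {"status": 200}},
            {"startedDateTime": "2019-03-01T18:00:06.000Z", "time": 310.5,
             "request": {"method": "GET", "url": "https://cdn.example.com/live/seg_101.ts"},
             "response": {"status": 404}},
            {"startedDateTime": "2019-03-01T18:00:12.000Z", "time": 0.0,
             "request": {"method": "GET", "url": "https://ads.example.com/creative.mp4"},
             "response": {"status": 0}},
        ],
    }
})


def failed_requests(har_text):
    """Return a summary of entries whose status is 0 or an HTTP error (>= 400)."""
    har = json.loads(har_text)
    problems = []
    for entry in har["log"]["entries"]:
        status = entry["response"]["status"]
        if status == 0 or status >= 400:
            problems.append({
                "started": entry["startedDateTime"],
                "url": entry["request"]["url"],
                "status": status,
                "time_ms": entry["time"],
            })
    return problems


problems = failed_requests(SAMPLE_HAR)
for p in problems:
    print(p["started"], p["status"], p["url"])
```

A list like this, with timestamps and URLs, is a compact thing to hand over alongside the full capture and your steps to replicate.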

Having a solid set of tools, as well as experience in how to use them, how to interpret the information they provide, and how to capture valuable data to share and execute on, is the foundation for effective video engineering troubleshooting. The more you work with a system, the more you will understand its weak points, and the quicker you’ll be able to triage and find the root of a problem. You might not be able to solve everything as quickly as you want, but as you learn and understand the weaknesses, you can build in systems and solutions that deal with them better. Happy hunting, and keep the bits flowing.

[This article appears in the March 2019 issue of Streaming Media Magazine as "Streaming Playback Testing Tools."]
