Building a DASH264 Client: Streaming Media East 2013

One of the interesting things about DASH is that it's codec agnostic, and the reason for this is that it makes it more future-proof. DASH describes the content: it tells how the content should be segmented and how it should be described in a manifest, but it doesn't say anything about what codecs you must use. What are the video codecs? What are the audio codecs? What this means for us is that when H.265 is ready for the world, we can simply use the same DASH idea, and the content that we're delivering just happens to be H.265 now. Or if you want to deliver news in VP9, or whatever else it is you want to deliver, so long as you have a player that's capable of dealing with that content, DASH doesn't care. The DASH manifest itself is an MPD file. It's just XML that describes to us the content of the files that we're dealing with. But the reality is that the DASH specification is so broad and so agnostic that it becomes very difficult to implement today.

So the DASH Industry Forum, which is, again, many of the same companies that we were talking about before: there's Microsoft, there's Adobe, there's Akamai, small companies like ours, and others. Netflix is in there. Sixty-some-odd companies these days got together, formed the DASH Industry Forum, and have recently released DASH264. I'm going to call it a specification, but that's not quite the right term for it; it's an implementation recommendation, something along those lines. And this was actually just released yesterday. You can find it up on the DASH Industry Forum website, DASHIF.org. If you click on the Resource Library tab, the top link is the implementation guidelines for DASH264, or its full name, DASH-AVC/264. DASH264 flows much more easily off the tongue.

And what's interesting about DASH264 is that it provides all the basic functionality that you're going to need. It's still going to handle the advanced use cases, but it focuses on a single codec. Right now the vast majority of video on the web is in H.264. So rather than dealing with a spec that can handle anything for anyone at any time, we have a specification for how we handle things specifically with H.264. The full DASH spec allows for millions of different variations on how things might be described; DASH264 says, "Here are some very specific things that we can do." And it helps to narrow down what the DASH world can look like, and it helps create much better interoperability across all the people and the players, from the encoders to the DRM to the CDNs to the client side. There are a number of different parties that get involved and try to work out interop, and when you have a spec that can allow for virtually anything, that's not simple. So DASH264 is trying to provide for us a concrete set of implementations that we can work with.

As I mentioned, the DASH Industry Forum is the group behind this; it's a forum, not a single company. And their mission, the primary objective, is really to promote and catalyze adoption of DASH. We want people to use DASH. We see the benefits of having standards and having one particular way of describing content: rather than using all the various standards that are out there today, we'd rather use one. It makes everybody's lives easier when we don't have to segment the content in four different ways for four different platforms.

So with all that said, how do we go ahead and build a DASH player? How do we make this work? We're gonna spend a little time looking at the internals of DASH: what's in the manifest file? At Digital Primates we've had a chance to work on DASH players for a number of different technologies. We spent a lot of years building video players in Flash, so we started there, because that's what most of our clients at the time were using. But we've since branched out: we've built our Android version, and we've built our HTML5 and JavaScript version, which is now the basis for the DASH.js open source project. It's totally open source, free, and available for use under a BSD-3 license; GitHub.com/DASH-Industry-Forum is where you'll find this particular project.

And currently DASH.js is the reference player for the DASH Industry Forum. Meaning, we're building this in such a way that people who have content in DASH, people who have DRMs, people who have audio codecs, whatever else; all the members of the consortium can get together, and we have one place to test our content and say, "Yes. It works. It's compliant." That's the idea of the reference player.

So at a high level, how do we play DASH content? Well, here are your basic steps. First, you need to get the manifest. So we always start with a URL to an MPD file. As I say, the manifest is simply an XML file that describes to us where the segments are, how the segments are encoded, and what's available to us. So we'll download the manifest and we'll parse the manifest. Off of that download, we'll start making calculations. We'll figure out: how big was this file? How long did that take to come down? How much bandwidth do we think they have? So we make an initial decision about what their bitrate should be. And we'll come back and re-make this decision over and over again, but we need to decide, for that first fragment we're going to play, what's the optimal bitrate? Our first decision is based on that: how big was the MPD file we downloaded, and how long did it take from when the request was made to when the request was filled? Once we've determined the optimal bandwidth, we'll initialize the client for that bandwidth, we'll download the first segment that we're going to play, and we'll hand the segment over to the media source extensions to let them play it back. As we're downloading that segment, we're watching, again: how big was the segment? How long did the segment take to download? And that becomes one of the metrics used for determining how we adapt the bitrate. The bandwidth itself is just one of the many rules that we can use. And, of course, as an open source project, people are free to add their own rules, to choose which rules they want and which rules they don't want. But we'll get into more of that later. So this is the basic process, right? We download a segment, hand off the segment.
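As a rough sketch of that startup flow in a browser (illustrative only; the function name and the representations list are assumptions for this example, not the actual DASH.js code):

```javascript
// A minimal sketch of the startup flow described above, assuming a browser
// environment and a list of representations with a `bandwidth` field.
async function startPlayback(mpdUrl, representations) {
  // 1. Download the manifest and time the request.
  const start = performance.now();
  const response = await fetch(mpdUrl);
  const manifestXml = await response.text();
  const seconds = (performance.now() - start) / 1000;

  // 2. Make an initial bandwidth estimate from that one download:
  //    approximate bytes * 8 = bits, divided by elapsed seconds.
  const estimatedBps = (manifestXml.length * 8) / seconds;

  // 3. Pick the highest-bandwidth representation that fits the estimate,
  //    falling back to the lowest one if nothing fits.
  const sorted = [...representations].sort((a, b) => a.bandwidth - b.bandwidth);
  let chosen = sorted[0];
  for (const rep of sorted) {
    if (rep.bandwidth <= estimatedBps) chosen = rep;
  }

  // 4. From here: parse manifestXml, initialize for `chosen`, download the
  //    first segment, and hand it to the Media Source Extensions buffer.
  return { manifestXml, chosen };
}
```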

Whenever we change bitrates we have to reinitialize the player for that particular bitrate, and the reason for that is that people will often use different profiles of a codec for the different versions of the content; the 400K stream and the 3Mbps stream may be using very different profiles of the codec. So we're going to say, "Okay," we tell the player, "here's what the next set of content is going to look like." We only reinitialize whenever we change bitrates or when we first start, which is a change from null to some first bitrate.
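Here is a minimal sketch of what that reinitialization might look like against the media source extensions; the SourceBuffer, the fetchSegment helper, and the representation object are assumptions for illustration, not the DASH.js internals.

```javascript
// Appending data to a SourceBuffer is asynchronous: wait for 'updateend'
// before appending anything else.
function appendToBuffer(sourceBuffer, data) {
  return new Promise((resolve) => {
    sourceBuffer.addEventListener('updateend', resolve, { once: true });
    sourceBuffer.appendBuffer(data);
  });
}

async function fetchSegment(url) {
  const response = await fetch(url);
  return response.arrayBuffer();
}

async function switchRepresentation(sourceBuffer, newRep) {
  // The new representation may use a different codec profile, so its
  // initialization segment must go into the buffer before any of its
  // media segments are appended.
  const initData = await fetchSegment(newRep.initializationUrl);
  await appendToBuffer(sourceBuffer, initData);
  // Media segments for newRep can now be appended as usual.
}
```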

So let's look at the DASH structure itself. It's got three core types of files. We've got the MPD file, which is our manifest: simple XML that describes the segments; we'll take a look at one of these in just a minute. And then there are two different types of binary files. Now, in reality, according to the specification, they don't need to be two different files. You can have self-initializing segments. I haven't actually seen self-initializing segments yet from anybody that's providing these segments, but according to the spec, it is possible. But, generally, our initialization files contain all the headers that we need to prime the player. We're going to say, "Here's the kind of thing you're going to be playing now."

And then there are the individual segments, which are the actual playable media. And there are different segments for video and audio. Now, that's actually very interesting, and it's not something that I've seen done with most of the other streaming technologies. Any thoughts on why we might want to separate video and audio?

You might have different languages, because we have one set of video content and different audio tracks that you want to play. Or, in live sports (we've done a lot of work with Major League Baseball over the years), you may want to have the same video, but you want to switch the audio. Instead of the video TV announcers, maybe you want to listen to the radio announcers from that same team. Maybe you want to listen to the other team's announcers. As a Red Sox fan living in New York, believe me, I never want to listen to the local announcers. So there are different reasons why this might be useful. But here's what's also very interesting about it: when the video and audio are muxed together in the file, every video file also needs to be big enough to carry that audio track. But most of the time when we're switching bandwidth, we're switching because the video needs a different quality; the audio quality is in most cases going to stay the same. Any of my friends from Fraunhofer or Dolby or DTS may disagree with me on this one, because they focus purely on the audio side of it. But in most cases that I've dealt with for my clients, bitrate switches are about the video. So why do we need to take up that much more file space to have exactly the same audio data in all of our different video segments? It makes more sense, and it's more efficient, to have them separated. And that's one of the things I like about the structure here.

So our DASH manifest has a root node with one or more periods. Each period has one or more adaptation sets for the video and one or more adaptation sets for the audio. The idea behind an adaptation set (this is just a little bit of terminology to understand) is functionally equivalent content. Now, there's been a lot of debate back and forth about what functionally equivalent means. What I'm seeing mostly in streams that I deal with these days is that video content of the same aspect ratio is functionally equivalent. Right? So all of our 16x9 streams are in one adaptation set, and all of our 4x3 streams are in a separate adaptation set. Why? Well, because it's far more jarring for an end user to switch from a widescreen video to a 4x3 than it is to switch from a high quality to a low quality at the same shape. Likewise, on the audio side, most of the time what I'm seeing is that all of the stereo bitrates will be in one adaptation set and all of the seven-channel surround will be in another adaptation set, because it's a bit jarring for users to have speakers suddenly shut off as we go down from seven-channel surround to stereo. So, again, that's the idea of an adaptation set: functionally equivalent content. And each adaptation set then has one or more representations, which describe the bitrates at which the content is available. So far, so good? You guys with me? All right.
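To make that hierarchy concrete, here is a purely illustrative object showing the shape of a parsed manifest; the field names are simplified for this sketch and are not the DASH.js object model.

```javascript
// MPD -> Period -> AdaptationSet -> Representation (illustrative shape only).
const manifest = {
  periods: [{
    adaptationSets: [
      {
        contentType: 'video',                  // all 16x9 renditions together
        representations: [
          { id: 'video-500k', bandwidth: 500000, width: 854, height: 480 },
          { id: 'video-3mbps', bandwidth: 3000000, width: 1920, height: 1080 }
        ]
      },
      {
        contentType: 'audio', lang: 'en',      // all stereo renditions together
        representations: [
          { id: 'audio-en-128k', bandwidth: 128000 }
        ]
      }
    ]
  }]
};
```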

So here's the challenge: DASH allows us to describe these segments in a lot of different ways. The most common ways we're seeing these days: we can describe the individual segments, which again are the individual pieces of content; there are video segments and audio segments. We can use a segment base, which simply points at a single file on the server, and we use HTTP byte-range requests to say, "This time I want you to give me bytes 1837 through 4322," and it hands down just that piece of the file to you. And this is less work for the folks who are storing the content; they don't need to do all the segmentation. The segmentation happens on the fly. It's more work on the server at request time, but I find a lot of folks are using that. We also have the idea of a segment list, which is a specific list of each piece of content and the URL that goes along with that piece of content. And so you'll find that these can get quite long, as they describe, "Okay, this particular segment for this stretch of data has this particular URL." And the one I like best, because it's the most concise, is a segment template. The template simply provides for us a series of wildcards, so we can calculate on the fly exactly which segments fit where.
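To make the segment-base idea concrete, here is a minimal sketch of a byte-range request from a browser client; the URL and byte offsets are just the example numbers from above, not a real segment index.

```javascript
// Sketch of a SegmentBase-style request: one file on the server, and the
// client asks for just the byte range it needs.
async function fetchByteRange(url, firstByte, lastByte) {
  const response = await fetch(url, {
    headers: { Range: `bytes=${firstByte}-${lastByte}` }
  });
  // A server that honors the Range header responds with 206 Partial Content.
  return response.arrayBuffer();
}

// e.g. fetchByteRange('video.mp4', 1837, 4322);
```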

So let's take a look at a couple of these. Here's the segment list. All right, so this is off of one of the test vectors from the Industry Forum; I forget which one. You'll notice that we've got a representation that we're describing here; you can see the codec, the width, the height, and the bandwidth of this one. This is about a 500K stream. And so our segment list first describes an initialization segment and then a series of URLs for the individual pieces of data that it's going to download.

A segment template. This is, again, the one I prefer, because it's much more concise. The whole thing describing this adaptation set really is (one, two, three, four, five, six, seven, not counting the line breaks I added to make it more readable for you) seven lines that describe everything you need to know about this particular adaptation set. So what we have here is a segment template that tells us where the initialization file is. And then for this particular representation it says, using this template, "I'm going to look for this wildcard number, and I'm just going to change that each time." And since our duration is always the same (in this case here, the duration is "13809"), it knows after that time, "I need segment two, I need segment three, I need segment four." It's a nice, concise way of describing our content. The other thing that we occasionally see are segment templates where the duration of each segment can be different. So with variable durations, we have a combination of a template and a timeline. And you'll notice, starting at time zero for this duration, we have the first segment, and then the second, the third, the fourth. And if we have the same duration over and over again, we're able to use a repeat node to say, "Repeat this one twice." So that's a combination of the two. Questions on describing this?
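As a rough sketch of how a client might expand these, here is some illustrative JavaScript for a $Number$-based template and a timeline with repeats; the data shapes are hypothetical, not the actual DASH.js parser output.

```javascript
// Build a concrete segment URL from a $Number$ template.
function buildSegmentUrl(mediaTemplate, number) {
  // e.g. buildSegmentUrl('video_$Number$.m4s', 3) -> 'video_3.m4s'
  return mediaTemplate.replace('$Number$', String(number));
}

// Expand a SegmentTimeline into start times and durations.
function expandTimeline(entries) {
  // Each entry: { t: optional start time, d: duration, r: optional repeat count }
  const segments = [];
  let time = 0;
  for (const entry of entries) {
    if (entry.t !== undefined) time = entry.t;
    const count = (entry.r || 0) + 1; // r = 2 means this entry plus two repeats
    for (let i = 0; i < count; i++) {
      segments.push({ start: time, duration: entry.d });
      time += entry.d;
    }
  }
  return segments;
}
```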

All right. So this is all the information that we get directly out of our manifest file. And regardless of what technology you use to build a DASH player, you're going to need to deal with a manifest. So let's take a quick peek here for a minute at our DASH player, and we'll see how our bandwidth is here today. If the bandwidth isn't adequate to play it, I've got a local copy of all the same segments that I can show you as well. So this is our DASH reference client, currently off of the DASHIF.org site. We have a number of different test vectors we can play. I'm going to try Netflix's here. We'll load this one up and see how it's doing. All right. So there is audio to this. My audio output on my laptop is shot, unfortunately, but, trust me, there's audio. And if I open up the network panel, you should see that we have a whole series of different requests for the individual segments. Right? We're making requests to the server saying, "Now I want this segment. Now I want the next segment." And that's what each of these in here is: requests for the various DASH files, which are the individual segments that are playing. And as I mentioned, there are different versions. You'll notice that when I'm using the released version of the browser, the list of test vectors is about five vectors that are working. If I switch over to the Canary version of Chrome, I have a much longer list of possible vectors that can work for us.

So let's take a look at one of the ones off of Microsoft's Azure servers. Again, what you have here: individual requests are being made to the server, bringing down bits of content, playing them for us as needed, and constantly evaluating: how long did it take to get that segment? Do we have enough time? How is it performing? Am I dropping frames? How long is my buffer? Is my buffer emptying faster than it is filling? These are the things that can be used to determine whether I should switch up or switch down, and how I adapt my bitrate. The other interesting thing about the client, of course, is that this is built as a lab-quality client with lots of debug information. So if we want to see, for instance, what's happening within the buffers, I can watch my charts here... oh, that's not a good sign. That's dropping... oh, that's the end of it. That's why.
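For the curious, here is a rough illustration of the kind of switching rule just described; the function and field names are hypothetical, and the actual DASH.js rules are pluggable and more involved than this.

```javascript
// Two simple inputs to a switch-down decision: download time vs. segment
// duration, and whether the buffer is draining.
function shouldSwitchDown(metrics) {
  const { downloadSeconds, segmentDurationSeconds,
          bufferLevelSeconds, previousBufferLevelSeconds } = metrics;

  // If a segment takes longer to download than it takes to play,
  // this bitrate cannot keep up.
  const tooSlow = downloadSeconds > segmentDurationSeconds;

  // If the buffer is emptying faster than it is filling, we are falling behind.
  const draining = bufferLevelSeconds < previousBufferLevelSeconds;

  return tooSlow || draining;
}
```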

So we can watch what's happening with the buffer, and we can get a whole lot more debug information from it. All of our console logs are available to us here, and we can watch metrics for the video or metrics for the audio and find the release notes for the player, all available to us here.

Now what, for me, is really good about the project is that this is no longer just a Digital Primates project. As an open source project, we've got contributors from us as well as from Microsoft and from YouTube, which is great. That choppiness could be the laptop performance; it wasn't dropping frames, so I'm not exactly sure if it's simply the VGA output, or the laptop processor, or the bandwidth; there are any number of variables that could be problematic.

Again, if it is, it's an open source project and we can continue to tweak it. It's not done. With the DASH.js player, you'll notice in the version up top that we're at Version 0.2.3. We are close to a Version 1 release. The main thing that's holding us back from a Version 1 release is having a publicly released browser that supports all of our test vectors. We're getting closer. We have, I believe, twelve of the fourteen DASH-IF test vectors working in the nightly build, in the Canary version of the browser. The Microsoft stream I just showed you works in Canary; it doesn't work in the release version yet. That's not a problem with Microsoft; it's just that the code hasn't made it through the paths all the way up to release yet. The last two test vectors that are not working yet are specifically the ones that use DRM, and the encrypted media extensions are not available in the release version of Chrome yet either. They are only available in Chrome for Chrome OS, which is a tiny subset. So once we can demonstrate a publicly released browser that plays all the vectors, we will have Version 1.0.

Audience member: <inaudible>

Jeff Tapper: The test vectors-- exactly-- are specifically DASH264 test vectors.

Audience member: <inaudible>

Jeff Tapper: Yeah, the Netflix stream does start a little strangely. Here, let’s--

Audience member: <inaudible>

So this one here is... come on, laptop. Now this is my browser. There we go. So this is, again, another DASH-encoded stream, and, again, it's nice high quality, playing very well. This particular one I have locally, just for times when there are bandwidth issues. Okay. Anyway. So are there questions on the player and how it works? Yes, sir?

Audience member: <inaudible>

Well, it depends on where in the process you are. Ultimately, what you need to have is a file that's been segmented for DASH. Once it's been segmented, any webserver can serve it. So the reality is, this video that's playing in the background here is just individual segmented files sitting on my hard drive, coming through Apache. The trick is that you need to have it segmented. And there are both commercial and open source segmenters available. Is "segmenters" the right description of that piece of tech? I'm not sure of the industry standard for what they call that thing, but I refer to them as segmenters because they take the content and chop it up into the DASH format.

Audience member: <inaudible>

There is absolutely support for closed captioning as well. The DASH specification describes exactly how closed captioning should behave, how it should work, and how it should be described within the content.

Audience member: <inaudible>

So we've got this working both for live and for video-on-demand content. Live continues to be a work in progress, largely because we haven't had a lot of streams yet to work with. So far, for each member of DASH-IF that says, "Hey, I've got a live stream now," we sit down, work with them, and make sure it's fully supported. But it's an ongoing process, and I don't want to call live "complete" because I haven't worked with enough test cases of true live content yet.

Audience member: <inaudible>

The question, of course, is about latency within DASH, and this is a case where it's very interesting; I like the way DASH handles this better than most of the other HTTP streaming technologies. We're able to get much lower latency by tweaking some things within the manifest and within just the nature of DASH itself. By adjusting the size of the segments, and by adjusting how much time we have to have in the buffer before we start playing, we're able to get much closer to that live edge. Where I know Apple, as a recommendation, suggests ten-second segments and suggests you have a full three segments in the buffer before you start playing, we don't have that requirement. We can do that if we want, but if we want to get closer to the live edge, we're able to. With one of our clients that has live content, we're generally about three to five seconds behind the live edge.
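As a back-of-the-envelope illustration of that reasoning (the numbers here are assumed for the example; real latency also depends on encode and upload delay, which are ignored):

```javascript
// Rough latency estimate: segment duration times the number of segments
// required in the buffer before playback starts.
function approximateLiveLatency(segmentDurationSeconds, minSegmentsInBuffer) {
  return segmentDurationSeconds * minSegmentsInBuffer;
}

approximateLiveLatency(10, 3); // ten-second segments, three buffered: ~30 seconds behind
approximateLiveLatency(2, 2);  // shorter segments, smaller buffer: ~4 seconds behind
```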

Audience member: <inaudible>

Jeff Tapper: Oh, so the whole adaptive bitrate side of this is going to vary vastly based on the bandwidth you have available. Again, one of the major factors that we use in determining switching, determining when to adapt, is how much bandwidth you have: how long it takes you to download a segment versus how long it takes to play that segment. So if you're sitting on the far end of a T3 and you're able to pull down a ton of content all the time, great: we can feed you very high quality. One of the challenges we had with one of our clients doing live was that the place where the content was coming from actually had a poor connection. They couldn't upload the content to us fast enough for us to play the high quality, and we kept having to switch down to lower and lower qualities. So, you know, the Internet is still the Internet and has a set of challenges to it. But, yes, sir?

Audience member: <inaudible>

Jeff Tapper: I'm going to repeat that for the camera. One of the features of Azure's cloud platform is that it allows you to specify in your request how long the segments should be. So you can start with shorter segments to get a faster start-up, and as the client you can adjust how much buffer time you have. So you can start with a very small buffer, saying "I only need a second or two of data to start playing," and then increase both the duration of the segments and the buffer over time, at the client or within both the server and the client. Did I summarize that pretty reasonably?

There are a lot of different calculations that go into this, but yes, actually... I'm drawing a blank now on the name for the individual slices of content, which are not quite a frame of data but not much more than a frame of data at a time, but you can start getting some of those in and start playing them before the whole segment is downloaded.
