
Creating Interactive Video With MPEG4

MPEG4 is finally starting to gain some traction. The allure of platform and vendor independence and ubiquitous players on all kinds of devices is strong. But in many areas, MPEG4 is still a "bleeding-edge" technology. You'll quickly feel the pain when you try to do anything but the most basic audio/video delivery with it. Today, all the major streaming players support MPEG4, mostly through the EnvivioTV plugin. And Apple's QuickTime lets you convert all kinds of movies to MPEG4 using the best-$30-you-ever-spent-on-software QuickTime Pro. But to really unlock the promise of MPEG4 – universal and reliable authoring and playback of complex interactive multimedia – you still have to go out on the edge.

In the lead-up to this article, I looked all around for information about the practical use of MPEG4 for interactive media content. Here's what I found: Amazon lists a small handful of books specifically about MPEG4, most of which cost more than $90 (and just try to find any of them in your local bookseller's megastore). A Web search for "MPEG4 interactive authoring" turned up numerous sites giving voice to the MPEG4 hype; a handful that taunted me with incredible authoring and playback tools that I couldn't download or read technical details about; and just one that had useful tools I could actually play with. Of course, that one included no real documentation about authoring techniques or compatibility with other MPEG4 products, but it did include lots of examples to experiment with. In short, working with interactive MPEG4 at this stage is a bit like running with scissors.

How Interactive MPEG4 Authoring Works
If you've used SMIL or VRML, you're halfway to being able to create rich and powerful MPEG4 content. MPEG4 leverages both of these languages to its own ends. But there are some differences that you'll notice right away when you start working with it. SMIL is text-based code that tells the video player where to get the media elements in the movie and how to play them. With SMIL you have to distribute and keep track of all the elements – the video, audio, any image files, etc. – as separate files. MPEG4 takes a different approach. It is a binary format that encapsulates all the media elements and interactivity instructions, all wrapped into one neat package.
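To make the "separate files" point concrete, here is a minimal Python sketch that reads a small SMIL-style presentation and lists every external media file it depends on. The file names (`intro.mpg`, `narration.mp3`, `logo.png`) and the helper `external_media` are made up for illustration, and the markup is a stripped-down SMIL example without namespaces, not a complete production file.

```python
import xml.etree.ElementTree as ET

# A small SMIL-style presentation. All file names here are hypothetical.
SMIL_DOC = """
<smil>
  <head>
    <layout>
      <root-layout width="320" height="240"/>
      <region id="main" left="0" top="0" width="320" height="240"/>
    </layout>
  </head>
  <body>
    <par>
      <video src="intro.mpg" region="main"/>
      <audio src="narration.mp3"/>
      <img src="logo.png" region="main" dur="5s"/>
    </par>
  </body>
</smil>
"""

# SMIL media object elements that can point at an external file.
MEDIA_TAGS = {"video", "audio", "img", "text", "textstream", "animation", "ref"}


def external_media(smil_text):
    """Return the src of every media element, in document order."""
    root = ET.fromstring(smil_text)
    return [
        el.get("src")
        for el in root.iter()
        if el.tag in MEDIA_TAGS and el.get("src")
    ]


if __name__ == "__main__":
    # Each printed path is a separate file you would have to ship
    # alongside the SMIL text itself.
    for path in external_media(SMIL_DOC):
        print(path)
```

Every path in that list is one more file to upload, link, and keep in sync with the SMIL text, which is exactly the bookkeeping a single packaged MPEG4 file is meant to eliminate.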

The creation process is a bit like compiling computer code. You write your code in XMT-O, the high-level flavor of XMT (the "Extensible MPEG-4 Textual Format"), which is based on SMIL 2.0. It's not the same as SMIL – the two are just enough alike that it's easy to figure one out if you know the other, and just different enough that you'll drive yourself crazy keeping them straight. Once you've finished your XMT-O file, you run it through a tool that compiles the code and all the media files into a single binary .mp4 file. Now you can admire your handiwork as it plays back flawlessly.
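The compile step is easier to picture with a toy model. The Python sketch below is only an illustration of the idea: the XML scene is a stand-in rather than real XMT-O syntax, the byte layout is invented for this example and is not the MP4 file format, and `compile_scene` and `unpack` are hypothetical helpers, not part of any MPEG4 tool. What it does show is the workflow: a text description plus loose media files go in, one self-contained binary comes out, and a missing media file fails the build instead of failing at playback.

```python
import struct
import xml.etree.ElementTree as ET

# Stand-in scene description (NOT real XMT-O); file names are hypothetical.
SCENE = '<scene><video src="clip.m4v"/><audio src="voice.aac"/></scene>'


def compile_scene(scene_text, media):
    """Bundle a scene description and its media into one bytes object.

    `media` maps each src name used in the scene to that file's bytes.
    Toy layout: for every entry, a 4-byte big-endian name length, the
    name, a 4-byte big-endian payload length, then the payload.
    """
    root = ET.fromstring(scene_text)
    # Collect each referenced file once, in document order.
    names = list(dict.fromkeys(
        el.get("src") for el in root.iter() if el.get("src")
    ))
    missing = [n for n in names if n not in media]
    if missing:
        # Like a compiler, refuse to build if a dependency is absent.
        raise FileNotFoundError(f"missing media: {missing}")

    entries = [("scene", scene_text.encode("utf-8"))]
    entries += [(n, media[n]) for n in names]

    out = bytearray()
    for name, payload in entries:
        raw_name = name.encode("utf-8")
        out += struct.pack(">I", len(raw_name)) + raw_name
        out += struct.pack(">I", len(payload)) + payload
    return bytes(out)


def unpack(blob):
    """Read the toy container back into a {name: bytes} dict."""
    entries, pos = {}, 0
    while pos < len(blob):
        (name_len,) = struct.unpack_from(">I", blob, pos)
        pos += 4
        name = blob[pos:pos + name_len].decode("utf-8")
        pos += name_len
        (size,) = struct.unpack_from(">I", blob, pos)
        pos += 4
        entries[name] = blob[pos:pos + size]
        pos += size
    return entries


if __name__ == "__main__":
    blob = compile_scene(SCENE, {"clip.m4v": b"VIDEO", "voice.aac": b"AUDIO"})
    print(sorted(unpack(blob)))
```

A real encoder does far more than concatenate bytes, but the shape of the job is the same: everything the player needs ends up inside the one file you hand out.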

There's also a lower-level MPEG4 textual language called XMT-A. With its roots in VRML, XMT-A is far more complex to write than XMT-O. It's a bit like writing assembly language code by hand. There may be some high-performance or other specialized situations where you'd need to do it, but most people won't have to and won't want to.
