After a long build-up, MPEG4 is finally roaring out of the gate. Read on for a look at the frontier of MPEG4 video: interactive capabilities that will leave you oohing and ahhing. We'll cover the basics, then walk through a complete XMT-O code example that shows how it's done.
MPEG4 is finally starting to gain some traction. The allure of platform and vendor independence, with ubiquitous players on all kinds of devices, is strong. But in many areas MPEG4 is still a "bleeding-edge" technology, and you'll quickly feel the pain when you try to do anything beyond the most basic audio/video delivery with it. Today, all the major streaming players support MPEG4, mostly through the EnvivioTV plugin. And Apple's QuickTime lets you convert all kinds of movies to MPEG4 using the best-$30-you-ever-spent-on-software QuickTime Pro. But to unlock the full promise of MPEG4 (universal, reliable authoring and playback of complex interactive multimedia), you still have to go out on the edge.
In the lead-up to this article, I looked all over for information about the practical use of MPEG4 for interactive media content. Here's what I found: Amazon lists a handful of books specifically about MPEG4, most of which cost more than $90 (and just try to find any at your local bookseller's megastore). A Web search for MPEG4 interactive authoring turned up numerous sites giving voice to the MPEG4 hype; a handful that taunted me with incredible authoring and playback tools that I couldn't download or read technical details about; and just one that had useful tools I could actually play with. Of course, that one included no real documentation about authoring techniques or compatibility with other MPEG4 products, but it did include lots of examples to play around with. In short, working with interactive MPEG4 at this stage is a bit like running with scissors.
How Interactive MPEG4 Authoring Works
If you've used SMIL or VRML, you're halfway to being able to create rich and powerful MPEG4 content, because MPEG4 borrows from both of these languages for its own ends. But there are some differences you'll notice as soon as you start working with it. SMIL is a text-based language that tells the player where to find the media elements in a presentation and how to play them. With SMIL, you have to distribute and keep track of all the elements (the video, audio, image files, and so on) as separate files. MPEG4 takes a different approach: it is a binary format that encapsulates all the media elements and interactivity instructions in one neat package.
The creation process is a bit like compiling computer code. You write your code in XMT-O, a high-level language based on SMIL 2.0 (XMT stands for "Extensible MPEG-4 Textual Format"). XMT-O is not the same as SMIL: the two are just alike enough that it's easy to figure one out if you know the other, and just different enough that you'll drive yourself crazy keeping them straight. Once you've finished your XMT-O file, you run it through a tool that compiles the code and all the media files into a single binary .mp4 file. Now you can admire your handiwork as it plays back flawlessly.
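To make the single-package idea concrete, here is a toy Python sketch that bundles a scene description and several media payloads into one binary blob and then reads them back. It is only an analogy for what the compiler does: the length-prefixed layout is invented for illustration and is not the real MP4 file structure, and the file names are hypothetical.

```python
import struct

# A toy container format, NOT the real MP4 layout. Each entry is stored as:
#   2-byte name length | name (UTF-8) | 4-byte payload length | payload
# All lengths are big-endian unsigned integers.


def pack(entries: dict[str, bytes]) -> bytes:
    """Concatenate named payloads into a single binary blob."""
    out = bytearray()
    for name, payload in entries.items():
        encoded = name.encode("utf-8")
        out += struct.pack(">H", len(encoded)) + encoded
        out += struct.pack(">I", len(payload)) + payload
    return bytes(out)


def unpack(blob: bytes) -> dict[str, bytes]:
    """Recover the named payloads from a blob produced by pack()."""
    entries: dict[str, bytes] = {}
    offset = 0
    while offset < len(blob):
        (name_len,) = struct.unpack_from(">H", blob, offset)
        offset += 2
        name = blob[offset:offset + name_len].decode("utf-8")
        offset += name_len
        (size,) = struct.unpack_from(">I", blob, offset)
        offset += 4
        entries[name] = blob[offset:offset + size]
        offset += size
    return entries


# Hypothetical inputs: a scene description plus the media files it uses.
bundle = pack({
    "scene.xmt": b"scene description goes here",
    "movie.mp4": b"\x00\x01\x02\x03",
    "logo.png": b"\x89PNG",
})
print(sorted(unpack(bundle)))  # ['logo.png', 'movie.mp4', 'scene.xmt']
```

With SMIL you would ship all three files separately; here one blob carries everything, which is the convenience the compiled .mp4 gives you.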
There's also a lower-level MPEG4 textual language called XMT-A. With its roots in VRML, XMT-A is far more complex to write than XMT-O; it's a bit like writing assembly language by hand. There may be high-performance or other specialized situations where you'd need it, but most people won't have to use it and won't want to.
Real Examples
So, enough background; let's dive into a working example. The enabler for our little foray into interactive MPEG4 is the IBM alphaWorks Toolkit for MPEG4, which you can download free for evaluation from the alphaWorks site. Download and unzip the IBMToolkitForMpeg zip file and you'll find three components inside:
AVGen, a utility to combine separate audio and video files into a single MP4 file
M4Play, an MPEG4 player (we'll use this to view our creation)
XMTBatch, the compiler that makes XMT-O metafiles into movies
You'll need a recent Java runtime to run the toolkit (the newer the better); you can get one from the "Get It Now" link at http://java.com/en/index.jsp.
For my first simple interactive MPEG4 presentation, I decided to remake the SMIL watermarking example from my recent article "How to Brand Your Video with a Watermark." In that example, a semi-transparent watermark overlays the bottom-right corner of the video image. If you pass the mouse over the watermark, it "lights up," becoming opaque until you move the mouse away.
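The "lights up" effect comes down to changing the watermark's opacity and compositing it over the video. As a rough sketch, assuming plain per-pixel alpha blending (the standard "over" formula) and hypothetical opacity values of 0.5 at rest and 1.0 under the mouse, the per-channel math looks like this in Python. It illustrates the idea only and says nothing about how M4Play or any other player actually renders overlays.

```python
# Hypothetical opacity values; the original example's exact values are not stated.
REST_OPACITY = 0.5   # semi-transparent while idle
HOVER_OPACITY = 1.0  # fully opaque while the mouse is over the watermark


def blend_channel(mark: int, video: int, opacity: float) -> int:
    """Composite one 0-255 color channel of the watermark over the video
    using the "over" formula: opacity*mark + (1-opacity)*video."""
    return round(opacity * mark + (1.0 - opacity) * video)


def displayed_pixel(mark_rgb: tuple[int, int, int],
                    video_rgb: tuple[int, int, int],
                    mouse_over: bool) -> tuple[int, int, int]:
    """Return the RGB value shown where the watermark covers the video."""
    opacity = HOVER_OPACITY if mouse_over else REST_OPACITY
    r, g, b = (blend_channel(m, v, opacity) for m, v in zip(mark_rgb, video_rgb))
    return (r, g, b)


white, black = (255, 255, 255), (0, 0, 0)
print(displayed_pixel(white, black, mouse_over=False))  # (128, 128, 128)
print(displayed_pixel(white, black, mouse_over=True))   # (255, 255, 255)
```

At full opacity the video contribution drops to zero, which is exactly the "opaque until you move the mouse away" behavior described above.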
Here's the XMT-O source code for our sample program. It starts with all the standard XML declarations and such:
That's followed by the head section, which, just as in SMIL, defines the color and size of the playback window, as well as any regions and subregions we'll need. Notice that the syntax differs subtly from SMIL 2.0.
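Placing the watermark in the bottom-right corner is simple arithmetic on the window and region sizes. Here is a minimal Python sketch, assuming pixel coordinates measured from the top-left of the playback window and a hypothetical 320x240 window, 80x30 watermark, and 10-pixel margin; the resulting numbers would go into your region's position attributes.

```python
from typing import NamedTuple


class Region(NamedTuple):
    left: int
    top: int
    width: int
    height: int


def bottom_right_region(window_w: int, window_h: int,
                        mark_w: int, mark_h: int, margin: int = 0) -> Region:
    """Position a mark_w x mark_h region `margin` pixels in from the
    bottom-right corner of a window_w x window_h playback window."""
    if mark_w + margin > window_w or mark_h + margin > window_h:
        raise ValueError("watermark does not fit inside the window")
    return Region(window_w - mark_w - margin,
                  window_h - mark_h - margin,
                  mark_w, mark_h)


# Hypothetical sizes, not taken from the original example.
print(bottom_right_region(320, 240, 80, 30, margin=10))
# Region(left=230, top=200, width=80, height=30)
```

Computing the offsets from the window size keeps the watermark anchored to the corner if you later change the playback window's dimensions.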
With the head defined, we'll add the body section, which describes our media. You'll notice that it looks a lot like SMIL, with the familiar video, audio, and img tags. I created the media files with QuickTime Pro, loading my AVI source media and using Export to convert it to MPEG4. Notice that in the src attributes of the video and audio tags, I had to append #video and #audio to specify which track of the source MPEG4 file I was referring to.
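If you generate or check XMT-O files with a script, the #video and #audio suffixes are easy to handle mechanically. The Python helpers below are hypothetical (they are not part of the IBM toolkit), as are the file names: one splits a src value into file and track, and the other flags video or audio sources that lack a track suffix, following the convention described above.

```python
from typing import Optional


def split_track_ref(src: str) -> tuple[str, Optional[str]]:
    """Split a src value such as 'movie.mp4#video' into ('movie.mp4', 'video').

    The track is None when the value has no '#' fragment (e.g. an image)."""
    path, sep, track = src.partition("#")
    return path, (track if sep else None)


def missing_track_refs(tags: list[tuple[str, str]]) -> list[str]:
    """Return the src values of video/audio tags that lack a #track suffix.

    `tags` is a list of (tag_name, src) pairs, e.g. ("video", "movie.mp4#video")."""
    return [src for tag, src in tags
            if tag in ("video", "audio") and not split_track_ref(src)[1]]


# Hypothetical file names.
tags = [("video", "movie.mp4#video"), ("audio", "movie.mp4"), ("img", "logo.png")]
print(split_track_ref("movie.mp4#video"))  # ('movie.mp4', 'video')
print(missing_track_refs(tags))            # ['movie.mp4']
```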
To try this yourself, first download the mp4_watermark_example.zip file from the MPEG4 samples page. Then open the directory where you previously downloaded and unzipped the IBM Toolkit for MPEG4. If you're using Windows, double-click XMTBatch.bat. On any platform, you can launch it by typing java -cp IBMToolkitForMpeg4.jar XmtBatch at the command prompt. Select the XMT-O input file and a destination for the MP4 output file, then click Start. To view the result, double-click M4Play.bat (or, from the command line, type java -cp IBMToolkitForMpeg4.jar M4Play) and open the MP4 file.
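If you'd rather script the compile-and-view cycle, the same java command lines can be launched from Python with the standard subprocess module. This is a minimal sketch that assumes java is on your PATH and that you point it at the directory containing IBMToolkitForMpeg4.jar; it only opens the tools' windows, so you still select files and click Start in the GUI as described above.

```python
import subprocess
from pathlib import Path

JAR = "IBMToolkitForMpeg4.jar"  # jar name as given in the article


def toolkit_command(main_class: str, jar: str = JAR) -> list[str]:
    """Build the command line shown in the article, e.g.
    java -cp IBMToolkitForMpeg4.jar XmtBatch"""
    return ["java", "-cp", jar, main_class]


def launch(main_class: str, toolkit_dir: str = ".") -> subprocess.Popen:
    """Start a toolkit GUI (XmtBatch or M4Play) from the toolkit directory.

    Returns immediately; the GUI keeps running in its own process."""
    if not (Path(toolkit_dir) / JAR).is_file():
        raise FileNotFoundError(f"{JAR} not found in {toolkit_dir!r}")
    # Run from toolkit_dir so the relative classpath resolves to the jar there.
    return subprocess.Popen(toolkit_command(main_class), cwd=toolkit_dir)
```

For example, launch("XmtBatch", "/path/to/toolkit") opens the compiler and launch("M4Play", "/path/to/toolkit") opens the player; the path is a placeholder for wherever you unzipped the toolkit.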
Profiles and Compatibility
MPEG4 is designed for video playback across a wide variety of devices, from cell phones to powerful desktop computers and from pocket-sized handhelds to TV set-top boxes. To support this flexibility, the spec is divided into profiles and levels, each defining a subset of MPEG4's total feature set. An MPEG4 player supports a particular profile by implementing all of that profile's features. IBM's SamplesForMPEG4 (also available at alphaWorks) includes dozens of examples covering a wide range of XMT and MPEG4 features. Many of these play in the QuickTime and Real players, while others do not. (Of course, they all play in IBM's M4Play, part of the Toolkit.)
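The profile rule above amounts to a subset test: a presentation should play if every feature it uses is implemented by some profile the player supports. The Python sketch below shows that idealized logic with made-up profile and feature names and ignores levels entirely; the real mapping of MPEG4 features to profiles and levels is defined in the standard and is not reproduced here.

```python
# Hypothetical profiles mapped to the features a player must implement to
# support them. Names are illustrative only, not taken from the spec, and
# levels (limits within a profile) are ignored.
PROFILES: dict[str, set[str]] = {
    "basic_av": {"video", "audio"},
    "interactive": {"video", "audio", "still_image", "transparency", "mouse_events"},
}


def player_features(supported_profiles: list[str]) -> set[str]:
    """All features a player implements, given the profiles it supports."""
    features: set[str] = set()
    for name in supported_profiles:
        features |= PROFILES[name]
    return features


def missing_features(used: set[str], supported_profiles: list[str]) -> set[str]:
    """Features a presentation uses that the player does not implement.
    An empty result means the presentation should play."""
    return used - player_features(supported_profiles)


# Hypothetical feature list for a watermark-style presentation.
watermark = {"video", "audio", "still_image", "transparency", "mouse_events"}
print(sorted(missing_features(watermark, ["basic_av"])))
# ['mouse_events', 'still_image', 'transparency']
print(missing_features(watermark, ["basic_av", "interactive"]))  # set()
```

The hard part in practice is filling in the two tables (which profiles each player supports, and which features each profile covers) from published documentation.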
Frankly, I didn't know that my watermark.mp4 example wouldn't work in the RealOne and QuickTime players until I tried it. After all, both offer rich MPEG4 support via the EnvivioTV plugin. From the information published by Envivio, Real, Apple, and IBM, it's not easy to figure out which profiles and levels each player supports, or which ones are required by the features I chose to use in my XMT-O code. Let's face it: out here on the cutting edge, you have to get used to living with uncertainty.
Over the coming weeks in this column, we'll try to dull that edge with more detailed information and examples of how MPEG4 can be put to practical use today.