Cracking the H.264 Codec
H.264 is undoubtedly the hottest codec around, but market forces complicate producing files that meet the needs of your target playback device or player. These include the fact that multiple H.264 codecs are available, each with different configurable parameters, and that each H.264 encoding tool exposes its own set of compression options.
So, here’s my attempt to cut through the smoke: an overview in three parts. First, I’ll tell you what you need to know about H.264 itself, then I’ll describe the H.264 playback environment, and finally the encoders.
The H.264 Standard
Briefly, H.264 is a video compression standard known as MPEG-4 Part 10, or MPEG-4 AVC (for advanced video coding). It’s a joint standard promulgated by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG), and it has the full weight of both organizations behind it, delivering plenty of marketing momentum.
H.264’s audio sidekick is AAC (for advanced audio coding), the designated successor to MP3, technically known as MPEG-4 Part 3. Both H.264 and AAC are technically MPEG-4 codecs—though it’s more accurate to call them by their specific names—and files containing them should conform to the container format defined in Part 14 of the MPEG-4 spec. Since the audio side has few options other than data rate, I’ll focus almost entirely on the video side.
According to Part 14, MPEG-4 files containing both audio and video, including those with H.264/AAC, should use the .mp4 extension while audio-only files should use .m4a and video-only files should use .m4v. Different vendors have adopted a range of extensions that are recognized by their proprietary players, such as Apple with .m4p for files using FairPlay Digital Rights Management and .m4r for iPhone ringtones. Mobile phones use the .3gp and .3g2 extensions, though I don’t discuss producing for mobile phones in this article.
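The extension conventions above can be summarized in a short sketch. The mapping and the helper function `describe_extension` below are illustrative, not part of any spec or API:

```python
# Illustrative mapping of MPEG-4 family extensions to their conventional
# contents, per MPEG-4 Part 14 and common vendor practice.
MPEG4_EXTENSIONS = {
    ".mp4": "audio + video (general MPEG-4 container)",
    ".m4a": "audio only",
    ".m4v": "video only",
    ".m4p": "FairPlay DRM-protected audio (Apple)",
    ".m4r": "iPhone ringtone (Apple)",
    ".3gp": "mobile phone content (3GPP)",
    ".3g2": "mobile phone content (3GPP2)",
}

def describe_extension(filename: str) -> str:
    """Return the conventional content type for an MPEG-4 family extension."""
    for ext, description in MPEG4_EXTENSIONS.items():
        if filename.lower().endswith(ext):
            return description
    return "unknown extension"

print(describe_extension("song.m4a"))   # audio only
print(describe_extension("movie.mp4"))  # audio + video (general MPEG-4 container)
```

Note that the extension only signals the container's conventional contents; players still inspect the file itself to determine which codecs are present.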
Like MPEG-2, H.264 uses three types of frames, meaning that each group of pictures (GOP) comprises I-, B-, and P-frames, with I-frames compressed using intraframe techniques similar to the DCT-based compression used in DV, and B- and P-frames exploiting redundancies in other frames to increase compression.
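The GOP structure described above can be sketched in a few lines. This is a simplified model of a classic closed-GOP pattern in display order; real encoders may reorder frames, vary GOP length adaptively, or use hierarchical B-frames:

```python
def gop_pattern(gop_length: int, b_frames: int) -> str:
    """Sketch: the frame-type sequence of one GOP in display order.

    gop_length -- total frames per GOP (distance between I-frames)
    b_frames   -- consecutive B-frames between anchor (I/P) frames
    """
    return "".join(
        "I" if pos == 0
        # Every (b_frames + 1)th position is a P anchor; the rest are B.
        else ("P" if pos % (b_frames + 1) == 0 else "B")
        for pos in range(gop_length)
    )

# A 12-frame GOP with 2 B-frames between anchors, a common MPEG layout:
print(gop_pattern(12, 2))  # IBBPBBPBBPBB
```

Because B- and P-frames borrow from other frames, longer GOPs and more B-frames generally mean better compression but slower random access, since the decoder must start from the nearest I-frame.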
Interestingly, when H.264 development started in early 1998, the project was called H.26L, and its target was to double the coding efficiency of existing codecs, including MPEG-2. That would mean the same quality at half the bitrate, a target that H.264 clearly meets in most applications.
Also interestingly, like most video coding standards, H.264 actually standardizes only the "central decoder … such that every decoder conforming to the standard will produce similar output when given an encoded bitstream that conforms to the constraints of the standard," according to an article titled "Overview of the H.264/AVC Video Coding Standard" published in IEEE Transactions on Circuits and Systems for Video Technology (ITCSVT). Basically, this means that there’s no standardized H.264 encoder and, in fact, that H.264 encoding vendors can utilize a range of different techniques to optimize video quality so long as the bitstream plays on the target player. This is one of the key reasons that H.264 encoding interfaces vary so significantly among the various tools.
Profiles and Levels
To make H.264 relevant to a range of devices, the standard contains both profiles and levels. Briefly, a profile "defines a set of coding tools or algorithms that can be used in generating a conforming bitstream, whereas a level places constraints on certain key parameters of the bitstream," according to the ITCSVT article. To explain, a profile defines specific encoding techniques that you can or can’t utilize when encoding the files (such as B-frames), while the level defines details such as the maximum resolutions and data rates.
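To make the level concept concrete, here is a sketch that checks a stream against a small illustrative subset of level limits. The table follows the frame-size (macroblocks), macroblock-rate, and Baseline/Main bitrate limits of the spec's Table A-1 for three common levels, but it is a simplified demonstration, not a complete conformance check:

```python
import math

# Illustrative subset of H.264 level constraints; macroblocks are 16x16
# pixels. max_br_kbps reflects Baseline/Main profile limits (High profile
# allows higher bitrates at the same level).
LEVELS = {
    "3.0": {"max_fs": 1620, "max_mbps": 40500,  "max_br_kbps": 10000},
    "3.1": {"max_fs": 3600, "max_mbps": 108000, "max_br_kbps": 14000},
    "4.0": {"max_fs": 8192, "max_mbps": 245760, "max_br_kbps": 20000},
}

def minimum_level(width: int, height: int, fps: float, bitrate_kbps: int) -> str:
    """Return the lowest listed level whose constraints the stream satisfies."""
    frame_mbs = math.ceil(width / 16) * math.ceil(height / 16)
    for name, lim in LEVELS.items():  # ordered lowest to highest
        if (frame_mbs <= lim["max_fs"]
                and frame_mbs * fps <= lim["max_mbps"]
                and bitrate_kbps <= lim["max_br_kbps"]):
            return name
    return "above listed levels"

print(minimum_level(1280, 720, 30, 5000))  # 3.1
```

This is why a 720p30 stream requires at least Level 3.1: its 3,600 macroblocks per frame exceed Level 3.0's frame-size limit, which is one reason device spec sheets list a supported profile and level rather than just a resolution.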
H.264 includes seven profiles, with the following taxonomy courtesy of Wikipedia (http://en.wikipedia.org/wiki/H264):