The Skinny On ADPCM Data Formats
by Mike Melanson (melanson@pcisys.net)
v1.6: January 7, 2003

Contents
--------
 * Introduction
 * IMA ADPCM
 * DVI ADPCM
 * Microsoft ADPCM
 * Duck DK4 ADPCM
 * Duck DK3 ADPCM
 * Westwood Studios ADPCM
 * SMJPEG ADPCM
 * Dialogic ADPCM
 * Other ADPCM Formats
 * Appendix A: Tables
 * References
 * Acknowledgements
 * ChangeLog

Introduction
------------
ADPCM stands for Adaptive Differential Pulse Code Modulation. It's a
method for compressing audio samples at a constant 4:1 ratio. It only
operates on 16-bit PCM samples. It's fairly simple to implement and can
be performed with integer operations (although some special variations
employ floating point operations).

The goal of this document is to provide the specific algorithms and
on-disk data formats necessary to implement various ADPCM decoders. A
discussion of the mathematical foundations of ADPCM is beyond the scope
of this document, mainly because the author doesn't understand the math
behind the algorithms.

IMA ADPCM
---------
IMA stands for the Interactive Multimedia Association. This group
defined a standard ADPCM algorithm for encoding and decoding 16-bit
audio samples. However, they did not specify how the coded ADPCM data
was to be stored on disk. It is important to note that Microsoft and
Apple went their separate ways on this matter.

In an Apple Quicktime MOV file, IMA ADPCM data is denoted by the codec
FOURCC "ima4". In a Microsoft media file (WAV, AVI, or ASF), this audio
format is denoted by format 0x11. The endian encoding of multi-byte
numbers will depend on the file type in which the media occurs.

IMA ADPCM data comes packaged in blocks of bytes. A block of data will
have a preamble consisting of an initial predictor and an initial step
index. These values serve as starter values for decoding a series of
nibbles. Following the preamble will be a number of bytes that will be
decoded on a nibble-by-nibble basis.

[See Appendix A for reference IMA ADPCM tables.]

The algorithm for decoding IMA ADPCM is as follows:

  initialize predictor variable from ADPCM block preamble
  sign extend predictor
  clamp predictor within a signed 16-bit range
  initialize step_index variable from ADPCM block preamble
  clamp step_index between 0 and 88
  initialize step as step_table[step_index]

After initialization, iterate through each nibble of each byte in the
remainder of the block following the preamble. Decode the lower nibble
first (bits 3-0), then the upper nibble. For each nibble:

  step_index += index_table[(unsigned)nibble]
  clamp step_index between 0 and 88 (table limits)
  diff = ((signed)nibble + 0.5) * step / 4
  predictor += diff
  clamp predictor value within signed 16-bit range and output to
    decompressed audio stream
  step = step_table[step_index]

A note about the following calculation:

  diff = ((signed)nibble + 0.5) * step / 4

Here the signed nibble is a sign-magnitude value: bit 3 is the sign and
bits 2-0 are the magnitude. The 0.5 is added to the magnitude before the
sign is applied. Many ADPCM codecs like to use this little code fragment
that is close to the real computation, but doesn't require any
fractions, multiplications, or divisions:

  sign = nibble & 8;
  delta = nibble & 7;
  diff = step >> 3;
  if (delta & 4) diff += step;
  if (delta & 2) diff += step >> 1;
  if (delta & 1) diff += step >> 2;
  if (sign) predictor -= diff;
  else predictor += diff;

As mentioned at the start, the IMA algorithm is the same between codec
format definitions, but the on-disk data format is different.

In an Apple Quicktime MOV file, each individual block of ADPCM data is
34 (0x22) bytes in length and decodes to 64 16-bit signed PCM samples.
The first 2 bytes of the chunk comprise the preamble with the initial
step index and predictor values for the chunk. These 2 bytes are treated
as a big-endian value. Bits 15-7 are the top 9 bits of the 16-bit
initial predictor value. Bits 6-0 are the initial step index. The 16-bit
preamble is laid out as follows:

  pppppppp piiiiiii

The remaining 32 bytes, or 64 nibbles, in the block are decoded into 64
16-bit PCM samples.

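The per-nibble step and the Quicktime block layout described above can
be sketched in C as follows. This is only a minimal illustration, not a
reference decoder: the names ima_state, ima_decode_nibble, and
ima4_decode_block are made up for this example, the tables are copied
from Appendix A, and the shift-and-add approximation is used in place of
the exact diff formula.

```c
#include <assert.h>
#include <stdint.h>

/* Tables copied from Appendix A. */
static const int ima_index_table[16] = {
    -1, -1, -1, -1, 2, 4, 6, 8,
    -1, -1, -1, -1, 2, 4, 6, 8
};

static const int ima_step_table[89] = {
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
    19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
    130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
    876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
    5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767
};

/* Hypothetical decoder state for one channel. */
typedef struct {
    int predictor;   /* last output sample, kept in signed 16-bit range */
    int step_index;  /* index into ima_step_table, 0..88 */
} ima_state;

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Decode one 4-bit code with the shift-and-add approximation. */
static int16_t ima_decode_nibble(ima_state *s, unsigned nibble)
{
    int step, diff;

    assert(s->step_index >= 0 && s->step_index <= 88);
    step = ima_step_table[s->step_index];  /* step from previous index */

    diff = step >> 3;
    if (nibble & 4) diff += step;
    if (nibble & 2) diff += step >> 1;
    if (nibble & 1) diff += step >> 2;
    if (nibble & 8) s->predictor -= diff;
    else            s->predictor += diff;

    s->predictor = clamp_int(s->predictor, -32768, 32767);
    s->step_index = clamp_int(s->step_index + ima_index_table[nibble & 15],
                              0, 88);
    return (int16_t)s->predictor;
}

/* Decode one 34-byte Quicktime "ima4" block into 64 PCM samples. */
void ima4_decode_block(const uint8_t block[34], int16_t out[64])
{
    unsigned preamble = ((unsigned)block[0] << 8) | block[1]; /* big-endian */
    ima_state s;
    int i;

    s.predictor = (int)(preamble & 0xFF80);  /* top 9 bits of predictor */
    if (s.predictor & 0x8000)
        s.predictor -= 0x10000;              /* sign extend */
    s.step_index = clamp_int((int)(preamble & 0x7F), 0, 88);

    for (i = 0; i < 32; i++) {
        out[2 * i]     = ima_decode_nibble(&s, block[2 + i] & 0x0F); /* low */
        out[2 * i + 1] = ima_decode_nibble(&s, block[2 + i] >> 4);   /* high */
    }
}
```

Reading the step before updating the index gives the same result as the
pseudocode above, where the index is updated first but the step variable
still holds the value from the previous iteration.
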
For stereo data, a Quicktime file will store a block of left channel
data and right channel data interleaved with each other as follows:

  34-byte ADPCM block:
    2 bytes    initial predictor, index for left channel
    32 bytes   ADPCM codes for left channel
  34-byte ADPCM block:
    2 bytes    initial predictor, index for right channel
    32 bytes   ADPCM codes for right channel

A Microsoft media file (AVI/ASF/WAV) will contain a WAVEFORMAT structure
that contains, among other fields, nBlockAlign which reveals the size of
an IMA ADPCM block in a format 0x11 file. Each block contains 4 preamble
bytes for each channel. Each channel has the following preamble data:

  bytes 0-1   initial predictor (in little-endian format)
  byte 2      initial index
  byte 3      unknown, usually 0

Following the 4 or 8 preamble bytes (depending on number of channels)
are the ADPCM nibbles. If the data is stereo, the first 4 bytes comprise
8 left channel ADPCM nibbles followed by 4 bytes comprising 8 right
channel ADPCM nibbles. This interleaving continues until the end of the
block.

DVI ADPCM
---------
Apparently, there are two variants of DVI ADPCM, the old variant and new
variant. The new variant is the same as the IMA ADPCM algorithm. The old
variant, however, decodes the upper nibble (bits 7-4) first, then the
lower nibble.

Microsoft ADPCM
---------------
Microsoft's variant of ADPCM is rather different from the IMA/DVI
variant. All multi-byte numbers are stored in little-endian format. In a
Microsoft media file (WAV, AVI, or ASF), this audio format is denoted by
audio format 0x02. The size of an ADPCM block in bytes, and therefore
the number of samples it holds, is obtained through the nBlockAlign
field of a media file's WAV header. ADPCM nibbles are decoded from the
upper nibble (bits 7-4) first, then the lower nibble.

An MS mono ADPCM block begins with the following preamble:

  byte 0      block predictor (should be [0..6])
  bytes 1-2   initial idelta
  bytes 3-4   sample 1
  bytes 5-6   sample 2

The initial idelta and both samples are signed numbers (so take sign
extension into account). The block predictor value is used as an index
into two adaptation coefficient tables in order to initialize two
coefficients, coeff1 and coeff2.

The initial 2 samples from the preamble are sent directly to the output.
Sample 2 is first, then sample 1. The remaining samples are decoded from
the ADPCM nibbles, which comprise the rest of the bytes in the block.
In the steps below, the signed nibble is a two's complement value in the
range -8..7, while the table lookup uses the unsigned nibble (0..15).
For each nibble:

  predictor = ((sample1 * coeff1) + (sample2 * coeff2)) / 256
  predictor += (signed)nibble * idelta
  clamp predictor within signed 16-bit range
  PCM sample = predictor
  send PCM sample to the output
  shuffle samples: sample 2 = sample 1
                   sample 1 = calculated PCM sample
  compute next adaptive scale factor:
    idelta = (AdaptationTable[(unsigned)nibble] * idelta) / 256
  clamp idelta to lower bound of 16

For stereo data, the preamble stores interleaved initialization values
for the left and right channels:

  byte 0        left channel block predictor (should be [0..6])
  byte 1        right channel block predictor (should be [0..6])
  bytes 2-3     left channel initial idelta
  bytes 4-5     right channel initial idelta
  bytes 6-7     left channel sample 1
  bytes 8-9     right channel sample 1
  bytes 10-11   left channel sample 2
  bytes 12-13   right channel sample 2

Following the preamble, the left and right ADPCM samples are interleaved
within each byte. The upper nibble (bits 7-4) contains the left channel
ADPCM code and the lower nibble contains the right channel ADPCM code.

See Appendix A for MS ADPCM reference tables.

Duck DK4 ADPCM
--------------
Some Sega Saturn game CDs contain AVI files which store audio using the
Duck DK4 ADPCM algorithm. These AVI files report format 0x61 as their
audio codec. DK4 data can be decoded using the same algorithm and tables
as are used to decode IMA ADPCM data.
The name apparently comes from the fact that 4 ADPCM nibbles decode to
4 16-bit PCM samples, in contrast to Duck's DK3 ADPCM algorithm, in
which 3 ADPCM nibbles decode to 4 16-bit PCM samples.

It's important to note that WAVE format 0x61 is not officially
registered to the Duck Corporation. Official registries of WAVE formats
typically list this number as being registered to ESS Technology.

All multi-byte values (actually, there's only 1, the initial predictor
value) are encoded in little-endian format.

The length of a single block of DK4 data is encoded in the nBlockAlign
field of an AVI file's WAV header. A chunk begins with the following
4-byte preamble:

  bytes 0-1   initial predictor
  byte 2      initial index
  byte 3      unknown, usually 0

If the audio is mono, the preamble applies to the single audio channel.
If the audio is stereo, that preamble applies to the left channel and
another preamble for the right channel will follow in the next 4 bytes:

  bytes 4-5   initial predictor for right channel
  byte 6      initial index for right channel
  byte 7      unknown, usually 0

To decode the chunk, the initial predictor (for mono data) or both
initial predictors (for stereo data) are placed directly into the output
stream as the first PCM sample(s). Then, iterate through each remaining
nibble in the chunk and decode the high nibble first (bits 7-4), then
the low nibble, applying the usual IMA ADPCM decoding algorithm. In the
case of stereo data, the high nibble of a byte represents a left channel
PCM sample and the low nibble represents the right channel PCM sample.

Duck DK3 ADPCM
--------------
Some Sega Saturn game CDs contain AVI files which store audio using the
Duck DK3 ADPCM algorithm. These AVI files report format 0x62 as their
audio codec. DK3 ADPCM data can be decoded using the same tables as are
used to decode IMA ADPCM data while using a slightly modified variant of
the IMA ADPCM algorithm.
The name DK3 apparently comes from the fact that 3 ADPCM nibbles decode
to 4 16-bit PCM samples, in contrast to Duck's DK4 ADPCM algorithm, in
which 4 ADPCM nibbles decode to 4 16-bit PCM samples.

It's important to note that WAVE format 0x62 is not officially
registered to the Duck Corporation. Depending on which version of the
audio codec registry is examined, this format will appear as being
registered to either Quanta Computer or VoxWare.

All multi-byte values are encoded in little-endian format.

The length of a single block of DK3 data is encoded in the nBlockAlign
field of an AVI file's WAV header. The DK3 algorithm encodes a sum
channel and a difference channel, rather than left and right channels,
using the standard IMA ADPCM algorithm and tables. A block of DK3 data
has a 16-byte preamble with the following information:

  bytes 0-1     unknown
  bytes 2-3     sample rate
  bytes 4-9     unknown
  bytes 10-11   initial sum channel predictor
  bytes 12-13   initial diff channel predictor
  byte 14       initial sum channel index
  byte 15       initial diff channel index

After processing the block preamble, a stream of DK3 data is decoded
nibble by nibble, just like any ADPCM data. The low nibble is decoded
first (bits 3-0), then the high nibble. When decoding the stream, it's
useful to conceptualize it as a stream of nibbles:

  n0 n1 n2 n3 n4 n5 n6 n7 ...

where the nibbles were arranged in the original bytestream as:

  n1n0 n3n2 n5n4 n7n6 ...

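Since DK3 consumes nibbles in groups of 3, a group will regularly
straddle a byte boundary, so it can help to hide the byte packing behind
a small reader. The following C sketch shows one way to do that; the
nibble_reader type and next_nibble function are hypothetical names
invented for this illustration and do not come from any Duck code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: presents a byte buffer as a stream of nibbles. */
typedef struct {
    const uint8_t *data;  /* ADPCM bytes following the block preamble */
    size_t size;          /* number of bytes in data */
    size_t pos;           /* index of the next nibble to return */
} nibble_reader;

/* Return the next nibble (0..15), low nibble of each byte first,
 * or -1 once the buffer is exhausted. */
static int next_nibble(nibble_reader *r)
{
    uint8_t byte;
    int nibble;

    assert(r != NULL && r->data != NULL);
    if (r->pos >= r->size * 2)
        return -1;

    byte = r->data[r->pos >> 1];
    nibble = (r->pos & 1) ? (byte >> 4) : (byte & 0x0F);
    r->pos++;
    return nibble;
}
```

With the bytes 0x10 0x32 as input, successive calls return 0, 1, 2, 3
and then -1, matching the n1n0 n3n2 arrangement shown above.
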
Each set of 3 nibbles decodes to 4 16-bit PCM samples using this process
(note that the diff value is initialized to the same value as the diff
predictor):

  get next ADPCM nibble in stream
  update sum channel predictor and index using nibble

  get next ADPCM nibble in stream
  update diff channel predictor and index using nibble
  diff value = (diff value + diff predictor) / 2
  next left channel PCM sample = sum channel + diff value
  next right channel PCM sample = sum channel - diff value

  get next ADPCM nibble in stream
  update sum channel predictor and index using nibble
  next left channel PCM sample = sum channel + diff value
  next right channel PCM sample = sum channel - diff value

Westwood Studios ADPCM
----------------------
Many games published by Westwood Studios use VQA files to transport
movie animations and AUD files to transport audio clips. Such titles
include the Command & Conquer and Lands of Lore series. Westwood Studios
multimedia files store audio using the standard IMA ADPCM algorithm.

VQA is a tagged format with different chunks marked by fourccs. A 'SND2'
chunk contains IMA ADPCM nibbles. There is no chunk preamble that
specifies initial predictor and index. The predictor and index variables
are initialized to 0 when file playback is started and maintained across
chunks. This makes random seeking through Westwood Studios files quite
difficult.

If the audio is mono data, the low nibble is decoded first (bits 3-0)
then the high nibble:

  byte 0   byte 1   byte 2   byte 3
  L1 L0    L3 L2    L5 L4    L7 L6

If the audio is stereo data, left and right bytes are interleaved. Each
byte represents 2 samples for either the left or the right channel:

  byte 0   byte 1   byte 2   byte 3
  L1 L0    R1 R0    L3 L2    R3 R2

SMJPEG ADPCM
------------
SMJPEG stands for SDL Motion JPEG. It is an animation format used by
Loki Games for porting computer games (and their full motion video) to
Linux. SMJPEG is a chunked file format which uses FOURCCs to identify
blocks in the file as well as audio and video codecs.
The only known video FOURCC used is 'JFIF' for JPEG. The only known
audio FOURCC is 'APCM' for ADPCM. The ADPCM algorithm is IMA ADPCM.

Compressed audio data comes packaged in 'sndD' chunks. Each chunk is
stamped with a millisecond presentation timestamp and a data length,
which is usually 0x104 bytes. The first 4 bytes are the initial
conditions for decoding the ADPCM block:

  bytes 0-1   initial predictor, big-endian format
  byte 2      initial index
  byte 3      unused

The remainder of the data bytes in the chunk are ADPCM nibbles to be
decoded with the standard ADPCM algorithm. The low nibble is decoded
first (bits 3-0), then the high nibble.

Note that the SMJPEG format description apparently supports stereo. No
stereo samples have been encountered at the time of this writing. It is
unknown how the format would store stereo data.

Dialogic ADPCM
--------------
Dialogic ADPCM is a variation of the standard IMA ADPCM algorithm that
is optimized for monaural voice data. The encoder operates on 12-bit
input samples and outputs a 4-bit encoding for each sample. This yields
a 3:1 compression ratio.

Dialogic ADPCM data is transported in raw files bearing the extension
VOX. For each byte in the files, the high nibble (bits 7-4) is decoded
first, then the low nibble.

The decoding algorithm is precisely the same as the standard IMA ADPCM
algorithm with the following modifications:

 * A different, smaller step table is used (refer to Appendix A for the
   table). The table contains 49 values at indices 0..48.
 * The predictor is always initialized to 0 at the start of decoding.
 * The step size is always initialized to 16 at the start of decoding;
   this is the first entry of the step table, so the index starts at 0.
 * When the index is modified by the ADPCM delta, it should be clamped
   within the 0..48 range, rather than 0..88.
 * When the diff is applied to the predictor, it should be clamped
   within a signed 12-bit range (-2048..2047) rather than a signed
   16-bit range.
 * The output samples are 12 bits in resolution and should be scaled as
   necessary.

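The Dialogic modifications listed above can be sketched in C as follows.
This is an illustration under stated assumptions rather than a verified
decoder: the names vox_state, vox_decode_nibble, and vox_decode are
invented for this example, the index table is assumed to be the standard
IMA one, the decoder is assumed to start at the minimum step size of 16
(table index 0), and the 12-bit output is scaled to 16 bits by
multiplying by 16, which is only one possible scaling. The diff is
computed as ((2 * delta + 1) * step) / 8, an integer form of
(nibble + 0.5) * step / 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard IMA index table and Dialogic step table from Appendix A. */
static const int vox_index_table[16] = {
    -1, -1, -1, -1, 2, 4, 6, 8,
    -1, -1, -1, -1, 2, 4, 6, 8
};

static const int vox_step_table[49] = {
    16, 17, 19, 21, 23, 25, 28, 31, 34, 37,
    41, 45, 50, 55, 60, 66, 73, 80, 88, 97,
    107, 118, 130, 143, 157, 173, 190, 209, 230, 253,
    279, 307, 337, 371, 408, 449, 494, 544, 598, 658,
    724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552
};

/* Hypothetical decoder state. */
typedef struct {
    int predictor;   /* signed 12-bit range, -2048..2047 */
    int step_index;  /* 0..48 */
} vox_state;

static int vox_clamp(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Decode one 4-bit code and return the sample scaled to 16 bits. */
static int16_t vox_decode_nibble(vox_state *s, unsigned nibble)
{
    int step, delta, diff;

    assert(s->step_index >= 0 && s->step_index <= 48);
    step = vox_step_table[s->step_index];
    delta = (int)(nibble & 7);                /* magnitude bits */
    diff = ((2 * delta + 1) * step) >> 3;     /* (delta + 0.5) * step / 4 */
    if (nibble & 8)
        diff = -diff;                         /* sign bit */

    s->predictor = vox_clamp(s->predictor + diff, -2048, 2047);
    s->step_index = vox_clamp(s->step_index + vox_index_table[nibble & 15],
                              0, 48);
    return (int16_t)(s->predictor * 16);      /* assumed 12 -> 16 bit scaling */
}

/* Decode n bytes of VOX data; out must hold 2 * n samples.
 * Returns the number of samples written. */
size_t vox_decode(const uint8_t *in, size_t n, int16_t *out)
{
    vox_state s = { 0, 0 };  /* predictor 0, step size 16 (index 0) */
    size_t i;

    for (i = 0; i < n; i++) {
        out[2 * i]     = vox_decode_nibble(&s, in[i] >> 4);    /* high first */
        out[2 * i + 1] = vox_decode_nibble(&s, in[i] & 0x0F);  /* then low */
    }
    return 2 * n;
}
```

Because the state lives in a local variable, this sketch decodes a whole
buffer in one call; a streaming decoder would keep the vox_state between
calls instead.
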
Other ADPCM Formats
-------------------
There are still many more ADPCM variants not covered in this document.
Some are similar to the variants already covered here, but with slightly
different preamble formats. Some are quite different than anything
covered here.

Many video game consoles employ some form of ADPCM in their sound
hardware in order to have a simple method to reduce audio size. The
Nintendo GameCube, released in 2001, uses a type of ADPCM. Sony's
Playstation 2 console uses a format referred to as VAG ADPCM.

Yamaha is known to have its own ADPCM format. Yamaha has produced the
audio hardware for at least 2 Sega video game consoles, Saturn and
Dreamcast. It's plausible that both use the same ADPCM format.

Appendix A: Tables
------------------
[step_table[] and index_table[] are from the ADPCM reference source]

This is the index table:

  int index_table[16] = {
    -1, -1, -1, -1, 2, 4, 6, 8,
    -1, -1, -1, -1, 2, 4, 6, 8,
  };

This is the step table. Note that many programs use slight deviations
from this table, but such deviations are negligible:

  int step_table[89] = {
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
    19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
    130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
    876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
    5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767
  };

This is the modified IMA step table used in the Dialogic ADPCM algorithm.
This table comes from the Dialogic ADPCM PDF document:

  int dialogic_ima_step[49] = {
    16, 17, 19, 21, 23, 25, 28, 31, 34, 37,
    41, 45, 50, 55, 60, 66, 73, 80, 88, 97,
    107, 118, 130, 143, 157, 173, 190, 209, 230, 253,
    279, 307, 337, 371, 408, 449, 494, 544, 598, 658,
    724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552
  };

[AdaptationTable[], AdaptCoeff1[], and AdaptCoeff2[] are from libsndfile]

  int AdaptationTable [] = {
    230, 230, 230, 230, 307, 409, 512, 614,
    768, 614, 512, 409, 307, 230, 230, 230
  } ;

  int AdaptCoeff1 [] = {
    256, 512, 0, 192, 240, 460, 392
  } ;

  int AdaptCoeff2 [] = {
    0, -256, 0, 64, 0, -208, -232
  } ;

References
----------
These are some of the sources examined during the creation of this
document:

ADPCM reference implementation
  ftp://ftp.cwi.nl/pub/audio/adpcm.tar.gz

XAnim
  http://xanim.va.pubnix.com/home.html

QuickTime4Linux
  http://heroinewarrior.com

libsndfile
  http://www.zip.com.au/~erikd/libsndfile/

duck.exe Truemotion (and ADPCM) player
  http://www.din.or.jp/~ch3/index_e.html

Apple Developer Connection Technical Note TN1081: Understanding The
Differences Between Apple And Windows IMA-ADPCM Compressed Sound Files
  http://developer.apple.com/technotes/tn/tn1081.html

Command & Conquer Technical Page (for Westwood Studios ADPCM)
  http://www.geocities.com/SiliconValley/8682/cnc.html

SMJPEG Library
  http://www.lokigames.com/development/smjpeg.php3

Dialogic ADPCM Algorithm
  http://resource.intel.com/telecom/support/appnotes/adpcm.pdf

Acknowledgements
----------------
Thanks to Keiki Satoh for finding the bug relating to outputting the
initial samples in an MS ADPCM block.

ChangeLog
---------
v1.6: January 7, 2003
- added SMJPEG ADPCM
- added Dialogic ADPCM
- changed references of VQA ADPCM -> Westwood Studios ADPCM

v1.5: December 29, 2002
- added VQA ADPCM

v1.4: July 7, 2002
- fixed bug about initial samples in MS ADPCM

v1.3: March 30, 2002
- added information about Microsoft's take on the IMA ADPCM algorithm
  and reworked the entire section on IMA ADPCM
- added Apple tech note that does its best to describe the differences
  between the QT and MS flavors of IMA ADPCM

v1.2: February 24, 2002
- added DK3 ADPCM
- added Duck names
- added FOURCCs and format numbers
- corrected block length information for DK4
- added information about stereo DK4 data
- added section about other ADPCM formats that are not yet fully
  documented

v1.1: January 1, 2002
- added format 0x61 ADPCM
- fixed bug regarding diff calculation in IMA ADPCM

v1.0: December 29, 2001
- initial release