The Skinny On ADPCM Data Formats
by Mike Melanson (melanson@pcisys.net)
v1.6: January 7, 2003

Contents
--------
 * Introduction
 * IMA ADPCM
 * DVI ADPCM
 * Microsoft ADPCM
 * Duck DK4 ADPCM
 * Duck DK3 ADPCM
 * Westwood Studios ADPCM
 * SMJPEG ADPCM
 * Dialogic ADPCM
 * Other ADPCM Formats
 * Appendix A: Tables
 * References
 * Acknowledgements
 * ChangeLog


Introduction
------------
ADPCM stands for Adaptive Differential Pulse Code Modulation. It's a
method for compressing audio samples at a fixed ratio, typically 4:1. Most
of the variants described here operate on 16-bit PCM samples. It's fairly
simple to implement and can be performed with integer operations (although
some special variations employ floating point operations).

The goal of this document is to provide the specific algorithms and 
on-disk data formats necessary to implement various ADPCM decoders. A
discussion of the mathematical foundations of ADPCM is beyond the scope of
this document, mainly because the author doesn't understand the math
behind the algorithms.


IMA ADPCM
---------
IMA stands for the Interactive Multimedia Association. This group defined
a standard ADPCM algorithm for encoding and decoding 16-bit audio
samples. However, they did not specify how the coded ADPCM data was to be
stored on disk. It is important to note that Microsoft and Apple went
their separate ways on this matter.

In an Apple Quicktime MOV file, IMA ADPCM data is denoted by the codec
FOURCC "ima4". In a Microsoft media file (WAV, AVI, or ASF), this audio
format is denoted by format 0x11. The endian encoding of multi-byte
numbers will depend on the file type in which the media occurs.

IMA ADPCM data comes packaged in blocks of bytes. A block of data will
have a preamble consisting of an initial predictor and an initial step
index. These values serve as starter values for decoding a series of
nibbles. Following the preamble will be a number of bytes that will be
decoded on a nibble-by-nibble basis.

[See Appendix A for reference IMA ADPCM tables.]

The algorithm for decoding IMA ADPCM is as follows:
  initialize predictor variable from ADPCM block preamble
    sign extend predictor
    clamp predictor within a signed 16-bit range
  initialize step_index variable from ADPCM block preamble
    clamp step_index between 0 and 88
  initialize step as step_table[step_index]

After initialization, iterate through each nibble of each byte in the
remainder of the block following the preamble. Decode the lower nibble
first (bits 3-0), then the upper nibble.

For each nibble:
  step_index += index_table[(unsigned)nibble]
  clamp step_index between 0 and 88 (table limits)
  diff = ((signed)nibble + 0.5) * step / 4
  predictor += diff
  clamp predictor value within signed 16-bit range and output to
    decompressed audio stream
  step = step_table[step_index]

A note about the following calculation:
  diff = ((signed)nibble + 0.5) * step / 4
Many ADPCM codecs like to use this little code fragment that is close to
the real computation, but doesn't require any fractions, multiplications,
or divisions:
    sign = nibble & 8;
    delta = nibble & 7;
    diff = step >> 3;
    if (delta & 4) diff += step;
    if (delta & 2) diff += step >> 1;
    if (delta & 1) diff += step >> 2;
    if (sign) predictor -= diff;
    else predictor += diff;
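
Putting the per-nibble steps and the shift-and-add fragment together, a
single decode step might look like the following C sketch. The names
ImaState and ima_decode_nibble are illustrative only and are not taken
from any reference implementation; the tables are the ones listed in
Appendix A. Instead of caching step between calls, the sketch looks it up
from the index as it stood before the current nibble adjusted it, which
should be equivalent to the ordering given above.

```c
#include <stdint.h>

static const int ima_index_table[16] = {
    -1, -1, -1, -1, 2, 4, 6, 8,
    -1, -1, -1, -1, 2, 4, 6, 8
};

static const int ima_step_table[89] = {
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
    19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
    130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
    876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
    5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767
};

/* Hypothetical decoder state: one instance per audio channel. */
typedef struct {
    int predictor;   /* last output sample, kept within -32768..32767 */
    int step_index;  /* current position in the step table, 0..88 */
} ImaState;

/* Decode one 4-bit code and return the next 16-bit PCM sample. */
int16_t ima_decode_nibble(ImaState *s, unsigned nibble)
{
    /* The step used here is the one selected before this nibble
     * modifies the index. */
    int step = ima_step_table[s->step_index];
    int diff = step >> 3;

    nibble &= 0x0F;
    if (nibble & 4) diff += step;
    if (nibble & 2) diff += step >> 1;
    if (nibble & 1) diff += step >> 2;
    if (nibble & 8) s->predictor -= diff;
    else            s->predictor += diff;

    /* Clamp the predictor to the signed 16-bit range. */
    if (s->predictor > 32767)  s->predictor = 32767;
    if (s->predictor < -32768) s->predictor = -32768;

    /* Adapt the step index and clamp it to the table limits. */
    s->step_index += ima_index_table[nibble];
    if (s->step_index < 0)  s->step_index = 0;
    if (s->step_index > 88) s->step_index = 88;

    return (int16_t)s->predictor;
}
```

A caller would fill in an ImaState from the block preamble and then call
ima_decode_nibble() once per nibble, in whatever nibble order the
container format prescribes.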

As mentioned at the start, the IMA algorithm is the same between codec
format definitions, but the on-disk data format is different. In an Apple
Quicktime MOV file, each individual block of ADPCM data is 34 (0x22) bytes
in length and decodes to 64 16-bit signed PCM samples. The first 2 bytes
of the block comprise the preamble with the initial step index and
predictor values for the block. These 2 bytes are treated as a big-endian
value. Bits 15-7 are the top 9 bits of the 16-bit initial predictor
value. Bits 6-0 are the initial step index. The 16-bit preamble is laid
out as follows:

  pppppppp piiiiiii

The remaining 32 bytes, or 64 nibbles, in the block are decoded into 64
16-bit PCM samples.
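
As an illustration of the preamble layout, the following C sketch splits
the 2 preamble bytes into a predictor and a step index. The function name
qt_ima_read_preamble is made up for this example. The sketch assumes the
low 7 bits of the predictor are simply treated as zero, and it clamps the
index to 88 as the decoding algorithm requires.

```c
#include <stdint.h>

/* Split the 2-byte big-endian preamble of a 34-byte QuickTime block.
 * Bits 15-7 hold the top 9 bits of the predictor, bits 6-0 the index. */
void qt_ima_read_preamble(const uint8_t *block, int *predictor,
                          int *step_index)
{
    int preamble = (block[0] << 8) | block[1];
    int pred = preamble & 0xFF80;   /* keep only the top 9 bits */
    int index = preamble & 0x7F;

    /* Sign extend the 16-bit predictor. */
    if (pred & 0x8000)
        pred -= 0x10000;

    /* 7 bits can hold values up to 127; the step table ends at 88. */
    if (index > 88)
        index = 88;

    *predictor = pred;
    *step_index = index;
}
```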

For stereo data, a Quicktime file will store a block of left channel data
and right channel data interleaved with each other as follows:

  34-byte ADPCM block:
    2 bytes    initial predictor, index for left channel
    32 bytes   ADPCM codes for left channel
  34-byte ADPCM block:
    2 bytes    initial predictor, index for right channel
    32 bytes   ADPCM codes for right channel

A Microsoft media file (AVI/ASF/WAV) will contain a WAVEFORMAT structure
that contains, among other fields, nBlockAlign which reveals the size of
an IMA ADPCM block in a format 0x11 file. Each block contains 4 preamble
bytes for each channel. Each channel has the following preamble data:

  bytes 0-1     initial predictor (in little-endian format)
  byte 2        initial index
  byte 3        unknown, usually 0

Following the 4 or 8 preamble bytes (depending on number of channels) are
the ADPCM nibbles. If the data is stereo, the first 4 bytes comprise 8
left channel ADPCM nibbles followed by 4 bytes comprising 8 right channel
ADPCM nibbles. This interleaving continues until the end of the block.
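
The 4-byte interleaving can be expressed as a small helper that reports
which channel a given data byte belongs to. This is only a sketch; the
name ms_ima_channel_of_byte is hypothetical, and the offset is assumed to
be counted from the first byte after the preamble bytes.

```c
#include <stddef.h>

/* Return the channel (0 = left, 1 = right) that owns the ADPCM data
 * byte at `offset`, where offset 0 is the first byte after the
 * preamble(s). Mono data always maps to channel 0. */
int ms_ima_channel_of_byte(size_t offset, int channels)
{
    if (channels < 2)
        return 0;
    /* Stereo: bytes alternate between channels in groups of 4. */
    return (int)((offset / 4) % 2);
}
```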


DVI ADPCM
---------
Apparently, there are two variants of DVI ADPCM, the old variant and new
variant. The new variant is the same as the IMA ADPCM algorithm. The old
variant, however, decodes the upper nibble (bits 7-4) first, then the
lower nibble.
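
Since the two variants differ only in nibble order, a decoder could
isolate that difference in one place. The helper below is a sketch with
an invented name (adpcm_split_byte) and an invented high_first flag.

```c
#include <stdint.h>

/* Split a byte into its two 4-bit codes in decoding order.
 * high_first == 0: lower nibble (bits 3-0) first, as in IMA ADPCM.
 * high_first != 0: upper nibble (bits 7-4) first, as in old DVI. */
void adpcm_split_byte(uint8_t byte, int high_first,
                      unsigned *first, unsigned *second)
{
    unsigned lo = byte & 0x0F;
    unsigned hi = (unsigned)byte >> 4;

    if (high_first) {
        *first = hi;
        *second = lo;
    } else {
        *first = lo;
        *second = hi;
    }
}
```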


Microsoft ADPCM
---------------
Microsoft's variant of ADPCM is rather different from the IMA/DVI
variant.

All multi-byte numbers are stored in little endian format. In a Microsoft
media file (WAV, AVI, or ASF), this audio format is denoted by audio 
format 0x02.

The size of an ADPCM block, in bytes, is given by the nBlockAlign field of
a media file's WAV header; the number of samples per block follows from
that size.

ADPCM nibbles are decoded from the upper nibble (bits 7-4) first, then the
lower nibble.

A MS mono ADPCM block begins with the following preamble:

  byte 0      block predictor (should be [0..6])
  bytes 1-2   initial idelta
  bytes 3-4   sample 1
  bytes 5-6   sample 2

The initial idelta and both samples are signed numbers (so take sign
extension into account). The block predictor value is used as an index
into two adaptation coefficient tables in order to initialize two
coefficients, coeff1 and coeff2. 
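
A sketch of reading the mono preamble in C follows. The struct and
function names (MsAdpcmState, ms_adpcm_read_mono_preamble) are invented
for this example, and rejecting a block predictor above 6 is an
assumption; the format description only says the value should be within
[0..6]. The coefficient tables are the ones in Appendix A.

```c
#include <stdint.h>

static const int ms_adapt_coeff1[7] = { 256, 512, 0, 192, 240, 460, 392 };
static const int ms_adapt_coeff2[7] = { 0, -256, 0, 64, 0, -208, -232 };

/* Hypothetical per-channel decoder state. */
typedef struct {
    int coeff1, coeff2;
    int idelta;
    int sample1, sample2;
} MsAdpcmState;

/* Read a signed 16-bit little-endian value. */
static int read_s16le(const uint8_t *p)
{
    int v = p[0] | (p[1] << 8);
    return (v & 0x8000) ? v - 0x10000 : v;
}

/* Parse the 7-byte mono preamble.
 * Returns 0 on success, -1 if the block predictor is out of range. */
int ms_adpcm_read_mono_preamble(const uint8_t *block, MsAdpcmState *s)
{
    unsigned block_predictor = block[0];

    if (block_predictor > 6)
        return -1;

    s->coeff1  = ms_adapt_coeff1[block_predictor];
    s->coeff2  = ms_adapt_coeff2[block_predictor];
    s->idelta  = read_s16le(block + 1);
    s->sample1 = read_s16le(block + 3);
    s->sample2 = read_s16le(block + 5);
    return 0;
}
```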

The initial 2 samples from the preamble are sent directly to the
output. Sample 2 is first, then sample 1. The remaining samples are
decoded from the ADPCM nibbles, which comprise the rest of the bytes in
the block. For each nibble:
  predictor = ((sample1 * coeff1) + (sample2 * coeff2)) / 256
  predictor += (signed)nibble * idelta
  clamp predictor within signed 16-bit range
  PCM sample = predictor
  send PCM sample to the output
  shuffle samples:
    sample 2 = sample 1
    sample 1 = calculated PCM sample
  compute next adaptive scale factor:
    idelta = (AdaptationTable[(unsigned)nibble] * idelta) / 256
    clamp idelta to lower bound of 16

For stereo data, the preamble stores interleaved initialization values for
the left and right channels:

  byte 0      left channel block predictor (should be [0..6])
  byte 1      right channel block predictor (should be [0..6])
  bytes 2-3   left channel initial idelta
  bytes 4-5   right channel initial idelta
  bytes 6-7   left channel sample 1
  bytes 8-9   right channel sample 1
  bytes 10-11 left channel sample 2
  bytes 12-13 right channel sample 2

Following the preamble, the left and right ADPCM samples are interleaved
within each byte. The upper nibble (bits 7-4) contains the left channel
ADPCM code and the lower nibble contains the right channel ADPCM code.

See Appendix A for MS ADPCM reference tables.

Duck DK4 ADPCM
--------------
Some Sega Saturn game CDs contain AVI files which store audio using the
Duck DK4 ADPCM algorithm. These AVI files report format 0x61 as their
audio codec. DK4 data can be decoded using the same algorithm and tables 
as are used to decode IMA ADPCM data. The name apparently comes from the
fact that 4 ADPCM nibbles decode to 4 16-bit PCM samples, in contrast to
Duck's DK3 ADPCM algorithm, in which 3 ADPCM nibbles decode to 4 16-bit
PCM samples.

It's important to note that WAVE format 0x61 is not officially registered
to the Duck Corporation. Official registries of WAVE formats typically
list this number as being registered to ESS Technology.

All multi-byte values (actually, there's only 1, the initial predictor
value) are encoded in little-endian format. The length of a single block
of DK4 is encoded in the nBlockAlign field of an AVI file's WAV header.

A chunk begins with the following 4-byte preamble:

  bytes 0-1     initial predictor
  byte 2        initial index
  byte 3        unknown, usually 0

If the audio is mono, the preamble applies to the single audio channel. If
the audio is stereo, that preamble applies to the left channel and another
preamble for the right channel will follow in the next 4 bytes:

  bytes 4-5     initial predictor for right channel
  byte 6        initial index for right channel
  byte 7        unknown, usually 0

To decode the chunk, the initial predictor (for mono data) or both initial
predictors (for stereo data) are placed directly into the output stream as
the first PCM sample(s). Then, iterate through each remaining nibble in
the chunk and decode the high nibble first (bits 7-4), then the low
nibble, applying the usual IMA ADPCM decoding algorithm. In the case
of stereo data, the high nibble of a byte represents a left channel PCM
sample and the low nibble represents the right channel PCM sample.
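
From this description, the number of PCM samples a block yields can be
worked out ahead of time, which is handy for sizing output buffers. The
following C sketch uses a made-up function name and assumes that every
byte after the preamble(s) carries ADPCM nibbles.

```c
/* PCM samples produced per channel by one DK4 block of `block_align`
 * bytes. Returns 0 for unsupported channel counts or blocks too small
 * to hold the preamble(s). */
int dk4_samples_per_channel(int block_align, int channels)
{
    int preamble_bytes, nibbles;

    if (channels != 1 && channels != 2)
        return 0;

    preamble_bytes = 4 * channels;
    if (block_align < preamble_bytes)
        return 0;

    /* Two nibbles per remaining byte, shared between the channels. */
    nibbles = (block_align - preamble_bytes) * 2;

    /* One sample comes straight from the initial predictor. */
    return 1 + nibbles / channels;
}
```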


Duck DK3 ADPCM
--------------
Some Sega Saturn game CDs contain AVI files which store audio using the
Duck DK3 ADPCM algorithm. These AVI files report format 0x62 as their
audio codec. DK3 ADPCM data can be decoded using the same tables as are
used to decode IMA ADPCM data while using a slightly modified variant of
the IMA ADPCM algorithm. The name DK3 apparently comes from the fact that
3 ADPCM nibbles decode to 4 16-bit PCM samples, in contrast to Duck's DK4
ADPCM algorithm, in which 4 ADPCM nibbles decode to 4 16-bit PCM samples.

It's important to note that WAVE format 0x62 is not officially registered
to the Duck Corporation. Depending on which version of the audio codec
registry is examined, this format will appear as being registered to
either Quanta Computer or VoxWare.

All multi-byte values are encoded in little-endian format. The length of a
single block of DK3 data is encoded in the nBlockAlign field of an AVI
file's WAV header.

The DK3 algorithm encodes a sum channel and a difference channel, rather
than left and right channels, using the standard IMA ADPCM algorithm and
tables. A block of DK3 has a 16-byte preamble with the following
information:

  bytes 0-1    unknown
  bytes 2-3    sample rate
  bytes 4-9    unknown
  bytes 10-11  initial sum channel predictor
  bytes 12-13  initial diff channel predictor
  byte 14      initial sum channel index
  byte 15      initial diff channel index

After processing the block preamble, a stream of DK3 data is decoded
nibble by nibble, just like any ADPCM data. The low nibble is decoded
first (bits 3-0), then the high nibble. When decoding the stream, it's
useful to conceptualize it as a stream of nibbles:

  n0 n1 n2 n3 n4 n5 n6 n7 ...

where the nibbles were arranged in the original bytestream as:

  n1n0 n3n2 n5n4 n7n6 ...
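
One way to get this nibble-stream view in C is a small accessor that
maps a nibble number onto the underlying bytes. The name dk3_nibble_at is
invented for this sketch, and data is assumed to point at the first byte
after the 16-byte preamble.

```c
#include <stddef.h>
#include <stdint.h>

/* Return nibble number `i` of the stream n0 n1 n2 ...
 * Even-numbered nibbles sit in the low 4 bits of a byte,
 * odd-numbered nibbles in the high 4 bits. */
unsigned dk3_nibble_at(const uint8_t *data, size_t i)
{
    uint8_t byte = data[i / 2];

    return (i & 1) ? (unsigned)(byte >> 4) : (unsigned)(byte & 0x0F);
}
```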

Each set of 3 nibbles decodes to 4 16-bit PCM samples using this process
(note that the diff value is initialized to the same value as the diff
predictor):

  get next ADPCM nibble in stream
  update sum channel predictor and index using nibble

  get next ADPCM nibble in stream
  update diff channel predictor and index using nibble
  diff value = (diff value + diff predictor) / 2

  next left channel PCM sample = sum channel + diff value
  next right channel PCM sample = sum channel - diff value

  get next ADPCM nibble in stream
  update sum channel predictor and index using nibble

  next left channel PCM sample = sum channel + diff value
  next right channel PCM sample = sum channel - diff value
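
The arithmetic that is specific to DK3 can be sketched in C as follows.
This example deliberately leaves out the IMA predictor updates and takes
their results as inputs, so the parameter names (sum_a, diff_pred, sum_b)
and the function name are illustrative only. Clamping the left and right
outputs to 16 bits is an assumption; the process above does not say how
overflow is handled.

```c
#include <stdint.h>

/* Clamp to the signed 16-bit range (an assumption for this sketch). */
static int16_t dk3_clamp16(int v)
{
    if (v > 32767)  return 32767;
    if (v < -32768) return -32768;
    return (int16_t)v;
}

/* Rebuild 4 PCM samples (left, right, left, right) from one group of
 * 3 nibbles.
 *   sum_a       sum predictor after the group's first nibble
 *   diff_pred   diff predictor after the group's second nibble
 *   sum_b       sum predictor after the group's third nibble
 *   diff_value  running diff value, updated in place */
void dk3_reconstruct(int sum_a, int diff_pred, int sum_b,
                     int *diff_value, int16_t out[4])
{
    /* Average the previous diff value with the new diff predictor. */
    *diff_value = (*diff_value + diff_pred) / 2;

    out[0] = dk3_clamp16(sum_a + *diff_value);  /* left  */
    out[1] = dk3_clamp16(sum_a - *diff_value);  /* right */
    out[2] = dk3_clamp16(sum_b + *diff_value);  /* left  */
    out[3] = dk3_clamp16(sum_b - *diff_value);  /* right */
}
```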


Westwood Studios ADPCM
----------------------
Many games published by Westwood Studios use VQA files to transport
movie animations and AUD files to transport audio clips. Such titles
include the Command & Conquer and Lands of Lore series. Westwood Studios
multimedia files store audio using the standard IMA ADPCM algorithm.

VQA is a tagged format with different chunks marked by fourccs. A 'SND2'
chunk contains IMA ADPCM nibbles. There is no chunk preamble that
specifies initial predictor and index. The predictor and index variables
are initialized to 0 when file playback is started and maintained across
chunks. This makes random seeking through Westwood Studios files quite
difficult.

If the audio is mono data, the low nibble is decoded first (bits 3-0)
then the high nibble:

 byte 0   byte 1   byte 2   byte 3
  L1 L0    L3 L2    L5 L4    L7 L6

If the audio is stereo data, left and right bytes are interleaved. Each
byte represents 2 samples for either the left or the right channel:

 byte 0   byte 1   byte 2   byte 3
  L1 L0    R1 R0    L3 L2    R3 R2
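
The two layouts can be summarized by a helper that tells a decoder where
the samples from a given byte belong. This is a sketch with an invented
name (westwood_locate_byte); byte numbers are counted from the start of
the chunk's ADPCM data.

```c
#include <stddef.h>

/* For data byte `i`, report which channel it feeds (0 = left,
 * 1 = right) and the index, within that channel, of the first of the
 * two samples it produces. The low nibble yields that first sample. */
void westwood_locate_byte(size_t i, int channels,
                          int *channel, size_t *first_sample)
{
    if (channels == 2) {
        /* Stereo: even bytes are left, odd bytes are right. */
        *channel = (int)(i & 1);
        *first_sample = (i / 2) * 2;
    } else {
        *channel = 0;
        *first_sample = i * 2;
    }
}
```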


SMJPEG ADPCM
------------
SMJPEG stands for SDL Motion JPEG. It is an animation format used by
Loki Games for porting computer games (and their full motion video) to
Linux. SMJPEG is a chunked file format which uses FOURCCs to identify
blocks in the file as well as audio and video codecs. The only known
video FOURCC used is 'JFIF' for JPEG. The only known audio FOURCC is
'APCM' for ADPCM.

The ADPCM algorithm is IMA ADPCM. Compressed audio data comes packaged
in 'sndD' chunks. Each chunk is stamped with a millisecond presentation
timestamp and a data length, which is usually 0x104 bytes. The first 4
bytes are the initial conditions for decoding the ADPCM block:

  bytes 0-1   initial predictor, big endian format
  byte  2     initial index
  byte  3     unused

The remainder of the data bytes in the chunk are ADPCM nibbles to be
decoded with the standard ADPCM algorithm. The low nibble is decoded
first (bits 3-0), then the high nibble.
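
Reading the initial conditions might look like the C sketch below. The
function name is made up, chunk is assumed to point at the first of the 4
preamble bytes, and clamping the index to 88 follows the general IMA
decoding algorithm rather than anything specific to SMJPEG.

```c
#include <stdint.h>

/* Read the 4-byte initial conditions of an SMJPEG ADPCM chunk. */
void smjpeg_read_preamble(const uint8_t *chunk, int *predictor,
                          int *step_index)
{
    /* The predictor is a signed 16-bit big-endian value. */
    int v = (chunk[0] << 8) | chunk[1];

    *predictor = (v & 0x8000) ? v - 0x10000 : v;
    *step_index = (chunk[2] > 88) ? 88 : chunk[2];
    /* chunk[3] is unused. */
}
```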

Note that the SMJPEG format description apparently supports stereo. No
stereo samples have been encountered at the time of this writing. It is
unknown how the format would store stereo data.


Dialogic ADPCM
--------------
Dialogic ADPCM is a variation of the standard IMA ADPCM algorithm that
is optimized for monaural voice data. The encoder operates on 12-bit
input samples and outputs 4-bit encoding for each sample. This yields a
3:1 compression ratio.

Dialogic ADPCM data is transported in raw files bearing the extension
VOX. For each byte in the files, the high nibble (bits 7-4) is decoded
first, then the low nibble.

The decoding algorithm is precisely the same as the standard IMA ADPCM
algorithm with the following modifications:

  * A different, smaller step table is used (refer to Appendix A for the
    table). The table contains 49 values, at indices 0..48.
  * The predictor is always initialized to 0 at the start of decoding.
  * The index is always initialized to 16 at the start of decoding.
  * When the index is modified by the ADPCM delta, it should be clamped
    within the 0..48 range, rather than 0..88.
  * After the diff is applied to the predictor, the predictor should be
    clamped within a signed 12-bit range (-2048..2047) rather than a
    signed 16-bit range.
  * The output samples are 12 bits in resolution and should be scaled as
    necessary.
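
Taken together, these modifications lead to a decoder along the lines of
the C sketch below. The function name vox_decode is invented, the
shift-and-add diff calculation from the IMA ADPCM section is assumed to
apply unchanged, and the output is left at 12-bit resolution so that any
scaling is up to the caller. Because the predictor and index are reset on
entry, the sketch expects to be handed the whole stream in one call.

```c
#include <stddef.h>
#include <stdint.h>

static const int vox_index_table[16] = {
    -1, -1, -1, -1, 2, 4, 6, 8,
    -1, -1, -1, -1, 2, 4, 6, 8
};

static const int vox_step_table[49] = {
    16, 17, 19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143,
    157, 173, 190, 209, 230, 253, 279, 307, 337, 371, 408, 449,
    494, 544, 598, 658, 724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552
};

/* Decode `len` bytes of VOX data into 2 * len samples in `out`.
 * Output values stay within the 12-bit range -2048..2047. */
void vox_decode(const uint8_t *in, size_t len, int16_t *out)
{
    int predictor = 0;   /* always starts at 0 */
    int index = 16;      /* always starts at 16 */
    size_t i;
    int half;

    for (i = 0; i < len; i++) {
        for (half = 0; half < 2; half++) {
            /* High nibble first, then low nibble. */
            unsigned nibble = (half == 0) ? (unsigned)(in[i] >> 4)
                                          : (unsigned)(in[i] & 0x0F);
            int step = vox_step_table[index];
            int diff = step >> 3;

            if (nibble & 4) diff += step;
            if (nibble & 2) diff += step >> 1;
            if (nibble & 1) diff += step >> 2;
            if (nibble & 8) predictor -= diff;
            else            predictor += diff;

            /* Clamp to the signed 12-bit range. */
            if (predictor > 2047)  predictor = 2047;
            if (predictor < -2048) predictor = -2048;

            /* Adapt the index and clamp it to the smaller table. */
            index += vox_index_table[nibble];
            if (index < 0)  index = 0;
            if (index > 48) index = 48;

            *out++ = (int16_t)predictor;
        }
    }
}
```

To obtain 16-bit output, one option is to multiply each decoded sample
by 16.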


Other ADPCM Formats
-------------------
There are still many more ADPCM variants not covered in this
document. Some are similar to the variants already covered here, but with
slightly different preamble formats. Some are quite different from
anything covered here.

Many video game consoles employ some form of ADPCM in their sound hardware
in order to have a simple method to reduce audio size. 

The Nintendo GameCube, released in 2001, uses a type of ADPCM. Sony's
Playstation 2 console uses a format referred to as VAG ADPCM.

Yamaha is known to have its own ADPCM format. Yamaha has produced the
audio hardware for at least 2 Sega video game consoles, Saturn and
Dreamcast. It's plausible that both use the same ADPCM format.


Appendix A: Tables
------------------
[step_table[] and index_table[] are from the ADPCM reference source]
This is the index table:
int index_table[16] = {
    -1, -1, -1, -1, 2, 4, 6, 8,
    -1, -1, -1, -1, 2, 4, 6, 8,
};

This is the step table. Note that many programs use slight deviations from
this table, but such deviations are negligible:
int step_table[89] = {
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
    19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
    130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
    876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
    5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767
};

This is the modified IMA step table used in the Dialogic ADPCM
algorithm. This table comes from the Dialogic ADPCM PDF document:
int dialogic_ima_step[49] = {
  16, 17, 19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
  50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143,
  157, 173, 190, 209, 230, 253, 279, 307, 337, 371, 408, 449,
  494, 544, 598, 658, 724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552
};

[AdaptationTable[], AdaptCoeff1[], and AdaptCoeff2[] are from libsndfile]
int AdaptationTable []    =
{       230, 230, 230, 230, 307, 409, 512, 614,
        768, 614, 512, 409, 307, 230, 230, 230
} ;

int AdaptCoeff1 [] =
{       256, 512, 0, 192, 240, 460, 392
} ;

int AdaptCoeff2 [] =
{       0, -256, 0, 64, 0, -208, -232
} ;


References
----------
These are some of the sources examined during the creation of this
document:

ADPCM reference implementation
ftp://ftp.cwi.nl/pub/audio/adpcm.tar.gz

XAnim
http://xanim.va.pubnix.com/home.html

QuickTime4Linux
http://heroinewarrior.com

libsndfile
http://www.zip.com.au/~erikd/libsndfile/

duck.exe Truemotion (and ADPCM) player
http://www.din.or.jp/~ch3/index_e.html

Apple Developer Connection Technical Note TN1081: Understanding The
Differences Between Apple And Windows IMA-ADPCM Compressed Sound Files
http://developer.apple.com/technotes/tn/tn1081.html

Command & Conquer Technical Page (for Westwood Studios ADPCM)
http://www.geocities.com/SiliconValley/8682/cnc.html

SMJPEG Library
http://www.lokigames.com/development/smjpeg.php3

Dialogic ADPCM Algorithm
http://resource.intel.com/telecom/support/appnotes/adpcm.pdf


Acknowledgements
----------------
Thanks to Keiki Satoh <kki@wakusei.ne.jp> for finding the bug relating to
outputting the initial samples in an MS ADPCM block.


ChangeLog
---------
v1.6: January 7, 2003
- added SMJPEG ADPCM
- added Dialogic ADPCM
- changed references of VQA ADPCM -> Westwood Studios ADPCM

v1.5: December 29, 2002
- added VQA ADPCM

v1.4: July 7, 2002
- fixed bug about initial samples in MS ADPCM

v1.3: March 30, 2002
- added information about Microsoft's take on the IMA ADPCM algorithm and
reworked the entire section on IMA ADPCM
- added Apple tech note that does its best to describe the differences
between the QT and MS flavors of IMA ADPCM

v1.2: February 24, 2002
- added DK3 ADPCM
- added Duck names
- added FOURCCs and format numbers
- corrected block length information for DK4
- added information about stereo DK4 data
- added section about other ADPCM formats that are not yet fully
documented

v1.1: January 1, 2002
- added format 0x61 ADPCM
- fixed bug regarding diff calculation in IMA ADPCM

v1.0: December 29, 2001
- initial release

EOF