6.4. Audio Queues

In addition to simple sounds, Audio Toolbox includes low-level functions for processing sound at the bit stream level. The framework includes many APIs that provide access to the raw data within audio files, as well as many conversion tools. If you're writing games or other sound-intensive applications, you may need an audio queue to render a digital output stream or control stereo channel mixing. Unlike the audio services covered earlier, the queue works with streams of raw audio data rather than complete files.

Think of the audio queue as a conveyor belt full of boxes. On one end of the conveyor belt, boxes are filled with chunks of sound, and on the other end they are dumped into the iPhone's speakers. These boxes represent sound buffers that carry bits around, and the conveyor belt is the audio queue. The conveyor belt dumps your sound into the speakers and then circles back around to have the boxes refilled. It's your job as the programmer to define the size, type, and number of boxes, and write the software to fill the boxes with sound when needed.

The Audio Toolbox queue is strictly first-in-first-out; that is, the conveyor belt plays the samples in the order in which they are added.

Audio Toolbox's audio queue works like this:

  1. An audio queue is created and assigned properties that identify the type of sound that will be played (format, sample rate, etc.).

  2. Sound buffers, which will contain the actual sound frames to be played, are attached to the queue. Think of a sound frame as a single box full of sound, whereas a sample is a single piece of digital sound within the frame.

  3. The developer supplies a callback function, which the audio queue calls every time a sound buffer has been exhausted. This refills the buffer with the latest sound frames from your application.

6.4.1. Audio Queue Structure

Because the Audio Toolbox framework uses low-level C interfaces, it has no concept of a class. There are many moving parts involved in setting up an audio queue, and to make our examples more understandable, all of the different variables used will be encapsulated into a single user-defined structure we call AQCallbackStruct:

typedef struct AQCallbackStruct {
    AudioQueueRef queue;
    UInt32 frameCount;
    AudioQueueBufferRef mBuffers[AUDIO_BUFFERS];
    AudioStreamBasicDescription mDataFormat;
} AQCallbackStruct;

The following components are grouped into this structure to service the audio framework:



AudioQueueRef queue

A pointer to the audio queue object your program will create.



UInt32 frameCount

The total number of sample frames to be copied per audio sync. This value is largely up to the implementer.



AudioQueueBufferRef mBuffers

An array containing the sound buffers that will be used. The proper number of elements will be discussed in Section 6.4.3.



AudioStreamBasicDescription mDataFormat

Information about the format of audio that will be played.

Before you can create the audio queue, you'll need to initialize a description of the audio stream:

AQCallbackStruct aqc;
aqc.mDataFormat.mSampleRate = 44100.0;
aqc.mDataFormat.mFormatID = kAudioFormatLinearPCM;
aqc.mDataFormat.mFormatFlags = kLinearPCMFormatFlagIsSignedInteger
    | kAudioFormatFlagIsPacked;
aqc.mDataFormat.mBytesPerPacket = 4;
aqc.mDataFormat.mFramesPerPacket = 1;
aqc.mDataFormat.mBytesPerFrame = 4;
aqc.mDataFormat.mChannelsPerFrame = 2;
aqc.mDataFormat.mBitsPerChannel = 16;
aqc.frameCount = 735;

In this example, we prepare a structure for 16-bit (two bytes per sample) stereo sound (two channels) with a sample rate of 44.1 kHz (44,100 samples per second). Each output frame will be provided in the form of two 2-byte short integers, hence four total bytes per frame (two bytes each for the left and right channels).

The sample rate and frame size dictate how often the iPhone will ask for more sound. With a frequency of 44,100 samples per second, we can make our application sync the sound every 60th of a second by defining a frame size of 735 samples (44,100 / 60 = 735).
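
This arithmetic can be captured directly in code. The fragment below is only a sketch; SAMPLE_RATE and FRAME_COUNT mirror the definitions used by the full example at the end of this section, while SYNCS_PER_SEC is our own name:

#define SAMPLE_RATE   44100                          /* frames per second */
#define SYNCS_PER_SEC 60                             /* desired callback rate */
#define FRAME_COUNT   (SAMPLE_RATE / SYNCS_PER_SEC)  /* 44,100 / 60 = 735 */

aqc.frameCount = FRAME_COUNT;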

The format we'll be providing in this example is PCM (raw data), but the audio queue supports all of the audio formats supported by the iPhone. These include the following:

kAudioFormatLinearPCM
kAudioFormatAppleIMA4
kAudioFormatMPEG4AAC
kAudioFormatULaw
kAudioFormatALaw
kAudioFormatMPEGLayer3
kAudioFormatAppleLossless
kAudioFormatAMR
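
As a rough sketch of how the stream description changes for a compressed format (the field values below reflect Apple's IMA4 layout of 64 frames packed into 34 bytes per channel; verify them against the AudioToolbox headers before relying on them), a mono IMA4 stream might be described like this:

aqc.mDataFormat.mSampleRate       = 44100.0;
aqc.mDataFormat.mFormatID         = kAudioFormatAppleIMA4;
aqc.mDataFormat.mFormatFlags      = 0;
aqc.mDataFormat.mChannelsPerFrame = 1;
aqc.mDataFormat.mFramesPerPacket  = 64;  /* 64 frames per IMA4 packet */
aqc.mDataFormat.mBytesPerPacket   = 34;  /* 34 bytes per packet, per channel */
aqc.mDataFormat.mBytesPerFrame    = 0;   /* zero for compressed formats */
aqc.mDataFormat.mBitsPerChannel   = 0;   /* zero for compressed formats */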

6.4.2. Provisioning Audio Output

Once you have defined the audio queue's properties, you can provision a new audio queue object. The AudioQueueNewOutput function is responsible for provisioning an output channel and attaching it to the audio queue. The prototype for this function follows:

OSStatus AudioQueueNewOutput(
    const AudioStreamBasicDescription *inFormat,
    AudioQueueOutputCallback          inCallbackProc,
    void *                            inUserData,
    CFRunLoopRef                      inCallbackRunLoop,
    CFStringRef                       inCallbackRunLoopMode,
    UInt32                            inFlags,
    AudioQueueRef *                   outAQ);



inFormat

A pointer to a structure describing the audio format that will be played. We defined this structure earlier as a member of data type AudioStreamBasicDescription within our AQCallbackStruct structure.



inCallbackProc

The name of a callback function that will be called when the audio queue has an empty buffer that needs data.



inUserData

A pointer to data that the developer can optionally pass to the callback function. It will contain a pointer to the instance of the user-defined AQCallbackStruct structure, which should contain information about the audio queue as well as any information relevant to the application about the samples being played.



inCallbackRunLoop

The run loop on which the callback function will be invoked. Passing NULL tells the audio queue to invoke the callback on one of its own internal threads, whenever a sound buffer becomes exhausted.



inCallbackRunLoopMode

The run loop mode in which the callback may be invoked. Passing NULL is equivalent to specifying kCFRunLoopCommonModes.



inFlags

Not used; reserved.



outAQ

When the AudioQueueNewOutput function returns, this pointer will be set to the newly created audio queue. The presence of this argument allows an error code to be used as the return value of the function.

An actual call to this function, using the audio queue structure created earlier, looks like the following:

AudioQueueNewOutput(&aqc.mDataFormat,
    AQBufferCallback,
    &aqc,
    NULL,
    kCFRunLoopCommonModes,
    0,
    &aqc.queue);

In this example, the name of our callback function is specified as AQBufferCallback. We will create this function in the next few sections. It is the function that will be responsible for taking sound output from your application and copying it to a sound buffer.
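
Like most Audio Toolbox calls, AudioQueueNewOutput returns an OSStatus result code (noErr on success), and real code should check it. A minimal sketch:

OSStatus err = AudioQueueNewOutput(&aqc.mDataFormat, AQBufferCallback,
    &aqc, NULL, kCFRunLoopCommonModes, 0, &aqc.queue);
if (err != noErr) {
    /* The queue was not created; don't use aqc.queue */
    fprintf(stderr, "AudioQueueNewOutput failed: %d\n", (int) err);
}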

6.4.3. Sound Buffers

A sound buffer contains sound data in transit to the output device. Going back to our box-on-a-conveyor-belt concept, the buffer is the box that carries your sound to the speakers. If you don't have enough sound to fill the box, it ends up going to the speakers incomplete, which could lead to gaps in the audio. The more boxes you have, the more sound you can queue up in advance to avoid running out (or running slow). The downside is that it also takes longer for the sound at the speaker end to catch up to the sound coming from the application. This could be problematic if the character in your game jumps, but the user doesn't hear it until after he's landed.

When the sound is ready to start, your code will create sound buffers and prime them with the first frames of your application's sound output. The minimum number of buffers needed to start playback on an Apple desktop is only one, but on the iPhone it is three. In applications that might cause high CPU usage, it may be appropriate to use even more buffers to prevent underruns. To prepare the buffers with the first frames of sound data, each buffer is presented to your callback function once, which will fill it with sound. This means by the time you prime the buffers, you'd better have some sound to fill them with:

#define AUDIO_BUFFERS 3

unsigned long bufferSize;
int i;

bufferSize = aqc.frameCount * aqc.mDataFormat.mBytesPerFrame;
for (i = 0; i < AUDIO_BUFFERS; i++) {
    AudioQueueAllocateBuffer(aqc.queue,
        bufferSize, &aqc.mBuffers[i]);
    AQBufferCallback(&aqc, aqc.queue, aqc.mBuffers[i]);
}

When this code executes, the audio buffers are filled with the first frames of sound data from your application. The queue is now ready to be activated, which turns on the conveyor belt sending the sound buffers to the speakers. As this occurs, the buffers are emptied of their contents (no, memory isn't zeroed) and the boxes come back around the conveyor belt for a refill:

AudioQueueStart(aqc.queue, NULL);

Later on, when you're ready to turn off the sound queue, just use the AudioQueueDispose function and everything stops:

AudioQueueDispose(aqc.queue, true);
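
If you only want to pause or halt playback without tearing the queue down, Audio Toolbox also provides AudioQueuePause and AudioQueueStop; for example:

AudioQueuePause(aqc.queue);        /* pause; queued buffers are retained */
AudioQueueStart(aqc.queue, NULL);  /* resume playback */
AudioQueueStop(aqc.queue, true);   /* stop immediately, keeping the queue */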

6.4.4. Callback Function

The audio queue is now running, and your application's callback function will be asked to fill a new sound buffer with data every 60th of a second. What hasn't been explained yet is how this happens. Whenever a buffer is emptied and ready to be refilled, the audio queue invokes the callback function you specified in your call to AudioQueueNewOutput. This callback function is where the application does its work: it fills the box with the raw data that carries your output sound to the speakers. When called, you'll fill the audio queue buffer that is passed in by copying the latest sound frame from your application—in our example, 735 samples:

static void AQBufferCallback(
    void *aqc,
    AudioQueueRef inQ,
    AudioQueueBufferRef outQB)
{

The callback structure created at the beginning, aqc, is passed as a user-defined argument, followed by pointers to the audio queue and the audio queue buffer to be filled:

AQCallbackStruct *inData = (AQCallbackStruct *)aqc;

As the AQCallbackStruct structure is considered user data, it's supplied to the callback function as a void pointer, and needs to be cast back to an AQCallbackStruct structure (in this example, named inData) before it can be accessed. This code grabs a pointer to the raw audio data inside the buffer so the application can write its sound into it:

short *CoreAudioBuffer = (short *) outQB->mAudioData;

The CoreAudioBuffer variable represents the space inside the sound buffer where your application's raw samples will be copied at every sync. Your application will need to maintain a type of "record needle" to keep track of which sound bytes have already been sent to the audio queue:

if (inData->frameCount > 0) {

The frameCount variable specifies the number of frames that the buffer is expecting to see. This should be equivalent to the frameCount value that you supplied in the AQCallbackStruct structure—in our example, 735:

outQB->mAudioDataByteSize = 4 * inData->frameCount;

This is where you tell the buffer exactly how much data it's going to get: a packing list for the box. The total output buffer size should be equivalent to the size of both stereo channels (two bytes per channel = four bytes) multiplied by the number of frames sent (735):

for (i = 0; i < inData->frameCount * 2; i += 2) {
    CoreAudioBuffer[i]   = (  LEFT CHANNEL DATA );
    CoreAudioBuffer[i+1] = ( RIGHT CHANNEL DATA );
}

Here, the callback function steps through each output frame in the buffer and copies the data from what will be your application's outputted sound into CoreAudioBuffer. Because the left and right channels are interleaved, the loop will have to account for this by skipping in increments of two:

AudioQueueEnqueueBuffer(inQ, outQB, 0, NULL);
   } /* if (inData->frameCount > 0) */
} /* AQBufferCallback */

Finally, once the frame has been copied into the sound buffer, it's placed back onto the play queue.

6.4.5. Volume Control

Samples played through the audio queue track with the system volume, but you might choose to fine-tune the magnitude of your sound output.

In the callback function used in the previous section, we copied sound frames from the application's sound output into sound buffers whenever a sync occurred. By adjusting these values with a volume multiplier, you can effectively raise and lower the output level of your samples:

for (i = 0; i < aqc->frameCount * 2; i += 2) {
    if (aqc->playPtr >= aqc->sampleLen)   /* past the end; output silence */
        sample = 0;
    else
        sample = aqc->pcmBuffer[aqc->playPtr];
    coreAudioBuffer[i]   = sample * volumeMultiplier;
    coreAudioBuffer[i+1] = sample * volumeMultiplier;
    aqc->playPtr++;
}

When the volume is factored in, the sample value is multiplied by the volume setting's value so that it is increased or decreased by the factor of the volume. If you wanted the maximum volume to be louder, set the volume multiplier to a value greater than 1.0. To decrease the volume, set the multiplier to a decimal number less than 1.0. Be careful not to overdrive your audio output, which could create distortion.
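
One way to guard against overdriving is to clamp each scaled sample to the legal 16-bit range before writing it into the buffer. A minimal sketch (the helper name is ours, not part of any framework):

/* Scale a 16-bit sample and clamp it to avoid wraparound
   distortion when the multiplier exceeds 1.0 */
static short scaleSample(short sample, float volumeMultiplier) {
    float scaled = sample * volumeMultiplier;
    if (scaled >  32767.0) scaled =  32767.0;
    if (scaled < -32768.0) scaled = -32768.0;
    return (short) scaled;
}

The callback loop would then assign coreAudioBuffer[i] = scaleSample(sample, volumeMultiplier); for each channel.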

6.4.6. Example: PCM Player

This example uses good old-fashioned C and is run on the command line with a filename. It loads a raw PCM file and then plays it using the Audio Toolbox's audio queue. Because your application will likely be generating data internally rather than using a file, this example reads the file into a memory buffer first and then plays it from memory to illustrate the practical concept. Most applications can hook into this same architecture.

Because a raw PCM file doesn't contain any information about its frequency or frame size, this example will have to assume its own. We'll use a format for 16-bit, 44.1 kHz mono uncompressed PCM data. This is defined by the three definitions made at the top of the program:

#define BYTES_PER_SAMPLE 2

Sixteen bits = two bytes per sample.

#define SAMPLE_RATE 44100

44,100 samples per second = 44.1 kHz.

typedef unsigned short sampleFrame;

An unsigned short is equivalent to two bytes (per sample).

If you can't find a raw PCM file to run this example with, you can use a .wav file as long as it's encoded as 16-bit, 44.1 kHz raw PCM. Alternatively, you may adapt this example to use a different encoding by changing mFormatID within the audio queue structure. The example makes no attempt to parse a .wav file's headers; it just assumes the data you're providing is raw, which is what a game or other type of application would provide. Wave file headers will be passed to the audio channel with the rest of the data, so you might hear a slight click or two of junk before the raw sound data inside the file is played.
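
If the click bothers you, one rough workaround is to skip past the header before handing the buffer to playbuffer. The sketch below assumes a canonical 44-byte PCM .wav header with no extra chunks, which is not guaranteed for every file:

#define WAV_HEADER_SIZE 44  /* canonical PCM .wav header; extra chunks break this */

if (len > WAV_HEADER_SIZE)
    ret = playbuffer((char *) pcmbuffer + WAV_HEADER_SIZE,
        len - WAV_HEADER_SIZE);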

Because Leopard also includes the Audio Toolbox framework, you can compile this example on the desktop as well as for iPhone:

$ gcc -o playpcm playpcm.c \
  -framework AudioToolbox -framework CoreAudio -framework CoreFoundation

Example 6-9 contains the code.

Example 6-9. Audio Toolbox example (playpcm.c)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* strerror() */
#include <unistd.h>   /* sleep(), usleep() */
#include <errno.h>
#include <sys/stat.h>
#include <AudioToolbox/AudioQueue.h>

#define BYTES_PER_SAMPLE 2
#define SAMPLE_RATE 44100
typedef unsigned short sampleFrame;

#define FRAME_COUNT 735
#define AUDIO_BUFFERS 3

typedef struct AQCallbackStruct {
    AudioQueueRef queue;
    UInt32 frameCount;
    AudioQueueBufferRef mBuffers[AUDIO_BUFFERS];
    AudioStreamBasicDescription mDataFormat;
    UInt32 playPtr;
    UInt32 sampleLen;
    sampleFrame *pcmBuffer;
} AQCallbackStruct;

void *loadpcm(const char *filename, unsigned long *len);
int playbuffer(void *pcm, unsigned long len);
void AQBufferCallback(void *in, AudioQueueRef inQ, AudioQueueBufferRef outQB);

int main(int argc, char *argv[]) {
    char *filename;
    unsigned long len;
    void *pcmbuffer;
    int ret;

    if (argc < 2) {
        fprintf(stderr, "Syntax: %s [filename]\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    filename = argv[1];
    pcmbuffer = loadpcm(filename, &len);
    if (!pcmbuffer) {
        fprintf(stderr, "%s: %s\n", filename, strerror(errno));
        exit(EXIT_FAILURE);
    }

    ret = playbuffer(pcmbuffer, len);
    free(pcmbuffer);
    return ret;
}

void *loadpcm(const char *filename, unsigned long *len) {
    FILE *file;
    struct stat s;
    void *pcm;

    if (stat(filename, &s))
        return NULL;
    *len = s.st_size;
    pcm = (void *) malloc(s.st_size);
    if (!pcm)
        return NULL;
    file = fopen(filename, "rb");
    if (!file) {
        free(pcm);
        return NULL;
    }
    fread(pcm, s.st_size, 1, file);
    fclose(file);
    return pcm;
}

int playbuffer(void *pcmbuffer, unsigned long len) {
    AQCallbackStruct aqc;
    OSStatus err;
    UInt32 bufferSize;
    int i;

    aqc.mDataFormat.mSampleRate = SAMPLE_RATE;
    aqc.mDataFormat.mFormatID = kAudioFormatLinearPCM;
    aqc.mDataFormat.mFormatFlags =
        kLinearPCMFormatFlagIsSignedInteger
        | kAudioFormatFlagIsPacked;
    aqc.mDataFormat.mBytesPerPacket = 4;
    aqc.mDataFormat.mFramesPerPacket = 1;
    aqc.mDataFormat.mBytesPerFrame = 4;
    aqc.mDataFormat.mChannelsPerFrame = 2;
    aqc.mDataFormat.mBitsPerChannel = 16;
    aqc.frameCount = FRAME_COUNT;
    aqc.sampleLen = len / BYTES_PER_SAMPLE;
    aqc.playPtr = 0;
    aqc.pcmBuffer = pcmbuffer;

    err = AudioQueueNewOutput(&aqc.mDataFormat,
        AQBufferCallback,
        &aqc,
        NULL,
        kCFRunLoopCommonModes,
        0,
        &aqc.queue);
    if (err)
        return err;

    bufferSize = aqc.frameCount * aqc.mDataFormat.mBytesPerFrame;

    for (i=0; i<AUDIO_BUFFERS; i++) {
        err = AudioQueueAllocateBuffer(aqc.queue, bufferSize,
            &aqc.mBuffers[i]);
        if (err)
            return err;
        AQBufferCallback(&aqc, aqc.queue, aqc.mBuffers[i]);
    }

    err = AudioQueueStart(aqc.queue, NULL);
    if (err)
        return err;

    /* Poll until the play pointer passes the end of the sample */
    while (aqc.playPtr < aqc.sampleLen)
        usleep(250000);
    sleep(1);
    return 0;
}

void AQBufferCallback(
    void *in,
    AudioQueueRef inQ,
    AudioQueueBufferRef outQB)
{
    AQCallbackStruct *aqc;
    short *coreAudioBuffer;
    short sample;
    int i;

    aqc = (AQCallbackStruct *) in;
    coreAudioBuffer = (short*) outQB->mAudioData;

    printf("Sync: %lu / %lu\n", (unsigned long) aqc->playPtr,
        (unsigned long) aqc->sampleLen);
    if (aqc->playPtr >= aqc->sampleLen) {
        AudioQueueDispose(aqc->queue, true);
        return;
    }

    if (aqc->frameCount > 0) {
        outQB->mAudioDataByteSize = 4 * aqc->frameCount;
        for(i=0; i<aqc->frameCount*2; i+=2) {
            if (aqc->playPtr >= aqc->sampleLen)  /* past the end; pad with silence */
                sample = 0;
            else
                sample = (aqc->pcmBuffer[aqc->playPtr]);
            coreAudioBuffer[i] =   sample;
            coreAudioBuffer[i+1] = sample;
            aqc->playPtr++;
        }
        AudioQueueEnqueueBuffer(inQ, outQB, 0, NULL);
    }
}


6.4.7. What's Going On

Here's how the playpcm program works:

  1. The application's main function is invoked when executed, which extracts the filename from the argument list (as supplied on the command line).

  2. The main function calls loadpcm, which determines the length of the audio file and loads it into memory, returning this buffer to main.

  3. The playbuffer function is called with the contents of this memory and its length. This function builds our user-defined AQCallbackStruct structure, which is defined at the beginning of the program. This structure holds pointers to the audio queue, sound buffers, and the memory containing the contents of the file that was loaded. It also contains the sample's length and an integer called playPtr, which acts as a record needle, identifying the last sample that was copied into the sound buffer.

  4. A new audio queue is created, and the callback function is called once for each sound buffer to prime it with the first samples. The queue is then started, and the program sits and sleeps until the sample has finished playing.

  5. As audio is played, the sound buffers become exhausted one by one. Whenever a buffer needs more sound data, the AQBufferCallback function is called.

  6. The AQBufferCallback function increments playPtr and copies the next sound frames from memory to be played into the sound buffer. Because raw PCM samples are mono, the same data is copied into both left and right output channels.

  7. When playPtr exceeds the length of the sound sample, the wait loop set up in playbuffer exits, and the function returns to main for cleanup and exit.

6.4.8. Further Study

  • Modify this example to play 8-bit PCM sound by changing the data type for sampleFrame and BYTES_PER_SAMPLE. You'll also need to amplify the samples, as each one is now a single byte in size while the audio queue channel expects two bytes; a sketch of the conversion follows this list.

  • Check out AudioQueue.h in Mac OS X Leopard on the desktop. You can find it in /System/Library/Frameworks/AudioToolbox.framework/Headers/.
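
As a starting point for the first exercise, an unsigned 8-bit PCM sample can be recentered around zero and shifted up into the 16-bit range inside the callback. This is only a sketch under those assumptions, not part of the original example:

typedef unsigned char sampleFrame;   /* 8-bit PCM: one byte per sample */
#define BYTES_PER_SAMPLE 1

/* Inside AQBufferCallback: recenter the unsigned 8-bit sample around
   zero, then shift it up into the 16-bit range the queue expects */
sample = ((short) aqc->pcmBuffer[aqc->playPtr] - 128) << 8;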