Croaking Kero

Sound on Windows with WinMM in C

This tutorial is also available as a video.

In this tutorial I’ll show you how to play realtime audio with Windows' native Multimedia library (WinMM) in C.

Note: I’m compiling this code with GCC and not Microsoft’s C compiler. To compile with cl you’ll need to either add window-opening code and handle input through that window, or attach the program to a console window, which GCC does automatically. You'll also need to remove the PRINT_ERROR macro, because it uses GNU-specific variadic macro syntax.

Note: Click any of the hyperlinked words to visit the MSDN documentation page for them.

main.c

    #define WIN32_LEAN_AND_MEAN
    #include <windows.h>
    #include <mmsystem.h>
    #include <conio.h>
    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <math.h>

    #define TWOPI (M_PI + M_PI)
    // GNU extension syntax: named variadic parameter (args...) and ##args.
    #define PRINT_ERROR(a, args...) printf("ERROR %s() %s Line %d: " a, __FUNCTION__, __FILE__, __LINE__, ##args);

    HWAVEOUT wave_out;
    #define SAMPLING_RATE 44100
    #define CHUNK_SIZE 2000
    WAVEHDR header[2] = {0};
    int16_t chunks[2][CHUNK_SIZE];
    bool chunk_swap = false;
    float frequency = 400;
    float wave_position = 0;
    float wave_step;

    void CALLBACK WaveOutProc(HWAVEOUT, UINT, DWORD_PTR, DWORD_PTR, DWORD_PTR);

    int WINAPI WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nShowCmd) {
        {
            // Describe the audio we will send: mono, 16-bit PCM at SAMPLING_RATE.
            WAVEFORMATEX format = {
                .wFormatTag = WAVE_FORMAT_PCM,
                .nChannels = 1,
                .nSamplesPerSec = SAMPLING_RATE,
                .wBitsPerSample = 16,
                .cbSize = 0,
            };
            format.nBlockAlign = format.nChannels * format.wBitsPerSample / 8;
            format.nAvgBytesPerSec = format.nSamplesPerSec * format.nBlockAlign;
            if(waveOutOpen(&wave_out, WAVE_MAPPER, &format, (DWORD_PTR)WaveOutProc, (DWORD_PTR)NULL, CALLBACK_FUNCTION) != MMSYSERR_NOERROR) {
                PRINT_ERROR("waveOutOpen failed\n");
                return -1;
            }
        }

        // Full volume on both channels (low word = left, high word = right).
        if(waveOutSetVolume(wave_out, 0xFFFFFFFF) != MMSYSERR_NOERROR) {
            PRINT_ERROR("waveOutSetVolume failed\n");
            return -1;
        }

        // Fill and queue both chunks so one is always waiting behind the other.
        wave_step = TWOPI / ((float)SAMPLING_RATE / frequency);
        for(int i = 0; i < 2; ++i) {
            for(int j = 0; j < CHUNK_SIZE; ++j) {
                chunks[i][j] = sin(wave_position) * 32767;
                wave_position += wave_step;
            }
            header[i].lpData = (CHAR*)chunks[i];
            header[i].dwBufferLength = CHUNK_SIZE * 2; // length in bytes
            if(waveOutPrepareHeader(wave_out, &header[i], sizeof(header[i])) != MMSYSERR_NOERROR) {
                PRINT_ERROR("waveOutPrepareHeader[%d] failed\n", i);
                return -1;
            }
            if(waveOutWrite(wave_out, &header[i], sizeof(header[i])) != MMSYSERR_NOERROR) {
                PRINT_ERROR("waveOutWrite[%d] failed\n", i);
                return -1;
            }
        }

        static bool quit = false;
        while(!quit) {
            switch(_getche()) {
                case 72: { // up arrow
                    frequency += 50;
                    wave_step = TWOPI / ((float)SAMPLING_RATE / frequency);
                    printf("Frequency: %f\n", frequency);
                } break;
                case 80: { // down arrow
                    frequency -= 50;
                    wave_step = TWOPI / ((float)SAMPLING_RATE / frequency);
                    printf("Frequency: %f\n", frequency);
                } break;
                case 27: { // escape
                    quit = true;
                } break;
            }
        }
        return 0;
    }

    void CALLBACK WaveOutProc(HWAVEOUT wave_out_handle, UINT message, DWORD_PTR instance, DWORD_PTR param1, DWORD_PTR param2) {
        switch(message) {
            case WOM_CLOSE: printf("WOM_CLOSE\n"); break;
            case WOM_OPEN:  printf("WOM_OPEN\n");  break;
            case WOM_DONE: {
                printf("WOM_DONE\n");
                // Refill the chunk that just finished and queue it again.
                for(int i = 0; i < CHUNK_SIZE; ++i) {
                    chunks[chunk_swap][i] = sin(wave_position) * 32767;
                    wave_position += wave_step;
                }
                if(waveOutWrite(wave_out, &header[chunk_swap], sizeof(header[chunk_swap])) != MMSYSERR_NOERROR) {
                    PRINT_ERROR("waveOutWrite failed\n");
                }
                chunk_swap = !chunk_swap;
            } break;
        }
    }

build.bat

    gcc main.c -lwinmm
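On the PRINT_ERROR note at the top: the macro relies on GCC's named variadic parameter (args...) and the ##args comma trick, both of which are GNU extensions. If you'd rather keep the error messages than delete the macro, here is a minimal sketch of a standard C99 version. PRINT_ERROR_STD and report_failure are placeholder names I made up for this example, not part of the tutorial code, and I haven't tried this with every compiler.

```c
#include <stdio.h>

// Hypothetical standard-C99 replacement for PRINT_ERROR.
// The format string travels inside __VA_ARGS__, so there is always at least
// one argument and no GNU-specific ##args is needed.
#define PRINT_ERROR_STD(...) \
    do { \
        printf("ERROR %s() %s Line %d: ", __func__, __FILE__, __LINE__); \
        printf(__VA_ARGS__); \
    } while(0)

// Example use (made-up function): report an error code, then return -1
// the same way the tutorial code does on failure.
int report_failure(int code) {
    PRINT_ERROR_STD("waveOutOpen failed with code %d\n", code);
    return -1;
}
```

The prefix and the message are printed by two separate printf calls, which is the price of avoiding the extension.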

Code Walkthrough

    #include <mmsystem.h>

    HWAVEOUT wave_out;
    #define SAMPLING_RATE 44100
    #define CHUNK_SIZE 2000
    WAVEHDR header[2] = {0};
    int16_t chunks[2][CHUNK_SIZE];
    bool chunk_swap = false;
    float frequency = 400;
    float wave_position = 0;
    float wave_step;

Here are all the variables we need to use WinMM. I've #defined a couple of things for easy modification. frequency, wave_position and wave_step are all used to generate samples of a sine wave that is continuous across the two sound chunks.

    WAVEFORMATEX format = {
        .wFormatTag = WAVE_FORMAT_PCM,
        .nChannels = 1,
        .nSamplesPerSec = SAMPLING_RATE,
        .wBitsPerSample = 16,
        .cbSize = 0,
    };
    format.nBlockAlign = format.nChannels * format.wBitsPerSample / 8;
    format.nAvgBytesPerSec = format.nSamplesPerSec * format.nBlockAlign;

First we need to define our sound wave format with the WAVEFORMATEX structure. PCM stands for Pulse Code Modulation and means that each sample represents how far the speaker should be extended at each step. It’s the simplest, most direct and most common sound format. For now I’m using one sound channel. More channels would generally represent more speakers, so to play different sound data from two different speakers you’d use two channels. That would be standard stereo sound. The sample rate is how many times per second we’ll update the speaker position; 44100Hz is the standard sampling rate for CD audio. We’re using 16 bits per sample, which is also standard. cbSize tells the structure how much extra space we’re using for extended format information; we’re using the standard PCM format so we don’t need any. We calculate nBlockAlign, which is how many bytes one sample across all channels requires (here 1 channel * 16 bits / 8 = 2 bytes), and nAvgBytesPerSec, which is the sampling rate times the block alignment (44100 * 2 = 88200 bytes per second).

    if(waveOutOpen(&wave_out, WAVE_MAPPER, &format, (DWORD_PTR)WaveOutProc, (DWORD_PTR)NULL, CALLBACK_FUNCTION) != MMSYSERR_NOERROR) {
        PRINT_ERROR("waveOutOpen failed\n");
        return -1;
    }

We pass waveOutOpen a pointer to our waveout handle to fill. The second argument is the ID of the sound device we want to use; WAVE_MAPPER selects the default device. We pass our wave format, a pointer to our waveout callback function, and a flag to tell it that we’re using a callback function. Using a callback function means that whenever something happens on the waveout device, such as a sound data chunk finishing playing, WinMM calls the specified function to handle it. The call does not come from our main thread, which in this program spends its time waiting for key presses. There are three other notification options (an event, a thread message or a window message) but a callback function is simple and effective. In a performance-intensive program, a separate thread of your own can be used to handle audio.

    if(waveOutSetVolume(wave_out, 0xFFFFFFFF) != MMSYSERR_NOERROR) {
        PRINT_ERROR("waveOutSetVolume failed\n");
        return -1;
    }

Setting the volume here sets the internal wave device volume. We want our audio to be sent along to the speaker driver without being attenuated; the user can then adjust their speaker volume either through Windows audio controls or with their physical speaker controls. The volume value is 4 bytes, or 2 words. Each word sets the volume of the left or right channel from 0 to 0xFFFF, with the low word for left and the high word for right, so 0xFFFFFFFF is full volume on both.

    wave_step = TWOPI / ((float)SAMPLING_RATE / frequency);
    for(int i = 0; i < 2; ++i) {
        for(int j = 0; j < CHUNK_SIZE; ++j) {
            chunks[i][j] = sin(wave_position) * 32767;
            wave_position += wave_step;
        }

We have to send audio to the wave device in chunks, so we create two chunks holding 2000 samples each. wave_step is how far the sine wave’s phase advances per sample (2π divided by the number of samples in one cycle) and wave_position is the running phase. Because wave_position is never reset between chunks, the sine wave is continuous across them.
        header[i].lpData = (CHAR*)chunks[i];
        header[i].dwBufferLength = CHUNK_SIZE * 2;
        if(waveOutPrepareHeader(wave_out, &header[i], sizeof(header[i])) != MMSYSERR_NOERROR) {
            PRINT_ERROR("waveOutPrepareHeader[%d] failed\n", i);
            return -1;
        }
        if(waveOutWrite(wave_out, &header[i], sizeof(header[i])) != MMSYSERR_NOERROR) {
            PRINT_ERROR("waveOutWrite[%d] failed\n", i);
            return -1;
        }
    }

We use these wave headers to hold a pointer to the chunk data and the length of the chunk in bytes. Since each sample is two bytes we multiply the chunk size by two. We give the header to WinMM to be prepared for writing to the wave device using waveOutPrepareHeader, then pass the header to waveOutWrite. The first sound data sent starts playing immediately. We loop through the same code again, setting up and sending the second sound chunk, which is queued behind the first.

    void CALLBACK WaveOutProc(HWAVEOUT wave_out_handle, UINT message, DWORD_PTR instance, DWORD_PTR param1, DWORD_PTR param2) {
        switch(message) {
            case WOM_CLOSE: printf("WOM_CLOSE\n"); break;
            case WOM_OPEN:  printf("WOM_OPEN\n");  break;
            case WOM_DONE: {
                printf("WOM_DONE\n");
                for(int i = 0; i < CHUNK_SIZE; ++i) {
                    chunks[chunk_swap][i] = sin(wave_position) * 32767;
                    wave_position += wave_step;
                }
                if(waveOutWrite(wave_out, &header[chunk_swap], sizeof(header[chunk_swap])) != MMSYSERR_NOERROR) {
                    PRINT_ERROR("waveOutWrite failed\n");
                }
                chunk_swap = !chunk_swap;
            } break;
        }
    }

Whenever a sound chunk has finished playing, WinMM will call our callback function with a WOM_DONE message. In that case we fill the completed chunk with new sound data and write it back to the wave device. chunk_swap tracks which chunk that is: chunk 0 was queued first so it finishes first, and the two alternate from then on. This continuously adds a new chunk to the wave device whenever one finishes. Be aware that Microsoft's documentation for waveOutProc warns that calling other wave functions, such as waveOutWrite, from inside the callback can deadlock. Doing it here keeps the example short, but a more robust program would have the callback only signal another thread and do the refill and waveOutWrite there.
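The refill-and-swap bookkeeping is easy to lose among the API calls, so here is a small portable simulation of it. Nothing here talks to WinMM: queue is a stand-in I've made up for the device's play order, and simulate_done is a hypothetical function that plays the role of one WOM_DONE message. It checks the assumption the callback relies on, namely that the chunk which just finished is always the one chunk_swap points at.

```c
#include <stdbool.h>

static int queue[2] = {0, 1};   // chunks in the order the device will play them
static bool chunk_swap = false; // which chunk the callback expects to finish next

// Simulates one WOM_DONE: the chunk at the head of the queue has finished.
// Returns true if that chunk is the one chunk_swap predicted.
bool simulate_done(void) {
    int finished = queue[0];
    bool predicted = (finished == (int)chunk_swap);
    // The real callback refills chunk `chunk_swap` and writes it again, which
    // places it behind the chunk that has now started playing.
    queue[0] = queue[1];
    queue[1] = (int)chunk_swap;
    chunk_swap = !chunk_swap;
    return predicted;
}
```

Every call returns true, because the two chunks simply trade places in the queue each time one finishes. That is why a single bool is enough with two chunks; with three or more you would need a proper index.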
    static bool quit = false;
    while(!quit) {
        switch(_getche()) {
            case 72: {
                frequency += 50;
                wave_step = TWOPI / ((float)SAMPLING_RATE / frequency);
                printf("Frequency: %f\n", frequency);
            } break;
            case 80: {
                frequency -= 50;
                wave_step = TWOPI / ((float)SAMPLING_RATE / frequency);
                printf("Frequency: %f\n", frequency);
            } break;
            case 27: {
                quit = true;
            } break;
        }
    }

In our main program loop we use _getche() to retrieve any key presses in the console window, and respond to the up and down arrow keys by adjusting the wave frequency. _getche() reports an arrow key as two values, a 0 or 0xE0 prefix followed by 72 for up or 80 for down; the prefix matches no case and is ignored. Escape (27) causes the program to quit.

This adjustable sine wave is just a simple way to demonstrate that the sound is working, and to test the latency. In my opinion the latency with these settings is low enough that the sound feels as if it changes immediately upon pressing the arrow key. Each chunk is 2000 samples, which is about 45 milliseconds of audio at 44100Hz. That means the latency between something happening in our program and that being reflected in audio playback is 45 to 90ms, depending on whether that sound is put at the beginning or near the end of the new chunk.

The chunks have to be big enough that WinMM is able to process and queue the new chunk of audio before it finishes playing the last one, or else we’ll hear a “pop” between chunks as the device runs out of data. This also depends on the performance of the user’s computer. I've found that 2000 samples is a safe amount on my low-end GPD Pocket2 laptop, regardless of sample rate. If you want lower latency you could decrease the chunk size, but you may get the aforementioned “pop” issue. You could also increase the sampling rate, for example to 96000Hz, so that 2000 samples is only about 21ms. I’ve found that my current settings are a good enough baseline for good quality audio with unnoticeable latency, but I recommend tuning these values for your application.
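If you do tune these values, the arithmetic behind the numbers above is just samples divided by sampling rate. Here is a minimal sketch; chunk_ms and print_latency are names I've picked for this example, and the "one to two chunks" range assumes the two-chunk setup used in this tutorial.

```c
#include <stdio.h>

// Duration of one chunk in milliseconds.
double chunk_ms(int chunk_samples, int sampling_rate) {
    return 1000.0 * chunk_samples / sampling_rate;
}

// Prints the chunk duration and the resulting latency range, assuming two
// queued chunks: a change lands between one and two chunks before it is heard.
void print_latency(int chunk_samples, int sampling_rate) {
    double ms = chunk_ms(chunk_samples, sampling_rate);
    printf("%d samples @ %d Hz: %.1f ms per chunk, latency %.1f to %.1f ms\n",
           chunk_samples, sampling_rate, ms, ms, 2.0 * ms);
}
```

print_latency(2000, 44100) reports 45.4 ms per chunk, and print_latency(2000, 96000) reports 20.8 ms, which matches the figures quoted above.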
As an aside, this is reminiscent of a batch script I wrote in my high school computer science class, which played an adjustable frequency sine wave noise. I adjusted the frequency high enough that all my classmates could hear it but my teacher couldn’t. All the students were complaining, and as you’ve probably noticed it’s very hard to identify the direction a high frequency noise is coming from, so the teacher thought everyone was just fooling around looking for an excuse not to do their work. I never got in trouble for it either, unlike my friend who made a program that deletes all your user files, called it "DO NOT RUN THIS.exe" with a nuclear symbol and copied it to everyone's desktops over the network. Good times.

Anyway, that’s how to play realtime audio with Windows’ native Multimedia library in C. In my next tutorials I’ll show you how to load sound files and images, and how to do precise timing in C.
If you've got questions about any of the code feel free to e-mail me or comment on the YouTube video. I'll try to answer them, or someone else might come along and help you out. If you've got any extra tips about how this code can be better, or just more useful info about the code, let me know so I can update the tutorial.

Thanks to Froggie717 for criticisms and correcting errors in this tutorial. Cheers.