Croaking Kero

Multi-Threaded Game Framework in C on Windows

This tutorial is also available as a video. In this tutorial I’ll show you how to create a multi-threaded game program framework with the Win32 native library. I've left out sound and other peripheral systems for now, but they'll be easy to add on once you understand the framework. Note: I’m compiling this code with GCC. Note: Click any of the hyperlinked words to visit the MSDN documentation page for them. Note: I've dimmed all of the code not specific to the subject of the tutorial. Note: The code listings are quite long so instead of a readable complete listing at the top of the HTML page, I've linked a download for the code files below. Only code snippets are shown throughout the tutorial and I encourage you to open up the code files on the side to follow through with the tutorial properly. multi-threaded

Code Walkthrough

In this multi-threaded game program framework, each major system gets its own thread. The main thread initiates the program, memory and other threads and goes on to handle operating system events. In separate threads we run the game update loop at a fixed rate of 120Hz, and render at the refresh rate of the player’s monitor, up to 120Hz. Sound and any other distinct systems will also get their own threads. The update thread communicates with the render thread by filling “render state” structures with all the information needed to render the latest game state, and the render thread reads the latest available state to render. I picked 120Hz for the update rate because it allows for a very fast response to input events, a small enough step to avoid common physics problems, a higher possible refresh rate for the display, and still lets the CPU sleep for a bit between updates to avoid melting. Rendering will be limited to 120Hz. It’s possible to go beyond that by interpolating between game states but that’s unnecessary complexity for this tutorial series. The program is organized into several files: main.c contains the entry-point, a bunch of data initialization, and program setup. It also has the Windows setup code and event handling. render.h and update.h are not traditional headers, but instead contain the entire code for each of those systems. They’re made as header files for easy inclusion and building. globals.h has types, a few enumerators and macros, and most of the libraries used by the program. There’s no sound.h yet. As a rule, the major systems – update, render, sound, whatever else comes along, are included by main.c. Libraries specific to one system are included in that file and all shared libraries are included by globals.h. Much of the code is normal stuff from my previous tutorials, so I’ll focus on the multi-threading code specific to this framework. I’ve also implemented a very simple and bad bouncing balls simulation just for demonstration. We need process.h to access Windows’ native threading library. static uintptr_t thread_render, thread_update; static render_data_t render_data = { .quit = &quit, .update_render_swap_state_atom = &update_render_swap_state_atom, .render_states = render_states, .window_handle = &window_handle, .frame = &frame, }; static update_data_t update_data = { .quit = &quit, .update_render_swap_state_atom = &update_render_swap_state_atom, .source_keyboard = keyboard, .render_states = render_states, }; Here we initialize the data for the render and update threads, including setting a bunch of pointers to shared variables in the main file. // Setup initial simulation state for(int i = 0; i < BALL_COUNT; ++i) { update_data.balls[i].x = randf_range(BALL_RADIUS, RESOLUTION_WIDTH - BALL_RADIUS); update_data.balls[i].y = randf_range(30, RESOLUTION_HEIGHT - BALL_RADIUS); update_data.balls[i].vx = randf_range(-0.5f, 0.5f); update_data.balls[i].vy = randf_range(-0.5f, 0.5f); } Before starting the threads, we loop through the balls in update_data to set their starting positions and velocities. // Start system threads if((thread_update = _beginthread(&Update, 0, (void*)&update_data)) == -1ul) { PRINT_ERROR("_beginthread(update) failed."); return -1; } if((thread_render = _beginthread(Render, 0, (void*)&render_data)) == -1ul) { PRINT_ERROR("_beginthread(render) failed."); return -1; } Next we start the update and render threads with _beginthread, passing a pointer to the function the thread should execute – the thread’s entry-point. We can set the thread’s stack size, or pass 0 to set it to the same size as the main thread’s. Last is a pointer to the arguments we wish to send along to the thread, so we send a pointer to the update or render data respectively. _beginthread returns the thread handle or -1 for an error. Since it’s an unsigned integer return type, -1 wraps around. static MSG message = { 0 }; while(GetMessage(&message, NULL, 0, 0) != 0 && !quit) { DispatchMessage(&message); } Main then enters the Windows message processing loop. Let’s look at globals.h before proceeding to update and render. First we have a few global definitions and standard libraries. The more interesting things are the data types. // Render-specific structures typedef struct { float x, y; } ball_render_t; typedef struct { bool busy; uint64_t update_count; ball_render_t balls[BALL_COUNT]; } render_state_t; typedef struct { bool *quit; _Atomic bool *update_render_swap_state_atom; render_state_t *render_states; HWND *window_handle; sprite_t *frame; uint32_t ball_color[BALL_COUNT]; } render_data_t; // Update-specific structures typedef struct { float x, y, vx, vy; } ball_update_t; typedef struct { bool *quit; _Atomic bool *update_render_swap_state_atom; render_state_t *render_states; bool *source_keyboard; ball_update_t balls[BALL_COUNT]; } update_data_t; For my crappy ball simulation, I’ve got two different ball structures. One holding all the information needed for the update thread, and the other for the render thread. This kind of data separation helps to keep structure sizes minimal per-thread and makes it easy to manage. We have the update and render data structures which share several pointers to main-thread variables, then have a couple of unique items. update_data_t has an array of ball_updates and render_data has ball_colours. The ball_render structures dwell in the render_states, since that information will be changed by each update whereas the colours are set once in the render thread. Each structure points to the render state array and the update_render_swap_state_atom, which is used to provide mutually exclusive access to the render states when each thread is selecting which state to use for its next iteration. void Update(void *data_void) { update_data_t *data = (update_data_t*)data_void; static LARGE_INTEGER tick_rate, ticks_last, ticks_now; static uint64_t ticks_per_frame; QueryPerformanceFrequency(&tick_rate); ticks_per_frame = tick_rate.QuadPart / 120; // Update rate = 120Hz QueryPerformanceCounter(&ticks_last); static uint64_t update_count = 0; static bool keyboard[256]; The update thread’s entry-point is this Update function. First we cast the void pointer to the actual update_data pointer we know it is, setup some timing variables, the update count and keyboard state array. while(!*data->quit) { // Capture current keyboard state memcpy(keyboard, data->source_keyboard, 256); /**************************************************** * Game logic ****************************************************/ Now Update enters its main loop. At the start of each update we capture the current state of the keyboard, then process the game logic – in this case a very bad ball simulation, the details of which aren’t important beyond that each ball moves and collides. Now for the juicy threading code! What we want to do is select the oldest render state not currently being used by the render thread, then fill it with the latest render state information. // Create a render state based on current game state { // Select oldest non-busy render state to replace static int found; static uint64_t lowest_update_count; found = -1; lowest_update_count = -1; while(!__sync_bool_compare_and_swap(data->update_render_swap_state_atom, false, true)); for(int i = 0; i < 3; ++i) { if(!data->render_states[i].busy && data->render_states[i].update_count < lowest_update_count) { found = i; lowest_update_count = data->render_states[i].update_count; } } First we use this while loop, calling an atomic compare and swap function. Atomic functions happen in a single CPU cycle and are guaranteed to maintain thread coherence. So we compare the swap atom’s value to false and replace it with true in a single CPU cycle. If it was already true it remains true and we try again. If it was false then it’s now true and we exit the loop. Once we’ve succeeded we can proceed to select a render state knowing that if the render thread catches up and tries to enter its own state selection, it’ll continuously fail the atomic function until we set the atom to false when we’re done. So we loop through all 3 render states, finding the ones that aren’t marked busy and selecting the one with the oldest update count. We set that render state to busy, then set the swap atom to false so that the render thread can select its render state when it wants. static render_state_t *render_state; render_state = &data->render_states[found]; render_state->busy = true; *data->update_render_swap_state_atom = false; render_state->update_count = ++update_count; for(int i = 0; i < BALL_COUNT; ++i) { render_state->balls[i].x = data->balls[i].x; render_state->balls[i].y = data->balls[i].y; } render_state->busy = false; } Now we set the render state’s update count, then all this render state’s data. In the ball simulation that’s just the positions of the balls, but in a proper game this would include all sorts of object states. When the render state has been properly set up, we set its busy variable to false to make it available to be selected by the render thread. // Sleep until next frame QueryPerformanceCounter(&ticks_now); static int32_t sleep_time; sleep_time = (1000.f * ((ticks_last.QuadPart + ticks_per_frame) - ticks_now.QuadPart) / tick_rate.QuadPart) - 2; printf("Update sleep: %dms\n", sleep_time); if(sleep_time > 0) Sleep(sleep_time); // Micro-sleep the remaining time QueryPerformanceCounter(&ticks_now); while(ticks_now.QuadPart - ticks_last.QuadPart < ticks_per_frame) { Sleep(0); QueryPerformanceCounter(&ticks_now); } ticks_last.QuadPart += ticks_per_frame; if(ticks_now.QuadPart - ticks_last.QuadPart > ticks_per_frame) ticks_last = ticks_now; Lastly the update thread performs some timing code to maintain a fixed update rate and allow the CPU to sleep between updates. That’s it for the update thread, now let’s look at the render thread. void Render(void *data_void) { render_data_t *data = (render_data_t*)data_void; static uint16_t screen_refresh; screen_refresh = GetDeviceCaps(CreateCompatibleDC(NULL), VREFRESH); printf("Screen refresh rate: %d\n", screen_refresh); static LARGE_INTEGER tick_rate, ticks_last, ticks_now; static uint64_t ticks_per_frame; QueryPerformanceFrequency(&tick_rate); ticks_per_frame = tick_rate.QuadPart / screen_refresh; // Update rate = screen refresh rate QueryPerformanceCounter(&ticks_last); for(int i = 0; i < BALL_COUNT; ++i) { data->ball_color[i] = rand32(); } Can you guess what function is the render thread’s entry-point? Anyway, we do much of the same stuff – cast the data and set up timing variables. We retrieve the screen’s refresh rate to use the for the render thread’s rate. We also set the ball colours randomly. while(!*data->quit) { // Wait until WM_PAINT message has been processed. Usually instant. UpdateWindow(*data->window_handle); /********************************************** * Stateless rendering **********************************************/ // Clear frame memset(data->frame->p, 0, data->frame->w*data->frame->h*4); In the render loop, I’ll get back to that first line in a minute. We could perform any rendering here that doesn’t depend on the game state, which for now only means clearing the frame. We use the same state selection code we saw in Update. /********************************************** * State-based rendering **********************************************/ // Draw balls for(int i = 0; i < BALL_COUNT; ++i) { DrawCircle(data->frame, (int)render_state->balls[i].x, (int)render_state->balls[i].y, BALL_RADIUS-1, data->ball_color[i]); } render_state->busy = false; Here we perform all drawing that depends on the render state data. For my simulation that’s just drawing all the balls. We’re done reading this render state now so we set it to not busy. InvalidateRect(*data->window_handle, NULL, FALSE); Next we use InvalidateRect to mark the entire window as needing redrawing and add a WM_PAINT message to the message queue, which will be handled in Main’s message processing. We use the same timing code as Update to Sleep away any remaining time. while(!*data->quit) { // Wait until WM_PAINT message has been processed. Usually instant. UpdateWindow(*data->window_handle); At the start of each loop we use UpdateWindow to wait until the WM_PAINT message has been processed. That is expected to happen during the Render thread’s sleeping code but this extra line prevents us from overwriting the frame buffer before WM_PAINT has copied it to the screen. Alternatively you could swap between two buffers instead of ever waiting. And that’s the whole framework! I think this is pretty solid but I welcome any feedback you may have. The code is available to download at the top of this tutorial, and if you use the code I’d love to hear about what you make. I’ve been getting some really great e-mails from my previous tutorials, and I hope that showing the development of a whole game will result in many of you making something cool. In my next few tutorials I’ll show you how I’m building the game itself, including player controls and physics, and pixel-perfect collisions.
If you've got questions about any of the code feel free to e-mail me or comment on the youtube video. I'll try to answer them, or someone else might come along and help you out. If you've got any extra tips about how this code can be better or just more useful info about the code, let me know so I can update the tutorial. Thanks to Froggie717 for criticisms and correcting errors in this tutorial. Cheers.