Memory Allocation

If you have not programmed embedded systems before, you may be surprised by some of the issues that come up when allocating memory on the Nintendo 64!

Background: Problems malloc/free must solve
The normal way to allocate memory in C on a modern system is to use malloc and free, provided by the standard C library. These functions are not magic! They use various data structures (some clever, some simple) to keep track of which ranges of memory are in use, and which blocks of memory are available.

Note: In C++, new/delete are similar to malloc/free, and std::unique_ptr / std::shared_ptr (by default) are wrappers around new/delete. So the same problems with malloc and free also apply to C++.

The main problem that malloc/free must solve is that you can allocate and free memory in any order and in any size. This can cause heap fragmentation: there is enough free memory in total, but it is split ("fragmented") into many small chunks. If you need to allocate a larger block, none of the small chunks is big enough to hold it. If heap fragmentation gets bad enough, malloc will start failing (returning NULL), and your game won't be able to continue running.


 * Wikipedia: Fragmentation (computing)


 * Stack Overflow: What is memory fragmentation?
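To make fragmentation concrete, here is a toy first-fit allocator over a hypothetical 128-byte heap split into 16-byte blocks. All names are made up for illustration, and this is not how dlmalloc or any real allocator works internally. It only shows how free memory can be plentiful in total yet unusable for a larger request.

```c
#include <stdbool.h>

#define TOY_BLOCK_SIZE 16 // Bytes per block in this toy heap.
#define TOY_BLOCK_COUNT 8 // The whole toy heap is 8 * 16 = 128 bytes.

static bool toy_used[TOY_BLOCK_COUNT];

// First-fit allocation of `n` contiguous blocks.
// Returns the index of the first block, or -1 if no free run is long enough.
int toy_alloc(int n) {
    for (int start = 0; start + n <= TOY_BLOCK_COUNT; start++) {
        bool fits = true;
        for (int i = start; i < start + n; i++) {
            if (toy_used[i]) {
                fits = false;
                break;
            }
        }
        if (fits) {
            for (int i = start; i < start + n; i++) {
                toy_used[i] = true;
            }
            return start;
        }
    }
    return -1;
}

// Free `n` blocks starting at index `start`.
void toy_free(int start, int n) {
    for (int i = start; i < start + n; i++) {
        toy_used[i] = false;
    }
}

// Total free memory in bytes, whether or not it is contiguous.
int toy_free_bytes(void) {
    int count = 0;
    for (int i = 0; i < TOY_BLOCK_COUNT; i++) {
        if (!toy_used[i]) {
            count++;
        }
    }
    return count * TOY_BLOCK_SIZE;
}

// Fill the heap with 16-byte objects, then free every other one.
void toy_fragment(void) {
    for (int i = 0; i < TOY_BLOCK_COUNT; i++) {
        toy_alloc(1); // On an empty heap this hands out blocks 0, 1, 2, ...
    }
    for (int i = 0; i < TOY_BLOCK_COUNT; i += 2) {
        toy_free(i, 1);
    }
}
```

After toy_fragment(), toy_free_bytes() reports 64 bytes free, but toy_alloc(2) (a 32-byte request) returns -1, because no two free blocks are adjacent.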

Is this a problem on modern systems, too? Yes, but modern systems have far more memory (usually gigabytes, compared with the N64's 4 MB, or 8 MB with the Expansion Pak), and fragmentation is much easier to live with when you have a large amount of memory to spare. Modern systems typically use a general-purpose malloc/free implementation like dlmalloc, jemalloc, or tcmalloc.

Cache Tearing: Another Problem
The N64 CPU (the VR4300) has a data cache with 16-byte cache lines. When you use normal, cached RAM, the CPU does not write 1, 4, or 8 bytes at a time to RAM. Writes go into the cache, and when a modified ("dirty") cache line is written back, the whole line is written to RAM at once: 16 bytes, aligned to a 16-byte boundary.

This means that if the CPU and the RCP (the RSP, the RDP, or a DMA transfer such as a cartridge read) write to the same cache line at the same time, one of them will win, and the other one will lose! For example, let’s say you need to load some data into RAM from the cartridge:

```c
static bool is_loaded = false;
static u8 my_data[256];

// Bad! Contains errors!
void load_data(void) {
    if (!is_loaded) {
        dma_io_message_buffer = (OSIoMesg){
            ... /* etc */
            .dramAddr = &my_data,
            .devAddr = my_data_offset,
            .size = sizeof(my_data),
        };
        osEPiStartDma(rom_handle, &dma_io_message_buffer, OS_READ);
        osRecvMesg(&dma_message_queue, NULL, OS_MESG_BLOCK);
        is_loaded = true;
    }
}
```

What’s the problem with this code? The problem is that my_data is not necessarily aligned to 16 bytes, so its first and last cache lines may also contain other data. For example, it’s possible that is_loaded (or anything else… it might be an unpleasant surprise in another file) is placed in the same cache line as the start or end of my_data. When the CPU sets is_loaded = true, the write goes into the cache; when that dirty cache line is finally written back to RAM, it also overwrites a chunk of my_data (which was written by the RCP) with stale data.
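Whether two objects can tear each other comes down to simple arithmetic: an address belongs to cache line number address / 16, and two objects are at risk if the ranges of lines they touch overlap. A minimal sketch; the function names and example addresses are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DCACHE_LINE_SIZE 16 // VR4300 data cache line size, in bytes.

// Number of the cache line that contains `addr`.
static uintptr_t cache_line_of(uintptr_t addr) {
    return addr / DCACHE_LINE_SIZE;
}

// True if [a, a + a_size) and [b, b + b_size) touch at least one common
// cache line. Both sizes must be nonzero.
bool shares_cache_line(uintptr_t a, size_t a_size, uintptr_t b, size_t b_size) {
    uintptr_t a_first = cache_line_of(a);
    uintptr_t a_last = cache_line_of(a + a_size - 1);
    uintptr_t b_first = cache_line_of(b);
    uintptr_t b_last = cache_line_of(b + b_size - 1);
    return a_first <= b_last && b_first <= a_last;
}
```

For example, a 1-byte is_loaded at 0x80100010 followed directly by a 256-byte my_data at 0x80100011 share a cache line, so shares_cache_line(0x80100010, 1, 0x80100011, 256) is true. If my_data instead starts at the 16-byte aligned address 0x80100020, the result is false.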

The only reasonable solution is to make sure that each 16-byte chunk of memory is used either by the CPU or by the RCP, but never by both at the same time (unless both sides are only reading). To do this, memory written by the RCP should usually be allocated with 16-byte alignment, and those 16-byte chunks should not be shared with anything else. This applies to data loaded from the cartridge, buffers used by the RSP and RDP, and anything else that might be written from the RCP side.

This applies to everything: global variables, local variables, and data allocated with malloc/free.

Summary: No 16-byte chunk of memory should be used by both the RCP and the CPU, or data can be silently corrupted. The easiest way to ensure this is to align such objects to 16-byte boundaries and keep their sizes a multiple of 16 bytes.
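Alignment alone only protects the start of an object. If its size is not a multiple of 16 bytes, another variable may be placed in the rest of its last cache line, so it is worth padding the size too. A minimal sketch for a global buffer, assuming GCC's aligned attribute; the buffer name and its 100-byte size are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

// Round n up to a multiple of 16 bytes (the VR4300 data cache line size).
#define ROUND_UP_16(n) (((size_t)(n) + 15) & ~(size_t)15)

// Hypothetical 100-byte buffer that the RCP writes into.
// aligned(16) keeps other data out of its first cache line; padding the
// size to 112 bytes keeps other data out of its last cache line.
uint8_t rcp_buffer[ROUND_UP_16(100)] __attribute__((aligned(16)));
```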

Global Variables
Use global variables whenever you need to allocate something with a fixed size and don't want to free it. This is the easiest way to allocate memory and should be used whenever it is an option.

For example, if you want to set aside 128 KB of RAM to use for a texture cache, you can do it very simply:

```c
u8 my_texture_cache[128 * 1024] __attribute__((aligned(16)));
```

Obviously, you cannot easily free this memory and use it for something else (without using something like overlays). However, if you always want 128 KB of texture cache, and need to use it for your entire game, then this is a good way to do it.

Allocate Memory at Startup
Another easy solution is to allocate memory at startup, and never free it.

```c
#include <stdlib.h>

u8 *my_texture_cache;

void init_textures(void) {
    my_texture_cache = malloc(128 * 1024);
    if (my_texture_cache == NULL) {
        abort(); // Or however you want to handle errors.
    }
}
```

This may be more convenient than using global variables, and it lets you allocate a variable amount of memory. For example, you may decide to allocate a larger framebuffer on PAL systems, or allocate larger caches on systems with the Expansion Pak (8 MB RAM).
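As a sketch of sizing a cache at startup, the code below picks a hypothetical 128 KB normally and 512 KB when 8 MB of RAM is present (both figures are assumptions). The RAM size is passed in so the logic can be tested off-console; on hardware you would pass the value returned by LibUltra's osGetMemSize(). Because malloc is not guaranteed to return 16-byte aligned memory, the sketch over-allocates by 15 bytes and rounds the pointer up, following the cache-tearing rule above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BYTES_PER_MB (1024u * 1024u)

// Hypothetical policy: bigger texture cache with the Expansion Pak.
size_t texture_cache_size(uint32_t mem_size) {
    return mem_size >= 8 * BYTES_PER_MB ? 512 * 1024 : 128 * 1024;
}

uint8_t *my_texture_cache;

// Allocate once at startup and never free.
// mem_size: installed RAM in bytes (e.g. from osGetMemSize() on an N64).
void init_textures(uint32_t mem_size) {
    size_t size = texture_cache_size(mem_size); // Already a multiple of 16.
    uint8_t *raw = malloc(size + 15);           // Room to round up to 16.
    if (raw == NULL) {
        abort(); // Or however you want to handle errors.
    }
    // Round up to the next 16-byte boundary.
    my_texture_cache = (uint8_t *)(((uintptr_t)raw + 15) & ~(uintptr_t)15);
}
```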

Use Memory Pools
A “memory pool” sets aside a certain amount of memory, lets you allocate it in smaller chunks, and lets you free the entire pool all at once so it can be reused.

An example where this might be useful: if your game has different levels, you might decide to use a memory pool for the level-specific data. When you load a level, all data for the level gets placed into the memory pool. When you switch to a new level, the pool is reset and the pool can be reused for a different level. This can be very convenient if objects in your levels may be different sizes, but you don’t need to dynamically free objects during a level (because you can only free the entire pool all at once).

A very simple memory pool might look like this:

```c
#include <stdint.h> // uintptr_t
#include <stdlib.h> // malloc, abort

// A contiguous zone where memory can be allocated.
struct mem_zone {
    uintptr_t pos;   // Pointer to current free space position.
    uintptr_t start; // Pointer to start of zone.
    uintptr_t end;   // Pointer to end of zone.
    const char *name;
};

// Allocate a memory zone with the given size.
void mem_zone_init(struct mem_zone *z, size_t size, const char *name) {
    // malloc() may not return 16-byte aligned memory, so allocate 15 extra
    // bytes and round the start of the zone up to a 16-byte boundary.
    void *ptr = malloc(size + 15);
    if (ptr == NULL) {
        abort(); // Put your error handling here.
    }
    uintptr_t start = ((uintptr_t)ptr + 15) & ~(uintptr_t)15;
    z->pos = start;
    z->start = start;
    z->end = start + size;
    z->name = name;
}
```

Note: If you are using the old version of GCC (version 2.7.2) that ships with the Windows XP version of the tools, you will not have <stdint.h>. You can work around this by defining uintptr_t yourself:

```c
// HACK: Works on N64!
typedef unsigned long uintptr_t;
```
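Since this typedef only works where unsigned long is the same size as a pointer, it may be worth guarding it with a compile-time check. C89 compilers such as GCC 2.7.2 have no static_assert, but a negative array size forces a build error. A minimal sketch; the name uintptr_size_check is hypothetical.

```c
// HACK: Works on N64! Only for compilers without <stdint.h>.
typedef unsigned long uintptr_t;

// C89-compatible compile-time check: if uintptr_t is not pointer-sized,
// the array size becomes -1 and compilation fails.
typedef char uintptr_size_check[sizeof(uintptr_t) == sizeof(void *) ? 1 : -1];
```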

To allocate, you just bump the “pos” pointer up in the zone:

```c
// Allocate memory from the zone.
void *mem_zone_alloc(struct mem_zone *z, size_t size) {
    if (size == 0) {
        return NULL;
    }
    // Round up to multiple of 16 bytes.
    size = (size + 15) & ~(size_t)15;
    // How much free space remaining in zone?
    size_t rem = z->end - z->pos;
    if (rem < size) {
        abort(); // Out of memory. Put your error handling here.
    }
    uintptr_t ptr = z->pos;
    z->pos = ptr + size;
    return (void *)ptr;
}
```

You can free all the memory just by resetting the pointer:

```c
// Free all objects in the zone.
void mem_zone_free_all(struct mem_zone *z) {
    z->pos = z->start;
}
```
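For the level example described earlier, the pool's backing memory does not have to come from malloc: a 16-byte aligned global array works too, in line with the advice to prefer global variables. A minimal sketch that repeats the bump-pointer logic so it stands alone; the pool size, struct level, and all function names are hypothetical, and it returns NULL instead of aborting when the pool is full.

```c
#include <stddef.h>
#include <stdint.h>

#define LEVEL_POOL_SIZE (256 * 1024) // Hypothetical size; tune for your game.

// Backing storage for level data: a 16-byte aligned global, never freed.
static uint8_t level_pool[LEVEL_POOL_SIZE] __attribute__((aligned(16)));
static size_t level_pool_pos = 0; // Offset of the first free byte.

// Bump-allocate from the level pool in multiples of 16 bytes.
void *level_alloc(size_t size) {
    size = (size + 15) & ~(size_t)15;
    if (size == 0 || size > LEVEL_POOL_SIZE - level_pool_pos) {
        return NULL; // Zero-size request, or out of level memory.
    }
    void *ptr = &level_pool[level_pool_pos];
    level_pool_pos += size;
    return ptr;
}

// Discard everything allocated for the current level.
void level_reset(void) {
    level_pool_pos = 0;
}

// Hypothetical level data whose size varies from level to level.
struct level {
    int enemy_count;
    float *enemy_positions; // 3 floats (x, y, z) per enemy.
};

// Load a level. Nothing is freed individually; the next load resets the pool.
struct level *load_level(int enemy_count) {
    level_reset();
    struct level *lvl = level_alloc(sizeof(struct level));
    if (lvl == NULL) {
        return NULL;
    }
    lvl->enemy_count = enemy_count;
    lvl->enemy_positions = level_alloc(sizeof(float) * 3 * (size_t)enemy_count);
    if (enemy_count > 0 && lvl->enemy_positions == NULL) {
        return NULL;
    }
    return lvl;
}
```

Loading a second level reuses the same memory as the first, which is exactly the "free the whole pool at once" behavior described above.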

Writing Your Own malloc
So, where does malloc get its memory from? You can actually write your own, very simple malloc if you like!

If your linker script defines a symbol marking the end of the static memory that your program uses (code and global variables), then you can create a single memory zone for the remaining N64 RAM and allocate your memory pools from it. Using the memory pool code above, we create the “main memory” pool from that symbol and osGetMemSize() (a LibUltra function that returns the amount of installed RAM in bytes).

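```c
#include <stdint.h>  // uintptr_t
#include <ultra64.h> // u8, osGetMemSize()

// Symbol marking the END of static memory (code + global variables).
// Hypothetical name: use whatever your linker script defines. Do not use
// the start of .bss, or the pool will overwrite your global variables.
extern u8 _bss_end;

struct mem_zone main_memory;

// Create a memory pool containing all RAM not used by code + global variables.
void init_memory(void) {
    // Round _bss_end up to a 16-byte boundary.
    uintptr_t start = ((uintptr_t)&_bss_end + 15) & ~(uintptr_t)15;
    main_memory.pos = start;
    main_memory.start = start;
    // End of RAM, as a cached (KSEG0) address.
    main_memory.end = 0x80000000 + osGetMemSize();
    main_memory.name = "main";
}
```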

Note: The <ultra64.h> header is part of LibUltra.

We can then just define malloc to use this pool:

```c
#include <stdlib.h> // Declares malloc() and size_t.

void *malloc(size_t size) {
    return mem_zone_alloc(&main_memory, size);
}
```

We then don’t define a free function. With this system, you can’t free memory allocated with malloc.

Use Freelists (Object Pools)
A freelist is a simple way of reusing objects so you don’t have to call malloc or free.

With a freelist, you keep a linked list of objects which are not being used. A simple freelist might look like this:

```c
#define MONSTER_COUNT 100

// Monster data structure.
struct monster {
    vec3 position; // vec3 is assumed to be defined elsewhere.
    int health;
    struct monster *next_free; // Next monster in the freelist.
};

struct monster monster_array[MONSTER_COUNT];
struct monster *monster_free; // Pointer to first free monster.
```

At startup, allocate memory for your objects and add them all to the freelist.

```c
// Initialize monsters. Call once, at startup.
void monster_init(void) {
    for (int i = 0; i < MONSTER_COUNT - 1; i++) {
        monster_array[i].next_free = &monster_array[i + 1];
    }
    monster_array[MONSTER_COUNT - 1].next_free = NULL;
    monster_free = &monster_array[0]; // Every monster starts out free.
}
```

To allocate, take the first item from the freelist and remove it from the list.

```c
// Allocate a monster from the freelist.
struct monster *monster_new(void) {
    struct monster *mon = monster_free;
    if (mon == NULL) {
        // Freelist is empty.
        return NULL;
    }
    // Remove from freelist.
    monster_free = mon->next_free;
    return mon;
}
```

To free an item, just put it back on the list.

```c
// Return a monster to the freelist.
// Named monster_release so it does not clash with the monster_free variable.
void monster_release(struct monster *mon) {
    mon->next_free = monster_free;
    monster_free = mon;
}
```

There are more clever ways to do this, but this is a start. Freelists and object pools are very common, even today, in modern games on modern hardware.
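One common refinement stores the "next free" pointer in the same memory as the object itself by using a union, so a free entry costs no extra space and live objects carry no next_free field. A minimal sketch; the union and function names are hypothetical, and float[3] stands in for vec3 so the code is self-contained.

```c
#include <stddef.h>

#define MONSTER_COUNT 100

struct monster {
    float position[3];
    int health;
};

// A free entry holds a pointer to the next free entry;
// an entry in use holds the monster itself.
union monster_entry {
    struct monster monster;
    union monster_entry *next;
};

static union monster_entry monster_entries[MONSTER_COUNT];
static union monster_entry *monster_free_head; // First free entry, or NULL.

// Put every entry on the freelist. Call once, at startup.
void monster_pool_init(void) {
    for (int i = 0; i < MONSTER_COUNT - 1; i++) {
        monster_entries[i].next = &monster_entries[i + 1];
    }
    monster_entries[MONSTER_COUNT - 1].next = NULL;
    monster_free_head = &monster_entries[0];
}

// Take an entry off the freelist, or return NULL if the pool is exhausted.
struct monster *monster_alloc(void) {
    union monster_entry *entry = monster_free_head;
    if (entry == NULL) {
        return NULL;
    }
    monster_free_head = entry->next;
    return &entry->monster;
}

// Push the entry back onto the freelist.
void monster_dealloc(struct monster *mon) {
    // A pointer to a union member, converted, points to the union itself.
    union monster_entry *entry = (union monster_entry *)mon;
    entry->next = monster_free_head;
    monster_free_head = entry;
}
```

The trade-off is that a freed monster's data is overwritten by the link pointer, so nothing may read a monster after it has been returned to the pool.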