sh_malloc - A sh** Memory Allocator in C
Now that I’ve finished my final year exams, and submitted my Master’s thesis, I have a bit of time on my hands that I’m not quite sure what to do with. I’ve been meaning to write a memory allocator for a while now, so I figure I might as well do it now. What follows is a “garage door open” view of me building a memory allocator in C; the allocator is dubbed sh_malloc for shit malloc — I don’t expect it to be very good.
Building Blocks
The most bare-bones version of sh_malloc allocates memory in two steps: first, we check the free-list for any free blocks of memory that fit the required allocation size. If none exist, we turn to the OS, requesting additional memory. This flow necessitates the use of a data structure to manage the free-list, where we opt for a linked list.
This list chains together blocks of unused memory by linking headers at the start of those regions. The header is simply a block of metadata located just before the block of memory, containing information about its size, free-ness, and a pointer to the next header.
struct header {
size_t size;
uint8_t is_free;
struct header *next;
arena_t *arena;
} __attribute__((aligned(16))); // Ensure header is aligned to 16-byte boundary, maximum primitive type size in C
typedef struct header header_t;
So, the allocation process involves moving through these headers to find a free block with a large enough size. This kind of structure means that we need only worry about managing these headers and, when it’s time to give memory to the user, we simply return a pointer to the memory block after the header:
return (void*)(hdr+1);
This is pretty cool, the memory itself is the free-list!
If we don’t find a suitable block, we need to request memory from the OS. In Linux, this involves an mmap
syscall. However, I’m using Windows, which presents the VirtualAlloc
function in its memory API.
void *block = VirtualAlloc(NULL, total_size, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
if(block == NULL) {
return NULL;
}
Here, we use MEM_RESERVE
and MEM_COMMIT
, which means Windows will both reserve and allocate the actual pages, which is of course what we want.
There is one crucial element missing here. We’re good programmers, so we make the habit of freeing any unused memory to prevent a memory leak. To abide by this, we need an implementation for sh_free. In its current form, it’s actually extremely simple: we just return the memory block to the free-list by toggling the is_free
flag.
header_t *hdr = (header_t *)ptr - 1;
hdr->is_free = 1;
Boundary Tagging
This works, but there are some clear performance issues. For one, it can be very wasteful to use an entire block of memory for a smaller allocation. Also, we can end up fragmenting our address space with many small blocks of memory, which can make it hard to find space for a large allocation later. We can improve on these by being more economical with the memory blocks we get from the OS.
First, VirtualAlloc
gives us memory in multiples of the page size (as does mmap
), so in not accounting for this, we’re wasting the entire rest of the page in perpetuity. It is, thus, much better to split the block so that the rest of the page gets added to the free-list. As a bonus, this also gives us natural arenas (the page itself) for each set of allocations to live in, which is useful for coalescing neighbouring blocks.
When it comes to freeing memory, we check if we can combine the newly freed block with either the next or the previous, if they are also free. Checking the next block is easy, we just jump forward by the size of the current block and check if that block is free. Looking at the previous block is a bit harder to do without going through the entire free-list, and checking every header. There’s a better way, and we’ve already been doing it.
The solution is to augment the current boundary tagging with an additional footer. That way, we can jump a few bytes backwards from the current block and figure out the size of the previous block, allowing us to jump back to its beginning. All of this allows for constant-time coalescing of free blocks.
// Coalesce with next block
header_t *next_block = (header_t *)((char *)hdr + sizeof(header_t) + hdr->size + sizeof(size_t));
if ((void *)next_block < (char *)hdr->arena->base + hdr->arena->size) {
if (next_block->is_free) {
hdr->size += sizeof(header_t) + next_block->size + sizeof(size_t);
hdr->next = next_block->next;
hdr->arena->num_blocks--;
write_footer(hdr);
}
}
// Coalesce with previous block
size_t *prev_size_ptr = (size_t *)((char *)hdr - sizeof(size_t));
if ((void *)prev_size_ptr > hdr->arena->base) {
size_t prev_size = *prev_size_ptr;
header_t *prev = (header_t *)((char *)hdr - sizeof(size_t) - prev_size - sizeof(header_t));
if ((void *)prev > hdr->arena->base && prev->is_free) {
prev->size += sizeof(header_t) + hdr->size + sizeof(size_t);
prev->next = hdr->next;
prev->arena->num_blocks--;
write_footer(prev);
}
}
Note that the arena region is used to bound the accesses of the next and previous block. This is needed since we can only guarantee that blocks generated in this arena are part of the free list, and memory beyond it may be unsafe to access. This does allow for some fragmentation across page boundaries.
More on Arenas
There’s a lot of talk here on arenas, but I haven’t really explained how they work in this implementation. The idea is simple: when we request memory from the OS, we do it in multiples of the page size, giving us a large memory region in which to do as many allocations as we can. This is an arena. If we need more space, we ask for it and get a new arena, but we hope most of our allocations will fit in an arena and fill it out, rather than fragmenting over multiple arenas.
Arenas are managed by a struct arena_t
:
typedef struct {
void *base;
size_t size;
int num_free_blocks;
int num_blocks;
} arena_t;
and you may have noticed that a pointer to this struct exists in the header_t
for each block of memory.
We want to track the same arena object for all blocks of memory in an arena, so we need some kind of global definition for it. To get around dynamically allocating it (feels like cheating to use malloc()
in a malloc implementation), we instead use a bit of space at the top of the arena to store the arena_t
object. Thus, as long as the arena exists, we can refer to its arena_t
, pretty cool!
This lets us do one more important thing: we can give memory back to the OS if the entire arena is free. This can be done at a more granular level, for each memory block, but we’ll likely spend too much time in syscalls, and also be giving up memory that we’ll likely need soon anyway. Giving up an entire arena when its free is a more balanced approach.