« No naked assembly!Fast struct/class copy and exchange in C++ »

Fixed Size Memory Pool - 1 bit overhead!

Yesterday, I was working on a memory pool for fixed size allocations, which I needed for smaller data types, because the current memory allocator (I'm using nedmalloc) wastes a lot of memory for it's meta data and alignment padding (one allocated 32 bit integer requires 16 bytes of memory!).


The aim was not only to decrease amount of memory needed for dynamically allocated variables, but also to speed up the allocation/deallocation functions.


At the end of the day, I had it implemented, a templated memory pool class for any data type of any size, called Compact Element Storage (CES), which requires only 1 bit per element for it's meta data, and only 12 bytes per element block (element block consist of fixed size element array and meta data). The size of element block can be set through template parameters, and it's value is crucial, because it changes the performance of allocation and deallocation.


The larger the block size is, the better performance you get, because the less time is spent in block iteration while searching for free elements. However, the block size should also depend on how many elements you expect to allocate. For example, if it's expected to be less than 10 000 elements, the block size can be set to 1024 or less. If more elements are expected, e.g. 100 000, the block size of 4096 or 8196 is optimal.


In other words, using smaller block sizes while allocating hundred of thousands elements has bad impact on allocation/deallocation performace (at worst case, it's twice slower than new/delete).


Anyway, I did some benchmarking, where I tested performace of nedmalloc and CES. The benchmarking code has allocated and then deallocated, in reverse order, 100 000 of 32 bit integers (int32_t).


Results (allocation/deallocation time):
-nedmalloc: 206/229 ms
-CES: 79/75 ms

Results for 10 000 integers:
-nedmalloc: 36/25 ms
-CES: 8/7 ms

As you can see, the CES is more than twice faster than nedmalloc (for fixed sized allocations only). Also, it also requires a way less memory. Where nedmalloc requires at least (100 000 * 16) bytes for 100 000 integers, CES requires only ((100 000 * 4) + 12 500).


Still the nedmalloc is one of the fastest allocators for variable sized allocations available, so there is no need to write own allocator for it (nedmalloc would surely beat it).


Categories: Programming

No feedback yet