
Correct way to profile a memory allocator

I have written a memory allocator that is (supposedly) faster than using malloc/free. I have written a small amount of code to test this, but I'm not sure whether this is the correct way to profile a memory allocator. Can anyone give me some advice?

The output of this code is:

Mem_Alloc: 0.020000s
malloc: 3.869000s
difference: 3.849000s
Mem_Alloc is 193.449997 times faster.

This is the code:

int i;
DWORD mem_alloc_time, malloc_time; // timeGetTime returns milliseconds as a DWORD
float mem_alloc_time_float, malloc_time_float, times_faster;

// Test Mem_Alloc
timeBeginPeriod (1);
mem_alloc_time = timeGetTime ();

for (i = 0; i < 100000; i++) {
    void *p = Mem_Alloc (100000);
    Mem_Free (p);
}

// Get the duration
mem_alloc_time = timeGetTime () - mem_alloc_time;

// Test malloc
malloc_time = timeGetTime ();

for (i = 0; i < 100000; i++) {
    void *p = malloc (100000);
    free (p);
}

// Get the duration
malloc_time = timeGetTime() - malloc_time;
timeEndPeriod (1);

// Convert both times to seconds
mem_alloc_time_float = (float)mem_alloc_time / 1000.0f;
malloc_time_float = (float)malloc_time / 1000.0f;

// Print the results
printf ("Mem_Alloc: %fs\n", mem_alloc_time_float);
printf ("malloc: %fs\n", malloc_time_float);

if (mem_alloc_time_float > malloc_time_float) {
    printf ("difference: %fs\n", mem_alloc_time_float - malloc_time_float);
} else {
    printf ("difference: %fs\n", malloc_time_float - mem_alloc_time_float);
}

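// Report whichever allocator actually had the shorter time
times_faster = max (mem_alloc_time_float, malloc_time_float) /
    min (mem_alloc_time_float, malloc_time_float);
printf ("%s is %f times faster.\n",
    mem_alloc_time_float < malloc_time_float ? "Mem_Alloc" : "malloc",
    times_faster);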

Nobody cares[*] whether your allocator is faster or slower than their allocator at allocating and then immediately freeing a 100k block 100k times. That is not a common memory allocation pattern, and in any situation where it does occur there are probably better ways to optimize than switching to your memory allocator: for example, use the stack via alloca, or use a static array.

People care greatly whether or not your allocator will speed up their application.

Choose a real application. Study its performance at allocation-heavy tasks with the two different allocators, and compare that. Then study more allocation-heavy tasks.

Just as one example, you might compare the time to start up Firefox and load the StackOverflow front page. You could mock the network (or at least use a local HTTP proxy) to remove a lot of the random variation from the test. You could also use a profiler to see how much time is spent in malloc, and hence whether the task is allocation-heavy or not, but beware that things like "overcommit" might mean that not all of the cost of memory allocation is paid in malloc.

If you wrote the allocator in order to speed up your own application, you should use your own application.

One thing to watch out for is that often what people want in an allocator is good behavior in the worst case. That is to say, it's all very well if your allocator is 99.5% faster than the default most of the time, but if it does comparatively badly when memory gets fragmented then you lose in the end, because Firefox runs for a couple of hours and then can't allocate memory any more and falls over. Then you realise why the default is taking so long over what appears to be a trivial task.
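To make the worst case visible rather than only the average, one option is to time each allocation call individually and record the maximum alongside the mean. The following is only a rough sketch: `measure_alloc_latency`, `latency_stats`, `alloc_fn` and `free_fn` are names invented for this illustration, and it assumes a C11 library that provides `timespec_get`. The allocate-then-free loop is a placeholder; a realistic workload would need to go in its place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical function-pointer types so any allocator can be plugged in. */
typedef void *(*alloc_fn)(size_t);
typedef void (*free_fn)(void *);

typedef struct latency_stats {
    double mean_ns; /* average time per allocation call */
    double max_ns;  /* slowest single allocation call seen */
} latency_stats;

/* Difference b - a in nanoseconds, computed per field to keep precision. */
static double elapsed_ns(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec) * 1e9 +
           (double)(b->tv_nsec - a->tv_nsec);
}

/* Times `count` calls of `alloc(size)` one by one.
   TIME_UTC is wall-clock time, so a clock adjustment during the run
   would distort the numbers; treat the result as indicative only. */
latency_stats measure_alloc_latency(alloc_fn alloc, free_fn release,
                                    size_t size, long count)
{
    latency_stats st = {0.0, 0.0};
    double total = 0.0;

    for (long i = 0; i < count; i++) {
        struct timespec t0, t1;
        timespec_get(&t0, TIME_UTC);
        void *p = alloc(size);
        timespec_get(&t1, TIME_UTC);

        assert(p != NULL);
        *(volatile unsigned char *)p = 1; /* touch the block */
        release(p);

        double ns = elapsed_ns(&t0, &t1);
        total += ns;
        if (ns > st.max_ns)
            st.max_ns = ns;
    }
    if (count > 0)
        st.mean_ns = total / (double)count;
    return st;
}
```

Calling `measure_alloc_latency(malloc, free, 256, 10000)` and then the same with the custom allocator's functions (assuming they have matching signatures) gives two `latency_stats` values to compare. A large gap between `max_ns` and `mean_ns` is the kind of thing the average alone would hide.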

[*] This may seem harsh. Nobody cares whether it's harsh ;-)

All that the implementation you are testing against is missing is a check for whether the requested size is the same as that of the previously freed block:

// Pseudocode: reuse the most recently freed block when the size matches
if (size == prev_free->size)
{
    current = allocate (prev_free);
    return current;
}
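To show how little it takes to win the benchmark in the question, here is a minimal sketch of that check as a wrapper around malloc/free. It is an illustration only, not how any particular C library works: `cached_alloc`, `cached_free`, `cached_demo` and the `header` layout are all invented here, the cache holds a single block, and nothing in it is thread-safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header stored in front of every block so the size is known on free.
   The union keeps the user pointer suitably aligned for any type. */
typedef union header {
    size_t size;
    max_align_t align;
} header;

static header *last_freed = NULL; /* one-entry cache; not thread-safe */

void *cached_alloc(size_t size)
{
    header *h;

    if (last_freed != NULL && last_freed->size == size) {
        h = last_freed;           /* same size as the last free: reuse it */
        last_freed = NULL;
    } else {
        h = malloc(sizeof(header) + size);
        if (h == NULL)
            return NULL;
        h->size = size;
    }
    return h + 1;                 /* user memory starts after the header */
}

void cached_free(void *p)
{
    if (p == NULL)
        return;
    if (last_freed != NULL)
        free(last_freed);         /* evict the previously cached block */
    last_freed = (header *)p - 1; /* keep this block for the next request */
}

/* Mimics one iteration pair of the benchmark loop in the question. */
void cached_demo(void)
{
    void *a = cached_alloc(100000);
    cached_free(a);
    void *b = cached_alloc(100000);
    assert(b == a);               /* second request never reaches malloc */
    cached_free(b);
}
```

With this wrapper, a loop that allocates and immediately frees the same size only calls the underlying malloc once, so it looks extremely fast while saying nothing about behavior under a mixed workload.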

It is "trivial" to make efficient malloc/free functions as long as memory is not fragmented. The challenge comes when you allocate a lot of memory in different sizes, free some of it, and then allocate more, all in no specific order.
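A workload of that kind might be sketched as follows. Everything here is an assumption made for illustration: the names `run_mixed_workload`, `alloc_fn`, `free_fn` and `next_rand`, the 1024 slots, and the 16 to 4111 byte size range are arbitrary choices, not taken from any existing benchmark. A fixed seed gives both allocators the identical sequence of requests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical function-pointer types so any allocator can be plugged in. */
typedef void *(*alloc_fn)(size_t);
typedef void (*free_fn)(void *);

#define SLOTS 1024

/* Small deterministic PRNG, returns a value in 0..32767. */
static uint32_t next_rand(uint32_t *state)
{
    *state = *state * 1103515245u + 12345u;
    return (*state >> 16) & 0x7fffu;
}

/* Runs `ops` random steps and returns the CPU seconds used.
   Each step picks a random slot: a live block is freed, an empty slot
   gets a new block of 16..4111 bytes. The number of allocations made
   is stored in *allocs_out when that pointer is not NULL. */
double run_mixed_workload(alloc_fn alloc, free_fn release,
                          long ops, uint32_t seed, long *allocs_out)
{
    void *slots[SLOTS] = {0};
    long allocs = 0;
    clock_t start = clock();

    for (long i = 0; i < ops; i++) {
        size_t slot = next_rand(&seed) % SLOTS;
        if (slots[slot] != NULL) {
            release(slots[slot]);
            slots[slot] = NULL;
        } else {
            size_t size = 16 + next_rand(&seed) % 4096;
            slots[slot] = alloc(size);
            assert(slots[slot] != NULL);
            ((unsigned char *)slots[slot])[0] = 1; /* touch the block */
            allocs++;
        }
    }
    for (size_t s = 0; s < SLOTS; s++) /* release whatever is still live */
        if (slots[s] != NULL)
            release(slots[s]);

    clock_t end = clock();
    if (allocs_out != NULL)
        *allocs_out = allocs;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Running it once as `run_mixed_workload(malloc, free, 1000000, 42u, NULL)` and once with the custom allocator's functions (assuming their signatures match) gives a comparison that is somewhat closer to real use than a single repeated size, though it is still a synthetic test.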

You have to check which library you tested against, and which conditions that library was optimised for, such as:

  • efficiency when handling fragmented memory
  • fast free, fast malloc (you can make either one O(1))
  • memory footprint
  • multiprocessor support
  • realloc

Check existing implementations and the problems they were dealing with, and try to improve on them or solve the difficulties they had. Try to figure out what users expect from the library.

Base your tests on these assumptions, not just on some operation you think is important.

