简体   繁体   中英

Why does malloc() call mmap() and brk() interchangeably?

I'm new to C and heap memory, still struggling to understand dynamic memory allocation.

I traced Linux system calls and found that if I use malloc to request a small amount of heap memory, then malloc calls brk internally.

But if I use malloc to request a very large amount of heap memory, then malloc calls mmap internally.

So there must be a big difference between brk and mmap , but theoretically we should be able to use brk to allocate heap memory regardless of the requested size. So why does malloc call mmap when allocating a large amount of memory?

so why malloc calls mmap when it comes to allocate a large size of memory?

The short answer is for improved efficiency on newer implementations of Linux, and the updated memory allocation algorithms that come with them. But keep in mind that this is a very implementation dependent topic, and the whys and wherefores would vary greatly for differing vintages and flavors of the specific Linux OS being discussed.

Here is fairly recent write-up regarding the low-level parts mmap() and brk() play in Linux memory allocation. And, a not so recent, but still relevant Linux Journal article that includes some content that is very on-point for the topic here, including this:

For very large requests, malloc() uses the mmap() system call to find addressable memory space. This process helps reduce the negative effects of memory fragmentation when large blocks of memory are freed but locked by smaller, more recently allocated blocks lying between them and the end of the allocated space. In this case, in fact, had the block been allocated with brk(), it would have remained unusable by the system even if the process freed it.
(emphasis mine)

Regarding brk() :
incidentally , "... mmap() didn't exist in the early versions of Unix. brk() was the only way to increase the size of the data segment of the process at that time. The first version of Unix with mmap() was SunOS in the mid 80's, the first open-source version was BSD-Reno in 1990. ". Since that time, modern implementation of memory allocation algorithms have been refactored with many improvements, greatly reducing the need for them to include using brk() .

mmap (when used with MAP_ANONYMOUS ) allocates a chunk of RAM that can be placed anywhere within the process's virtual address space, and that can be deallocated later (with munmap ) independently of all other allocations.

brk changes the ending address of a single, contiguous "arena" of virtual address space: if this address is increased it allocates more memory to the arena, and if it is decreased, it deallocates the memory at the end of the arena. Therefore, memory allocated with brk can only be released back to the operating system when a continuous range of addresses at the end of the arena is no longer needed by the process.

Using brk for small allocations, and mmap for big allocations, is a heuristic based on the assumption that small allocations are more likely to all have the same lifespan, whereas big allocations are more likely to have a lifespan that isn't correlated with any other allocations' lifespan. So, big allocations use the system primitive that lets them be deallocated independently from anything else, and small allocations use the primitive that doesn't.

This heuristic is not very reliable. The current generation of malloc implementations, if I remember correctly, has given up altogether on brk and uses mmap for everything. The malloc implementation I suspect you are looking at (the one in the GNU C Library, based on your tags) is very old and mainly continues to be used because nobody is brave enough to take the risk of swapping it out for something newer that will probably but not certainly be better.

brk() is a traditional way of allocating memory in UNIX -- it just expands the data area by a given amount. mmap() allows you to allocate independent regions of memory without being restricted to a single contiguous chunk of virtual address space.

malloc() uses the data space for "small" allocations and mmap() for "big" ones, for a number of reasons, including reducing memory fragmentation. It's just an implementation detail you shouldn't have to worry about.

Please check this question also.

Reducing fragmentation is commonly given as the reason why mmap is used for large allocations; see ryyker's answer for details. But I think that's not the real benefit nowadays; in practice there's still fragmentation even with mmap , just in a larger pool (the virtual address space, rather than the heap).

The big advantage of mmap is discardability.

When allocating memory with sbrk , if the memory is actually used (so that the kernel maps physical memory at some point), and then freed, the kernel itself can't know about that, unless the allocator also reduces the program break (which it can't if the freed block isn't the topmost previously-used block under the program break). The result is that the contents of that physical memory become “precious” as far as the kernel is concerned; if it ever needs to re-purpose that physical memory, it then has to ensure that it doesn't lose its contents. So it might end up swapping pages out (which is expensive) even though the owning process no longer cares about them.

When allocating memory with mmap , freeing the memory doesn't just return the block to a pool somewhere; the corresponding virtual memory allocation is returned to the kernel, and that tells the kernel that any corresponding physical memory, dirty or otherwise, is no longer needed. The kernel can then re-purpose that physical memory without worrying about its contents.

the key part of the reason I think, which I copied from the chat said by Peter

free() is a user-space function, not a system call. It either hands them back to the OS with munmap or brk, or keeps them dirty in user-space. If it doesn't make a system call, the OS must preserve the contents of those pages as part of the process state.

So when you use brk to increase your memory adress, when return back, you have to use the brk a negtive value, so brk only can return the most recently memory block you allocated, when you call malloc(huge), malloc(small), free(huge). the huge cannot be returned back to system, you can only maintain a list of fragmentation for this process, so the huge is actually hold by this process. this is the drawback of brk.

but the mmap and munmap can avoid this.

I want to emphasize another view point.

malloc is system function that allocate memory.

You do not really need to debug it, because in some implementations, it might give you memory from static "arena" (eg static char array).

In some other implementations it may just return null pointer.

If you want to see what mallow really do, I suggest you look at
http://gee.cs.oswego.edu/dl/html/malloc.html

Linux gcc malloc is based on this.

You can take a look at jemalloc too. It basically uses same brk and mmap, but organizes the data differently and usually is "better".

Happy researching.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM