
munmap() failure with ENOMEM with private anonymous mapping

I recently discovered that Linux does not guarantee that memory allocated with mmap can be freed with munmap if doing so would push the number of VMA (Virtual Memory Area) structures over vm.max_map_count. The manpage states this (almost) clearly:

 ENOMEM The process's maximum number of mappings would have been exceeded.
 This error can also occur for munmap(), when unmapping a region
 in the middle of an existing mapping, since this results in two
 smaller mappings on either side of the region being unmapped.

The problem is that the Linux kernel always tries to merge VMA structures when possible, which makes munmap fail even for separately created mappings. I was able to write a small program that confirms this behavior:

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>

#include <sys/mman.h>

// value of vm.max_map_count
#define VM_MAX_MAP_COUNT        (65530)

// number of vma for the empty process linked against libc - /proc/<id>/maps
#define VMA_PREMAPPED           (15)

#define VMA_SIZE                (4096)
#define VMA_COUNT               ((VM_MAX_MAP_COUNT - VMA_PREMAPPED) * 2)

int main(void)
{
    static void *vma[VMA_COUNT];

    for (int i = 0; i < VMA_COUNT; i++) {
        vma[i] = mmap(0, VMA_SIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);

        if (vma[i] == MAP_FAILED) {
            printf("mmap() failed at %d\n", i);
            return 1;
        }
    }

    for (int i = 0; i < VMA_COUNT; i += 2) {
        if (munmap(vma[i], VMA_SIZE) != 0) {
            printf("munmap() failed at %d (%p): %m\n", i, vma[i]);
        }
    }
}

It allocates a large number of pages (twice the default allowed maximum) using mmap, then munmaps every second page to create a separate VMA structure for each remaining page. On my machine the last munmap call always fails with ENOMEM.

Initially I thought that munmap never fails if called with the same address and size that were used to create the mapping. Apparently this is not the case on Linux, and I was not able to find information about similar behavior on other systems.

At the same time, in my opinion a partial unmap applied to the middle of a mapped region can be expected to fail on any OS under every sane implementation, but I haven't found any documentation that says such a failure is possible.

I would usually consider this a bug in the kernel, but knowing how Linux deals with memory overcommit and OOM, I am almost sure this is a "feature" that exists to improve performance and decrease memory consumption.

Other information I was able to find:

  • Similar APIs on Windows do not have this "feature" due to their design (see MapViewOfFile, UnmapViewOfFile, VirtualAlloc, VirtualFree): they simply do not support partial unmapping.
  • The glibc malloc implementation does not create more than 65535 mappings, backing off to sbrk when this limit is reached: https://code.woboq.org/userspace/glibc/malloc/malloc.c.html. This looks like a workaround for this issue, but it is still possible to make free silently leak memory.
  • jemalloc had trouble with this and tried to avoid using mmap/munmap because of this issue (I don't know how it turned out for them).

Do other OSes really guarantee deallocation of memory mappings? I know Windows does, but what about other Unix-like operating systems? FreeBSD? QNX?


EDIT: I am adding an example that shows how glibc's free can leak memory when the internal munmap call fails with ENOMEM. Use strace to see that munmap fails:

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>

#include <sys/mman.h>

// value of vm.max_map_count
#define VM_MAX_MAP_COUNT        (65530)

#define VMA_MMAP_SIZE           (4096)
#define VMA_MMAP_COUNT          (VM_MAX_MAP_COUNT)

// glibc's malloc default mmap_threshold is 128 KiB
#define VMA_MALLOC_SIZE         (128 * 1024)
#define VMA_MALLOC_COUNT        (VM_MAX_MAP_COUNT)

int main(void)
{
    static void *mmap_vma[VMA_MMAP_COUNT];

    for (int i = 0; i < VMA_MMAP_COUNT; i++) {
        mmap_vma[i] = mmap(0, VMA_MMAP_SIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);

        if (mmap_vma[i] == MAP_FAILED) {
            printf("mmap() failed at %d\n", i);
            return 1;
        }
    }

    for (int i = 0; i < VMA_MMAP_COUNT; i += 2) {
        if (munmap(mmap_vma[i], VMA_MMAP_SIZE) != 0) {
            printf("munmap() failed at %d (%p): %m\n", i, mmap_vma[i]);
            return 1;
        }
    }

    static void *malloc_vma[VMA_MALLOC_COUNT];

    for (int i = 0; i < VMA_MALLOC_COUNT; i++) {
        malloc_vma[i] = malloc(VMA_MALLOC_SIZE);

        if (malloc_vma[i] == NULL) {
            printf("malloc() failed at %d\n", i);
            return 1;
        }
    }

    for (int i = 0; i < VMA_MALLOC_COUNT; i += 2) {
        free(malloc_vma[i]);
    }
}

One way to work around this problem on Linux is to mmap more than one page at once (e.g. 1 MiB at a time), and also map a separator page after it. So, you actually call mmap on 257 pages of memory, then remap the last page with PROT_NONE so that it cannot be accessed. This defeats the VMA merging optimization in the kernel. Since you are allocating many pages at once, you should not run into the maximum mapping limit. The downside is that you have to manually manage how you want to slice up the large mmap.

As to your questions:

  1. System calls can fail on any system for a variety of reasons. Documentation is not always complete.

  2. You are allowed to munmap part of a mmapped region as long as the address passed in lies on a page boundary and the length argument is rounded up to the next multiple of the page size.
