
How to mmap the stack for the clone() system call on Linux?

The clone() system call on Linux takes a parameter pointing to the stack for the newly created thread to use. The obvious way to do this is to simply malloc some space and pass that, but then you have to be sure you've malloc'd as much stack space as that thread will ever use (hard to predict).

I remembered that when using pthreads I didn't have to do this, so I was curious what it did instead. I came across this site, which explains: "The best solution, used by the Linux pthreads implementation, is to use mmap to allocate memory, with flags specifying a region of memory which is allocated as it is used. This way, memory is allocated for the stack as it is needed, and a segmentation violation will occur if the system is unable to allocate additional memory."

The only context in which I've ever heard of mmap being used is mapping files into memory, and indeed, according to the mmap man page, it takes a file descriptor. How can this be used to allocate a stack of dynamic length to give to clone()? Is that site just crazy? ;)

In either case, doesn't the kernel need to know how to find a free bunch of memory for a new stack anyway, since that's something it has to do all the time as the user launches new processes? Why does a stack pointer even need to be specified in the first place if the kernel can already figure this out?

Stacks are not, and never can be, unlimited in their space for growth. Like everything else, they live in the process's virtual address space, and the amount by which they can grow is always limited by the distance to the adjacent mapped memory region.

When people talk about the stack growing dynamically, they might mean one of two things:

  • Pages of the stack might be copy-on-write zero pages, which do not get private copies made until the first write is performed.
  • Lower parts of the stack region may not yet be reserved (and thus not count towards the process's commit charge, i.e. the amount of physical memory/swap the kernel has accounted for as reserved for the process) until a guard page is hit. At that point the kernel commits more and moves the guard page, or kills the process if there is no memory left to commit.

Relying on the MAP_GROWSDOWN flag is unreliable and dangerous, because it cannot protect you against mmap creating a new mapping right next to your stack, which will then get clobbered. (See http://lwn.net/Articles/294001/ ) For the main thread, the kernel automatically reserves the stack-size ulimit worth of address space (not memory) below the stack and prevents mmap from allocating it. (But beware! Some broken vendor-patched kernels disable this behavior, leading to random memory corruption!)

For other threads, you simply must mmap the entire range of address space the thread might need for its stack when creating it. There is no other way. You could make most of it initially non-readable/non-writable and change that on faults, but then you'd need signal handlers. That solution is not acceptable in a POSIX threads implementation because it would interfere with the application's own signal handlers. (Note that, as an extension, the kernel could offer special MAP_ flags to deliver a different signal instead of SIGSEGV on illegal access to the mapping, and the threads implementation could then catch and act on that signal. But Linux at present has no such feature.)

Finally, note that the clone syscall does not take a stack pointer argument because it does not need it. The syscall must be performed from assembly code, because the userspace wrapper is required to change the stack pointer in the "child" thread to point to the desired stack, and to avoid writing anything to the parent's stack.

Actually, clone does take a stack pointer argument, because it's unsafe to wait to change the stack pointer in the "child" until after returning to userspace. Unless signals are all blocked, a signal handler could run immediately on the wrong stack, and on some architectures the stack pointer must be valid and point to an area that is safe to write at all times.

Not only is modifying the stack pointer impossible from C, but you also couldn't avoid the possibility that the compiler would clobber the parent's stack after the syscall but before the stack pointer was changed.

You'd want the MAP_ANONYMOUS flag for mmap, and MAP_GROWSDOWN, since you want to use it as a stack.

Something like:

void *stack = mmap(NULL,initial_stacksize,PROT_WRITE|PROT_READ,MAP_PRIVATE|MAP_GROWSDOWN|MAP_ANONYMOUS,-1,0);

See the mmap man page for more info. And remember, clone is a low-level concept that you're not meant to use unless you really need what it offers. It offers a lot of control, like setting its own stack, in case you want to do some trickery (like having the stack accessible in all the related processes). Unless you have a very good reason to use clone, stick with fork or pthreads.

Joseph, in answer to your last question:

When a user creates a "normal" new process, that's done by fork(). In this case, the kernel doesn't have to worry about creating a new stack at all, because the new process is a complete duplicate of the old one, right down to the stack.

If the user replaces the currently running process using exec(), then the kernel does need to create a new stack, but in this case that's easy, because it gets to start from a blank slate. exec() wipes out the memory space of the process and reinitialises it, so the kernel gets to say "after exec(), the stack always lives HERE".

If, however, we use clone(), then we can say that the new process will share a memory space with the old process (CLONE_VM). In this situation, the kernel can't leave the stack as it was in the calling process (as fork() does), because then our two processes would be stomping on each other's stack. The kernel also can't just put it in a default location (as exec() does), because that location is already taken in this memory space. The only solution is to let the calling process find a place for it, which is what clone() does.

Here is the code, which mmaps a stack region and instructs the clone system call to use this region as the stack.

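#define _GNU_SOURCE          /* needed for the clone() declaration in <sched.h> */
#include <sys/mman.h>
#include <sys/wait.h>
#include <signal.h>
#include <stdio.h>
#include <sched.h>

int execute_clone(void *arg)
{
    (void)arg;
    printf("\nclone function Executed....Sleeping\n");
    fflush(stdout);
    return 0;
}

int main(void)
{
    char *ptr;
    int rc;
    size_t len = 0x0000000000200000;   /* 2 MiB stack */

    /* Let the kernel choose the address: MAP_FIXED at a hard-coded address
     * (the original used 0x0000010000000000) silently replaces any mapping
     * already there. Anonymous mappings should pass fd = -1. */
    ptr = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_ANONYMOUS | MAP_PRIVATE | MAP_GROWSDOWN, -1, 0);
    if (ptr == MAP_FAILED)
    {
        perror("mmap failed");
        return 1;
    }

    /* The stack grows down, so pass the highest address of the region.
     * SIGCHLD makes the child reapable with a plain waitpid(). */
    rc = clone(&execute_clone, ptr + len, CLONE_VM | SIGCHLD, NULL);
    if (rc == -1)
    {
        perror("clone() failed");
        return 1;
    }

    waitpid(rc, NULL, 0);   /* don't exit before the child has run */
    munmap(ptr, len);
    return 0;
}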

mmap is more than just mapping a file into memory. In fact, some malloc implementations use mmap for large allocations. If you read the fine man page, you'll notice the MAP_ANONYMOUS flag, and you'll see that you don't need to supply a real file descriptor at all (pass -1).

As for why the kernel can't just "find a bunch of free memory": if you want someone to do that work for you, either use fork instead, or use pthreads.

Note that the clone system call doesn't take an argument for the stack location. It actually works just like fork. It's just the glibc wrapper that takes that argument.

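I think the stack grows downward until it can no longer grow, for example when it grows into memory that has already been allocated, at which point a fault may be raised. You can think of the default as the minimum usable stack size: if there is spare space below when the stack is full, it can grow downward; otherwise, the system may raise a fault.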

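Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.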

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM