Stack allocation for C++ green threads

I'm researching C++ green threads, mostly boost::coroutine2 and the similar POSIX functions makecontext()/swapcontext(), and I plan to implement a C++ green thread library on top of boost::coroutine2. Both require user code to allocate a stack for every new function/coroutine.
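To make the "user code allocates the stack" requirement concrete, here is a minimal sketch using makecontext()/swapcontext() on a stack obtained from mmap(). The names run_green_thread, green_entry and the globals are made up for illustration, and a fixed-size stack is assumed; it is not how boost::coroutine2 does it internally.

```cpp
#include <sys/mman.h>
#include <ucontext.h>
#include <cassert>
#include <cstddef>

namespace {
ucontext_t g_main_ctx;   // saved context of the caller
ucontext_t g_green_ctx;  // context of the green thread
int g_counter = 0;       // incremented by the green thread (illustrative)

// Body of the green thread: work, yield once, work again, then return.
void green_entry() {
    ++g_counter;
    swapcontext(&g_green_ctx, &g_main_ctx);  // yield back to the caller
    ++g_counter;
    // Returning from here resumes uc_link, i.e. g_main_ctx.
}
}  // namespace

// Runs green_entry() on a freshly mmap'd stack of stack_size bytes.
// Returns the final counter value, or -1 if the stack could not be mapped.
int run_green_thread(std::size_t stack_size) {
    assert(stack_size > 0);
    void* stack = mmap(nullptr, stack_size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED) return -1;

    g_counter = 0;
    getcontext(&g_green_ctx);
    g_green_ctx.uc_stack.ss_sp = stack;        // lowest address of the stack
    g_green_ctx.uc_stack.ss_size = stack_size;
    g_green_ctx.uc_link = &g_main_ctx;         // where to go when green_entry returns
    makecontext(&g_green_ctx, green_entry, 0);

    swapcontext(&g_main_ctx, &g_green_ctx);    // run until the first yield
    swapcontext(&g_main_ctx, &g_green_ctx);    // resume until green_entry returns

    munmap(stack, stack_size);
    return g_counter;
}
```

Calling run_green_thread(64 * 1024) switches into the green thread twice. The open question in the rest of this post is how to pick and manage stack_size.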

My target platform is x64/Linux. I want my green thread library to be suitable for general use, so the stacks should expand as required (a reasonable upper limit is fine, e.g. 10MB). It would be great if the stacks could also shrink when too much memory sits unused, but that is not required. I haven't figured out an appropriate algorithm to allocate stacks.

After some googling, I came up with a few options myself:

  1. Use the split stacks implemented by the compiler (gcc -fsplit-stack). However, split stacks have a performance overhead, and Go has already moved away from them for performance reasons.
  2. Allocate a large chunk of memory with mmap() and hope the kernel is smart enough to leave the physical memory unallocated until the stacks are actually accessed. In this case, we are at the mercy of the kernel.
  3. Reserve a large address range with mmap(PROT_NONE) and set up a SIGSEGV signal handler. In the handler, when the SIGSEGV is caused by a stack access (the faulting address lies inside the reserved range), allocate the needed memory with mmap(PROT_READ | PROT_WRITE). The problem with this approach is that mmap() isn't async-signal-safe, so it cannot be called inside a signal handler. It can still be implemented, but it is very tricky: create another thread during program startup for memory allocation, and use pipe() + read()/write() to send allocation requests from the signal handler to that thread.
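For reference, option 3 could be sketched roughly as below. It swaps in mprotect() for the second mmap() and calls it directly from the handler. mprotect() is not on the POSIX async-signal-safe list either, so this assumes that issuing it as a plain system call from the handler is acceptable on Linux. The names on_segv and demo_lazy_commit are made up, and the faulting accesses are ordinary memory writes rather than real stack pushes. For a real green thread stack the handler would also need SA_ONSTACK with sigaltstack(2), since it cannot run on the stack that just faulted.

```cpp
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>

namespace {
char* g_base = nullptr;                // start of the reserved range
std::size_t g_size = 0;                // size of the reserved range in bytes
std::size_t g_page = 0;                // system page size
volatile sig_atomic_t g_commits = 0;   // pages made accessible by the handler

void on_segv(int, siginfo_t* info, void*) {
    char* addr = static_cast<char*>(info->si_addr);
    if (addr < g_base || addr >= g_base + g_size) {
        // Not our range: restore the default action; the access faults again.
        signal(SIGSEGV, SIG_DFL);
        return;
    }
    // Round down to the page boundary and make that single page usable.
    char* page = g_base + ((addr - g_base) / g_page) * g_page;
    if (mprotect(page, g_page, PROT_READ | PROT_WRITE) != 0) _exit(1);
    g_commits = g_commits + 1;
}
}  // namespace

// Reserves `pages` pages as PROT_NONE, touches two different pages, and
// returns how many pages the handler committed (or -1 on error).
int demo_lazy_commit(std::size_t pages) {
    assert(pages >= 2);
    g_page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    g_size = pages * g_page;
    void* mem = mmap(nullptr, g_size, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (mem == MAP_FAILED) return -1;
    g_base = static_cast<char*>(mem);
    g_commits = 0;

    struct sigaction sa = {};
    struct sigaction old = {};
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) { munmap(mem, g_size); return -1; }

    volatile char* p = g_base;
    p[g_size - 1] = 'a';           // first touch of the top page: faults
    p[g_size - 2] = 'b';           // same page: no fault
    p[g_size - g_page - 1] = 'c';  // next page down: faults
    bool ok = p[g_size - 1] == 'a' && p[g_size - g_page - 1] == 'c';

    sigaction(SIGSEGV, &old, nullptr);
    munmap(mem, g_size);
    return ok ? static_cast<int>(g_commits) : -1;
}
```

The sketch commits one page per fault and never releases pages. A real implementation would presumably keep the lowest page of each stack as a permanent guard so that an overflow still crashes instead of growing into a neighbouring stack.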

A few more questions about option 3:

  1. I'm not sure about the performance overhead of this approach. How well or badly does the kernel/CPU perform when the address space is extremely fragmented by thousands of mmap() calls?
  2. Is this approach correct if the unallocated memory is accessed in kernel space, e.g. when read() is called?

Are there any other (better) options for stack allocation for green threads? How are green thread stacks allocated in other implementations, e.g. Go/Java?

The way that glibc allocates stacks for normal C programs is to mmap a region with the following mmap flag, designed just for this purpose:

   MAP_GROWSDOWN
          Used for stacks.  Indicates to the kernel virtual memory  system
          that the mapping should extend downward in memory.

For compatibility, you should probably use MAP_STACK too. Then you don't have to write the SIGSEGV handler yourself, and the stack grows automatically. The bounds can be set as described in the question "What does 'ulimit -s unlimited' do?"
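A minimal sketch of such a mapping, assuming x86-64 Linux and an arbitrary initial size chosen by the caller. map_growable_stack and stack_top are illustrative names, not part of any library. Note that the sketch does not control where the kernel places the mapping, so how far it can grow before running into a neighbouring mapping is left to chance here.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>

// Maps an initial stack of initial_size bytes with MAP_GROWSDOWN so that the
// kernel may extend it downward. Returns the lowest mapped address, or
// nullptr on failure.
void* map_growable_stack(std::size_t initial_size) {
    assert(initial_size > 0);
    void* low = mmap(nullptr, initial_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_GROWSDOWN | MAP_STACK,
                     -1, 0);
    return low == MAP_FAILED ? nullptr : low;
}

// Stacks grow toward lower addresses on x86-64, so a new context starts
// at the high end of the mapping.
char* stack_top(void* low, std::size_t size) {
    return static_cast<char*>(low) + size;
}
```

The pointer returned by map_growable_stack() together with the initial size can be handed to makecontext() via uc_stack, or to whatever stack allocator hook the coroutine library offers.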

If you want a bounded stack size, which is normally what people do for signal handlers if they want to call sigaltstack(2), just issue an ordinary mmap call.

The Linux kernel always maps the physical pages that back virtual pages lazily, catching the page fault when a page is first accessed (perhaps not in real-time kernels, but certainly in all other configurations). You can use the /proc/<pid>/pagemap interface (or this tool I wrote, https://github.com/dwks/pagemap) to verify this if you are interested.
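As a lighter-weight check than parsing pagemap, one could also ask mincore(2) which pages of a mapping are resident. This is a sketch using a different interface from the one named above. resident_pages and measure_lazy_mapping are made-up helper names, and the exact counts depend on the kernel configuration.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many pages of [addr, addr + len) are currently resident.
// addr must be page-aligned. Returns -1 if mincore() fails.
long resident_pages(void* addr, std::size_t len) {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    std::vector<unsigned char> vec((len + page - 1) / page);
    if (mincore(addr, len, vec.data()) != 0) return -1;
    long n = 0;
    for (unsigned char v : vec) n += v & 1;  // bit 0 set means resident
    return n;
}

struct Residency { long before; long after; };

// Maps `pages` anonymous pages, writes to the top `touch` of them (as a
// downward-growing stack would), and reports residency before and after.
Residency measure_lazy_mapping(std::size_t pages, std::size_t touch) {
    assert(touch <= pages);
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t len = pages * page;
    void* mem = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return {-1, -1};

    Residency r;
    r.before = resident_pages(mem, len);
    volatile char* top = static_cast<char*>(mem) + len;
    for (std::size_t i = 1; i <= touch; ++i)
        *(top - i * page) = 1;               // one write per page
    r.after = resident_pages(mem, len);

    munmap(mem, len);
    return r;
}
```

On a typical Linux kernel one would expect `before` to be 0 and `after` to equal the number of touched pages, which matches the lazy mapping described above.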

Why mmap? When you allocate with new (or malloc), the memory is untouched and definitely not mapped.

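#include <cstddef>
const std::size_t STACK_SIZE = 10 * 1024 * 1024;  // 10 MB per thread; size_t avoids int overflow below
char* p = new char[STACK_SIZE * numThreads];      // numThreads: number of green threads, defined elsewhere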

p now has enough memory for the threads you want. When you need the memory for thread i, start accessing p + STACK_SIZE * i.
