
Memory layout of a multithreaded process in C++

Original post 2010-09-05 22:35:10 9 3 java / c++

I am a bit confused about how the stack and heap are arranged in multithreaded processes:

Each thread has its own private stack.
All threads share the heap.
When the program dynamically creates a thread (e.g. new Thread() in Java), the object is allocated on the heap.

So does the heap contain the memory for the thread object, and does that mean the heap contains the stacks belonging to the threads?

3 answers

It's deliberately vague, as we don't want to constrain the implementers of the threading software.

Each thread has its own private stack.

As each thread executes a set of functions independently of the others, it needs to store return addresses and so on, so each thread needs its own stack.

All threads share the heap.

That's the easiest way to implement it. This also means that all the threads share a common chunk of memory, so that each thread can communicate with other threads simply by modifying memory.
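As a minimal sketch of "communicating by modifying memory", the snippet below has several threads update one heap-allocated object. The names `Counter` and `count_with_threads` are made up for illustration, and the mutex is only there because unsynchronized writes to shared memory would be a data race in C++.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared object: a counter plus the lock that protects it.
struct Counter {
    std::mutex lock;
    long value = 0;
};

// Starts num_threads threads that each add increments_per_thread to one
// heap-allocated Counter, then returns the final value.
long count_with_threads(int num_threads, int increments_per_thread) {
    assert(num_threads > 0 && increments_per_thread >= 0);
    auto counter = std::make_unique<Counter>();  // a single object on the heap
    Counter* shared = counter.get();             // every thread sees this same address

    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([shared, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                std::lock_guard<std::mutex> guard(shared->lock);
                ++shared->value;  // visible to all other threads
            }
        });
    }
    for (auto& w : workers) w.join();
    return shared->value;
}
```

Calling `count_with_threads(4, 1000)` should return 4000, since every increment lands in the same object regardless of which thread made it.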

When the program dynamically creates a thread (e.g. new Thread() in Java), the object is allocated on the heap.

Consider the stack you mention in question 1: we need to reserve memory for it. So we allocate a chunk of the heap, give it to the thread, and say "use this chunk of memory to implement your stack". (I'm not saying that it is done this way, but that is a simple technique for doing it.)

So does the heap contain the memory for the thread object, and does that mean the heap contains the stacks belonging to the threads?

Even in a single-threaded program there is room to implement the stack as chunks of the heap. The idea of the stack and heap being separate and growing towards each other is just that: a concept. How either is implemented is undefined, and there is no reason we cannot implement the stack inside the heap. See this question for more information: stack growth direction

Think of 'the stack' as a data structure like any other. It could be implemented in any number of ways.

Here is a description of the typical implementation of the stack in C and C++ programs before 2000 or so. Most still do it this way:

There is a contiguous range of memory addresses which is referred to as 'the stack'. Frequently, on systems that have a memory management unit (for Intel this means the 80386 and anything newer), the pages of this range of memory addresses are not assigned to physical memory until they are used. Typically this contiguous range of addresses sits at the end of the address space.

There is a stack pointer that usually starts at the end of the memory region. When a new stack frame is created, the stack pointer is decreased by the size of the frame. The CPU has instructions specifically designed for this operation. If a page in this region is accessed that has no physical memory of any kind assigned to it, the OS handles the page fault and finds some memory to assign to the now-used page.
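If you want to see which way frames are laid out on your own machine, here is a rough probe. It is only a sketch: `delta_from` and `frame_delta` are hypothetical names, the sign of the result is platform-specific, and an optimizer that inlines the call can put both variables in one frame.

```cpp
#include <cstddef>
#include <cstdint>

// Returns (address of a local in this call) minus (address of the caller's local).
std::ptrdiff_t delta_from(const volatile char* outer) {
    volatile char inner = 0;  // local of the callee
    return static_cast<std::ptrdiff_t>(
        reinterpret_cast<std::intptr_t>(&inner) -
        reinterpret_cast<std::intptr_t>(outer));
}

// A negative result suggests the callee's frame sits at a lower address,
// i.e. the stack grows downward on this platform. Nothing in C++ promises that.
std::ptrdiff_t frame_delta() {
    volatile char outer = 0;  // local of the caller
    return delta_from(&outer);
}
```

The only thing the language guarantees here is that the two locals are distinct objects, so the result is never zero. Everything else is up to the implementation, which is rather the point of this answer.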

All local variables, and all function parameters that are not passed in registers, find their way into a stack frame.

For multithreaded programs, this scheme doesn't work, so you typically allocate a region of memory using malloc or new and start a new thread with a call that takes a pointer to that region of memory and its size. If the new thread needs more stack space than you've allocated, all kinds of horrible things can occur, including the thread just stomping over some random memory that includes other variables allocated 'on the heap'.
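With POSIX threads, the "pointer to a region plus its size" call can be sketched roughly as below, using `pthread_attr_setstack`. This assumes a POSIX system. `StackProbe`, `probe_entry` and `run_on_heap_stack` are names invented for the example, and the buffer comes from `posix_memalign` rather than plain `malloc` because the stack address generally needs to be suitably aligned.

```cpp
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>
#include <cstddef>
#include <cstdint>

// Hypothetical helper type: records where one of the thread's locals ended up.
struct StackProbe {
    std::uintptr_t local_address = 0;
};

static void* probe_entry(void* arg) {
    int local = 0;  // lives on whatever stack this thread was given
    static_cast<StackProbe*>(arg)->local_address =
        reinterpret_cast<std::uintptr_t>(&local);
    return nullptr;
}

// Runs a thread on a heap-allocated stack of stack_size bytes. Returns true
// if the thread ran and its local variable was inside that buffer.
// stack_size must be at least PTHREAD_STACK_MIN for this to succeed.
bool run_on_heap_stack(std::size_t stack_size) {
    void* stack = nullptr;
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    if (posix_memalign(&stack, page, stack_size) != 0) return false;

    pthread_attr_t attr;
    pthread_attr_init(&attr);
    bool ok = pthread_attr_setstack(&attr, stack, stack_size) == 0;

    StackProbe probe;
    pthread_t tid;
    if (ok) {
        ok = pthread_create(&tid, &attr, probe_entry, &probe) == 0;
        if (ok) ok = pthread_join(tid, nullptr) == 0;
    }
    pthread_attr_destroy(&attr);

    const auto base = reinterpret_cast<std::uintptr_t>(stack);
    const bool inside = ok && probe.local_address >= base &&
                        probe.local_address < base + stack_size;
    free(stack);  // safe only because the thread has already been joined
    return inside;
}
```

Note that nothing here stops the thread from running past the end of the buffer. A region handed over this way has no overflow protection unless you arrange it yourself, which is exactly the "stomping over random memory" problem.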

But that is far from the only way to implement a stack. You could, for example, implement a stack as a linked list with each node of the list being a stack frame. Languages that support a construct called 'continuations' frequently do this. In fact, they usually use a DAG, as a single stack frame may spawn multiple other stack frames that are all valid simultaneously.
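To make the linked-list idea concrete, here is a toy sketch in which every "call" gets its own heap-allocated frame that points back at its caller. `Frame` and `factorial_with_linked_frames` are hypothetical names, and no real language runtime is being reproduced here. It only covers the simple list case, not the DAG one.

```cpp
#include <memory>
#include <utility>

// Hypothetical frame: one heap node per call, linked to the calling frame.
struct Frame {
    unsigned n;                     // the "local variable" of this call
    std::unique_ptr<Frame> caller;  // link to the frame that called us
};

unsigned long long factorial_with_linked_frames(unsigned n) {
    // "Call" phase: allocate one frame per nested call fact(n), fact(n-1), ...
    std::unique_ptr<Frame> top;
    for (unsigned k = n; k > 1; --k) {
        top = std::make_unique<Frame>(Frame{k, std::move(top)});
    }
    // "Return" phase: unwind from the innermost frame back to the outermost.
    unsigned long long result = 1;
    while (top) {
        result *= top->n;
        top = std::move(top->caller);  // frees the frame we just left
    }
    return result;
}
```

`factorial_with_linked_frames(5)` gives 120 without the frames ever needing to be contiguous in memory.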

Another thing that could be done is something halfway in between, in which your nodes are simply large regions of memory that each contain several stack frames. When a new frame is created that would overrun the node, another node is allocated under the covers.
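A rough sketch of that halfway scheme follows. `SegmentedStack` is a made-up class, and an `int` stands in for a whole stack frame to keep it short. Each segment holds a fixed number of frames, and a new segment is allocated only when the current one is full.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical segmented stack: frames live in fixed-capacity heap segments.
class SegmentedStack {
public:
    explicit SegmentedStack(std::size_t frames_per_segment)
        : capacity_(frames_per_segment) {
        assert(frames_per_segment > 0);
    }

    void push(int frame) {
        // Allocate a fresh segment "under the covers" when the current one is full.
        if (segments_.empty() || segments_.back()->size() == capacity_) {
            segments_.push_back(std::make_unique<std::vector<int>>());
            segments_.back()->reserve(capacity_);
        }
        segments_.back()->push_back(frame);
    }

    int pop() {
        assert(!segments_.empty());
        int frame = segments_.back()->back();
        segments_.back()->pop_back();
        // Release a segment as soon as its last frame is gone.
        if (segments_.back()->empty()) segments_.pop_back();
        return frame;
    }

    bool empty() const { return segments_.empty(); }
    std::size_t segment_count() const { return segments_.size(); }

private:
    std::size_t capacity_;
    std::vector<std::unique_ptr<std::vector<int>>> segments_;
};
```

With two frames per segment, pushing five frames allocates three segments, and popping them releases the segments again one by one.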

Or all local variables could be allocated with new or something like that, and just destroyed when they go out of scope. The compiler could make this happen behind the scenes.

So worrying about exactly where your stack is or how the memory is allocated under the hood, especially in a language like Java that doesn't even have pointers in the C or C++ sense, is kind of silly. It might even vary between different fully compliant JVMs.

I will say that pthreads in C++ generally implements the stack in the manner I describe for multithreaded programs, in the last paragraph of the section on how C and C++ have historically worked. It usually also has a 'guard page', which is a purposely unmapped page at the beginning of the region allocated for the stack, so that programs that run out of stack space will usually SEGV. (Actually, this is apparently an oversimplification to the point of being wrong; see Ben Voigt's comment for the real use of the guard page.)