Is stack memory physically contiguous in Linux?

Original post: 2018-04-01 05:10:42. Tags: linux, heap-memory, virtual-memory, cpu-cache, stack-size

As far as I can see, stack memory is contiguous in the virtual address space, but is it also physically contiguous? And does this have something to do with the stack size limit?

Edit:

I used to believe that stack memory doesn't have to be physically contiguous, but then why do we think that stack memory is always quicker than heap memory? If it's not physically contiguous, how can the stack take more advantage of the cache? There is another thing that always confuses me: the CPU executes instructions from the data segment, which is not near the stack segment in virtual memory. I don't think the operating system will place the stack segment and the data segment close to each other physically, so this might hurt cache performance. What do you think?

Edit again: Maybe I should give an example to express myself better. If we want to sort a large amount of numbers, using an array to store them is better than using a list, because every list node may be allocated by malloc, so the list may not take good advantage of the cache. That's why I say stack memory is quicker than heap memory.

4 Answers

As far as I can see, stack memory is contiguous in the virtual address space, but is it also physically contiguous? And does this have something to do with the stack size limit?

No, stack memory is not necessarily contiguous in the physical address space. It's not related to the stack size limit. It's related to how the OS manages memory. The OS only allocates a physical page when the corresponding virtual page is accessed for the first time (or for the first time since it was paged out to disk). This is called demand paging, and it helps conserve memory.

why do we think that stack memory is always quicker than heap memory? If it's not physically contiguous, how can the stack take more advantage of the cache?

It has nothing to do with the cache. It's just faster to allocate and deallocate memory from the stack than from the heap. That's because allocating or deallocating from the stack takes only a single instruction (incrementing or decrementing the stack pointer). On the other hand, there is a lot more work involved in allocating and/or deallocating memory from the heap. See this article for more information.

Now, once memory is allocated (from the heap or the stack), the time it takes to access that allocated region does not depend on whether it's stack or heap memory. It depends on the memory access pattern and whether it's friendly to the cache and memory architecture.

if we want to sort a large amount of numbers, using an array to store them is better than using a list, because every list node may be allocated by malloc, so the list may not take good advantage of the cache. That's why I say stack memory is quicker than heap memory.

Using an array is faster, but not because arrays are allocated from the stack. Arrays can be allocated from any memory (stack, heap, or anywhere else). It's faster because arrays are usually accessed contiguously, one element at a time. When the first element is accessed, the whole cache line that contains that element and its neighbors is fetched from memory into the L1 cache. So accessing the other elements in that cache line can be done very efficiently, but accessing the first element in the cache line is still slow (unless the cache line was prefetched). This is the key part: cache lines are typically 64 bytes and 64-byte aligned, and both virtual and physical pages are aligned to the page size (typically 4 KiB, a multiple of 64 bytes), so any cache line is guaranteed to reside entirely within a single virtual page and a single physical page. This is what makes fetching cache lines efficient. Again, all of this has nothing to do with whether the array was allocated from the stack or the heap. It holds true either way.

On the other hand, since the elements of a linked list are typically not contiguous (not even in the virtual address space), a cache line that contains one element may not contain any other elements. So fetching every single element can be more expensive.

No, there is no promise of contiguity of physical addresses. But it doesn't matter, because user-space programs don't use physical addresses, so they have no idea that this is the case.

Memory is memory. Stack memory is no faster than heap memory and no slower. It is all the same. The only thing that makes memory a stack or a heap is how it is allocated by the application. It is entirely possible to allocate memory on the heap and make that the program stack.

The speed difference is in the allocation. Stack memory is allocated by subtracting from the stack pointer: one instruction.

The process of allocating heap memory depends on the heap manager, but it is much more complex and may require mapping pages into the address space.

It is a complex topic.

Heap and stack (usually) use the same memory and the same memory type (MTRR, per-page cache settings, etc.). [Memory from mmap, files, or drivers could use different strategies, or the user could explicitly change the settings.]

The stack could be faster because it is used often. When you call a function, parameters and local variables are put on the stack, so the cache is fresh. Additionally, because functions call and return often, more of the stack is probably present in the other cache levels, and the top of the stack is seldom paged out (because it was used recently).

So the stack could be faster because of the cache, but only if you have few variables. If you allow large arrays on the stack, e.g. with alloca, the advantage disappears.

In general, this is a very complex topic, and it is better not to optimize too much, because that can lead to complex code that is harder to refactor and harder to optimize at a high level. (E.g. with multi-dimensional arrays, the order of the indices (and so the memory layout) and of the loops can noticeably improve speed, but the code can also quickly become impossible to maintain.)