How much stack space does C actually use for a function's activation record when calling a function?

Question

Environment: gcc version 6.3.0 (MinGW.org GCC-6.3.0-1) on Windows 10

I compile and run the code from the command line.

Here is my code:

#include <stdio.h>  
int func(void){
    int c;
    printf("stack top in func \t%p\n", &c);
    return 1;
}
void main(void)  { 
    int arr[0];
    int i;  
    printf("stack top before func \t%p\n", &i);
    i = func();
    int j;
    printf("stack top after func \t%p\n", &j);
    return;  
}

Here is the result:

stack top before func   0061FF2C
stack top in func       0061FEFC
stack top after func    0061FF28

The gap between the stack top inside the function and the stack top outside the function is 48 bytes.

I then changed the size of "arr" to 1 and the result is:

stack top before func   0061FF28
stack top in func       0061FEFC
stack top after func    0061FF24

The gap shrank, while the stack top inside the function stayed put. The gap size is now 44 bytes.

It stops shrinking when the size of "arr" is 3.

The new gap size is 52 bytes.

Is this some sort of memory management strategy?

What is the benefit of using 52 bytes when 44 bytes would suffice, given that the size of the variables declared before the function call is known at compile time?

Answer 1

I think you are making some unfounded assumptions about how the stack and the compiler work. Namely:

that variables are allocated at the moment you declare them,
that the "last" variable takes up the "top" of the stack,
that the variables only take as much space as they need,
that this has a clear and deterministic answer.

Here's a rough idea of what happens when you call a function in C with gcc on the x86 platform, with no optimizations:

1. The parameters (if any) are stored in registers and/or on the stack. The details differ between 32 and 64 bit, between integers/pointers, floats, and structs of different sizes, and with the number of arguments, varargs, and more.
2. The call instruction is executed, which pushes the return address onto the stack (4 bytes in 32-bit mode, 8 bytes in 64-bit mode) and redirects the processor to the new address.
3. The original value of BP is pushed (4 or 8 bytes), and then the stack pointer is saved in the BP register.
4. The stack pointer is decremented by enough bytes to accommodate all local variables.

Upon returning:

5. The value of the BP register overwrites the stack pointer, negating step 4 automatically. Then the original value of BP is popped.
6. The ret instruction is executed, popping the return address and jumping there.
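
The call and return sequence above can be sketched as a toy simulation. This is only a model under the same simplifying assumptions (frame pointer in use, no optimizations, no extra alignment); the `Regs` struct and both function names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the frame-pointer-based sequence described above.
   All names here are hypothetical; real compilers may deviate. */
typedef struct {
    uint64_t sp;        /* simulated stack pointer */
    uint64_t bp;        /* simulated base (frame) pointer */
    uint64_t saved_bp;  /* caller's BP, as it would sit on the stack */
} Regs;

/* Call, save BP, reserve space for locals. */
static Regs enter_function(Regs r, uint64_t word_size, uint64_t locals_bytes)
{
    assert(word_size == 4 || word_size == 8);
    r.sp -= word_size;      /* call pushes the return address */
    r.sp -= word_size;      /* push the caller's BP ...       */
    r.saved_bp = r.bp;
    r.bp = r.sp;            /* ... then copy SP into BP       */
    r.sp -= locals_bytes;   /* reserve room for all locals    */
    return r;
}

/* Restore SP from BP, pop BP, return. */
static Regs leave_function(Regs r, uint64_t word_size)
{
    assert(word_size == 4 || word_size == 8);
    r.sp = r.bp;            /* SP <- BP undoes the locals reservation */
    r.bp = r.saved_bp;
    r.sp += word_size;      /* pop the caller's BP        */
    r.sp += word_size;      /* ret pops the return address */
    return r;
}
```

With 8-byte words and 16 bytes of locals, the simulated stack pointer drops by 8 + 8 + 16 = 32 bytes on entry, and `leave_function` brings both SP and BP back to their starting values.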

It should be noted that this is by no means universal or guaranteed. "Simple" functions may be optimized to skip steps 3, 4 and 5. Step 4 can in principle happen multiple times. Additional adjustments can be made to the stack pointer, like aligning it to a particular power-of-two boundary (for example 16 bytes, i.e. 128 bits, for SSE instruction operands), allocating something called the red zone, the alloca function, etc. Many exceptions and special cases exist. More details will depend on the gcc command line parameters, or their built-in defaults per distribution. Other compilers may follow slightly different, yet compatible, conventions. But let's stick to this model.

What's important to notice is that all the local variables are often allocated together in step 4, and the size that's taken may be either the total size required or more. For example, the conventions may mandate that the compiler keeps the stack pointer a multiple of 16 at any point (so that the functions themselves can rely on this), in which case it rounds up to the nearest multiple (also taking into account what was pushed in steps 1 through 3). Within this zone the locals are assigned addresses (offsets from BP or SP) that respect their size and alignment requirements.
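
The rounding can be illustrated with a short sketch. It is not what gcc literally does; `round_up` and `locals_reservation` are hypothetical helpers, and the assumption is simply that everything pushed so far plus the locals must add up to a multiple of the alignment.

```c
#include <assert.h>
#include <stddef.h>

/* Round `bytes` up to the next multiple of `alignment`.
   `alignment` must be a power of two (4, 8, 16, ...). */
static size_t round_up(size_t bytes, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (bytes + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical helper: space reserved for locals so that the total of
   `already_pushed` bytes (return address, saved BP, ...) plus the
   locals is a multiple of `alignment`. */
static size_t locals_reservation(size_t already_pushed, size_t locals,
                                 size_t alignment)
{
    return round_up(already_pushed + locals, alignment) - already_pushed;
}
```

For example, `round_up(4, 16)` is 16, so a single `int` local can still cost 16 bytes of stack, and `locals_reservation(16, 4, 16)` is also 16 because 16 + 4 = 20 is rounded up to 32.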

Your example, especially the code in main, cannot work as intended because the compiler won't follow your wish to allocate the space for j only after returning from func. The allocation happens along with arr and i at the beginning of the function, and the order of the variables is unspecified, likely chosen so that they can be best "packed" into the space that's available, with ints taking addresses at 32- or 64-bit boundaries. Even if it did, the calculation would be mistaken in taking the address of j as the "stack top after func": at best, it would be "stack top after func and allocation". In general, the "stack top after func" must be the same as the "stack top before func" in the C calling convention.
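
Since the addresses of locals say little about where a frame starts, one alternative is to compare frame addresses directly. The sketch below assumes GCC or Clang (it relies on the `__builtin_frame_address` builtin and the `noinline` attribute) and a stack that grows toward lower addresses, as on x86; `callee_frame` and `frame_gap` are hypothetical names. The value it reports still depends on the compiler, its flags, and the target.

```c
#include <assert.h>
#include <stdint.h>

/* Assumes GCC or Clang: __builtin_frame_address(0) yields the frame
   address of the current function. noinline keeps the callee a real call. */
__attribute__((noinline))
static uintptr_t callee_frame(void)
{
    return (uintptr_t)__builtin_frame_address(0);
}

/* Bytes between the caller's frame and the callee's frame, assuming a
   stack that grows toward lower addresses (as on x86). */
__attribute__((noinline))
static uintptr_t frame_gap(void)
{
    uintptr_t caller = (uintptr_t)__builtin_frame_address(0);
    uintptr_t callee = callee_frame();
    assert(caller > callee);   /* holds only on downward-growing stacks */
    return caller - callee;
}
```

Printing `frame_gap()` under different optimization levels or alignment options is one way to see how much the result depends on the build settings.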

To get a more concrete idea of what happens in your function, I would suggest either:

Studying the assembly after compilation. The tool at godbolt.com is great for this: there you can see your code as compiled by gcc 8.2 for x86-64.

In that listing, the stack pointer is reduced by 16 (line 6) plus 8 (the size of RBP, pushed at line 4) plus whatever the call at line 28 requires to store the return address, which is 8 in 64-bit mode. That is 32 bytes in total.

Using a debugger:

(gdb) b 11
(gdb) b 4
(gdb) run
Starting program: [redacted]
stack top before func   0x7fffffffd2dc

Breakpoint 1, main () at a.c:11
11      i = func();
(gdb) print $rsp
$1 = (void *) 0x7fffffffd2d0
(gdb) c
Continuing.

Breakpoint 2, func () at a.c:4
4       printf("stack top in func \t%p\n", &c);
(gdb) print $rsp
$2 = (void *) 0x7fffffffd2b0

You can see here that rsp was reduced by 0x20 == 32.

Answer 2

It is because of gcc's stack alignment.

In gcc the stack alignment is 16 bytes by default, at least in my environment. I changed it to 4 bytes, the same as the size of an int, with the compile option "-mpreferred-stack-boundary=2".

Then the stack top in the function moves every single time I declare a new int.
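
The effect of the option can be modeled in a few lines. The option value is an exponent: the stack is kept aligned to 2^N bytes, so N=4 gives 16 bytes and N=2 gives 4 bytes. The helpers below are hypothetical and assume a 4-byte int; they only model the rounding, not the exact layout gcc picks.

```c
#include <assert.h>
#include <stddef.h>

/* -mpreferred-stack-boundary=N asks GCC to keep the stack aligned to
   2^N bytes, so N=4 means 16 bytes and N=2 means 4 bytes. */
static size_t boundary_bytes(unsigned n)
{
    assert(n < 16);                /* arbitrary sanity limit for this sketch */
    return (size_t)1 << n;
}

/* Hypothetical model: bytes reserved for `count` int locals when the
   reservation is rounded up to the boundary (assumes 4-byte int). */
static size_t reserved_for_ints(size_t count, unsigned n)
{
    size_t align = boundary_bytes(n);
    size_t raw = count * 4;
    return (raw + align - 1) / align * align;
}
```

In this model, one to four ints all reserve 16 bytes at boundary 4, so adding a variable often changes nothing, while at boundary 2 each additional int adds 4 bytes.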

Thanks to Jabberwocky and Korni for their comments, which introduced me to an area I didn't know about before.

Answer 1: score 4, accepted, 2018-11-21 10:26:11
Answer 2: score 0, 2018-11-21 10:16:15
