Is the caller or the callee responsible for freeing the shadow store in x64 assembly (Windows)?

Question

Coming from C and C++, I have recently started to learn x86-64 assembly to better understand the workings of my programs.

I know that the convention in x64 assembly is to reserve 32 bytes of 'shadow store' on the stack before calling a function (by doing: subq $0x20, %rsp).

What I am unsure about is: is the callee responsible for incrementing %rsp again, or the caller?

In other words (using printf as an example), would number 1 or number 2 be correct (or perhaps neither :P)?

1.

subq $0x20, %rsp
movabsq $msg, %rcx
callq printf

2.

subq $0x20, %rsp
movabsq $msg, %rcx
callq printf
addq $0x20, %rsp

(... where msg is an ASCII string stored in the .data section that I am passing to printf)

I am on Windows 10, using GAS as my assembler.

Any help would be much appreciated, cheers.

Answer 1

Deallocating shadow space is the caller's responsibility.

But normally you'd do it once per function, not once per call site within a function. Usually you just move RSP once (maybe after some pushes) and leave it alone until you're ready to return. That includes making room to store stack args, if any, for functions with more than 4 args.
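As a rough sketch of the arithmetic behind that single adjustment: the caller reserves at least four 8-byte slots, one more for each argument past the fourth, and pads the total so that RSP is 16-byte aligned at every call site. The return address and any pushes have already moved RSP by a multiple of 8 by then. The helper name frame_bytes and its parameters are made up here for illustration; they are not part of any ABI document or toolchain.

```c
#include <assert.h>

/* Hypothetical helper (illustration only): bytes to subtract from RSP
 * once in the prologue of a Windows x64 function.
 *   max_call_args - largest argument count of any function it calls
 *   pushed_regs   - registers pushed before the sub
 *   local_bytes   - space for locals, assumed to be a multiple of 8
 */
unsigned frame_bytes(unsigned max_call_args, unsigned pushed_regs,
                     unsigned local_bytes)
{
    assert(local_bytes % 8 == 0);

    /* Shadow space is always 4 slots, even for callees with fewer args. */
    unsigned slots = max_call_args < 4 ? 4 : max_call_args;
    unsigned size = slots * 8 + local_bytes;

    /* On entry RSP is 8 past a 16-byte boundary (the return address),
     * and every push moves it another 8 bytes. */
    unsigned entry_offset = 8 + pushed_regs * 8;

    /* Pad so that RSP is 16-byte aligned at each call instruction. */
    unsigned misalign = (entry_offset + size) % 16;
    if (misalign != 0)
        size += 16 - misalign;
    return size;
}
```

For a function with no pushes that only calls functions taking four or fewer arguments, frame_bytes(2, 0, 0) gives 40: 32 bytes of shadow space plus 8 bytes of padding.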

In the Windows x64 calling convention (and x86-64 System V), the callee must return without changing the caller's RSP, i.e. with ret, not ret 32, and without having copied the return address somewhere else.

MS has some examples in https://docs.microsoft.com/en-us/cpp/build/prolog-and-epilog?view=msvc-170#epilog-code and specifically documents that RSP mustn't be changed by functions:

The x64 ABI considers registers RBX, RBP, RDI, RSI, RSP, R12, R13, R14, R15, and XMM6-XMM15 nonvolatile. They must be saved and restored by a function that uses them.

(If you want to be fully compliant with the ABI, including for SEH and C++ exception unwinding, you also need to emit unwind metadata for every instruction that moves the stack pointer, and for where you saved non-volatile, aka call-preserved, registers. Toy programs still work fine without it, as long as you don't expect C++ exceptions to work, or debuggers to unwind the stack back to the stack frame of a caller.)

You can see this if you look at MSVC compiler output, e.g. https://godbolt.org/z/xh38jxWqT. For AT&T syntax, you can use gcc -O2 -mabi=ms to tell it that all the functions it sees are __attribute__((ms_abi)) by default, but that doesn't override the fact that it's targeting Linux. So even with -fPIE to make it use LEA instead of 32-bit absolute addressing for symbol addresses, we still get call printf@plt, not Windows-style calls to DLL functions.

But the stack management from GCC matches what MSVC -O2 also does.

#include <stdio.h>

void bar();
int foo(){
    printf("%d\n", 1);
    bar();
    return 1;  // make sure this isn't a tailcall
}

# gcc -O2 -mabi=ms  (but still sort of targeting Linux as far as dynamic linking)
.LC0:
        .string "%d\n"      ## in .rodata

foo():
        subq    $40, %rsp
        movl    $1, %edx
        movl    $.LC0, %ecx      # with -fPIE, uses    leaq    .LC0(%rip), %rcx  like you'd want for Windows x64
        call    printf
        call    bar()
        movl    $1, %eax
        addq    $40, %rsp
        ret
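The -mabi=ms option changes the default for every function in the file. The same attribute can also be put on individual functions, which may be handy for experiments on a non-Windows x86-64 machine. A minimal sketch, assuming GCC or clang targeting x86-64; the MS_ABI macro and the add6 and call_add6 functions are hypothetical names chosen for this example:

```c
/* Use the Windows x64 convention where the compiler supports the
 * attribute; fall back to the platform default elsewhere so the
 * file still compiles. */
#if defined(__GNUC__) && defined(__x86_64__)
#define MS_ABI __attribute__((ms_abi))
#else
#define MS_ABI
#endif

/* Six integer args: under ms_abi the first four go in RCX, RDX, R8, R9
 * and the last two go on the stack, just above the 32 bytes of shadow
 * space that the caller reserves. */
MS_ABI long long add6(long long a, long long b, long long c,
                      long long d, long long e, long long f)
{
    return a + b + c + d + e + f;
}

/* The caller owns both the shadow space and the stack-arg slots; look
 * at the compiler's asm for this function to see how it adjusts RSP. */
long long call_add6(void)
{
    return add6(1, 2, 3, 4, 5, 6);
}
```

Compiling this with -O2 -fno-inline and reading the output for call_add6 should show the caller's frame covering both the shadow space and the two stack arguments, though the exact instructions depend on the compiler version.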

See also How to remove "noise" from GCC/clang assembly output? for more about looking at compiler output; you can answer most questions about how things normally work by looking at what compilers do in practice. Sometimes things compilers do are just a coincidence, especially with optimization disabled (which is why I constructed an example that couldn't inline the functions, so I could still see the calls with optimization enabled). But here we can rule out your alternate hypothesis.

I also constructed this example to show two calls using the same allocation of shadow space, not pointlessly deallocating / reallocating with add/sub. Even with optimization disabled, compilers don't do that.

Re: putting symbol addresses into registers, see How to load address of function or label into register. RIP-relative LEA is the go-to option. It's position-independent, and works in any executable or library smaller than 2GiB of static code+data. It's also more efficient than movabs.
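The 2GiB limit comes from the encoding: a RIP-relative operand holds a signed 32-bit displacement, measured from the address of the next instruction. A small sketch of that range check, with a hypothetical helper name (fits_rel32) that is not part of any assembler or linker:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (illustration only): can target_addr be reached
 * with a RIP-relative operand from an instruction whose end, i.e. the
 * address of the following instruction, is next_insn_addr? */
bool fits_rel32(uint64_t next_insn_addr, uint64_t target_addr)
{
    /* Wrapping subtraction, then reinterpret as signed; this assumes
     * the usual two's-complement conversion that mainstream compilers
     * provide. */
    int64_t disp = (int64_t)(target_addr - next_insn_addr);
    return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

With made-up addresses, a target 0x2000 bytes ahead fits, while a target exactly 2GiB (0x80000000 bytes) ahead does not, because the largest positive displacement is 0x7fffffff.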
