What's up with gcc's handling of alloca?
On most platforms, alloca just boils down to an inline adjustment of the stack pointer (for example, subtracting from rsp on x64, plus a bit of logic to maintain stack alignment).
I was looking at the code that gcc generates for alloca and it is pretty weird. Take the following simple example¹:
#include <alloca.h>
#include <stddef.h>
volatile void *psink;
void func(size_t x) {
psink = alloca(x);
}
This compiles to the following assembly at -O2:
func(unsigned long):
push rbp
add rdi, 30
and rdi, -16
mov rbp, rsp
sub rsp, rdi
lea rax, [rsp+15]
and rax, -16
mov QWORD PTR psink[rip], rax
leave
ret
There are several confusing things here. I understand that gcc needs to round the allocated size up to a multiple of 16 (to maintain stack alignment), and the usual way to do that would be (size + 15) & ~0xF, but instead it adds 30 at add rdi, 30. What's up with that?
Second, I would just expect the result of alloca to be the new rsp value, which is already well-aligned. Instead, gcc does this:
lea rax, [rsp+15]
and rax, -16
This seems to be "realigning" the value of rsp to use as the result of alloca, but we already did the work to align rsp to a 16-byte boundary in the first place. What's up with that?
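To make the second point concrete, here is a minimal sketch of what the lea rax, [rsp+15] / and rax, -16 pair computes, written as plain C. The function name realign_16 and the sample addresses are made up for illustration. The addresses assume a 64-bit target. If the input is already a multiple of 16, the result is the same value, which is why the step looks redundant here.

```c
#include <stdint.h>

/* Hypothetical helper mirroring: lea rax, [rsp+15] ; and rax, -16
 * Rounds sp up to the next multiple of 16. */
uintptr_t realign_16(uintptr_t sp) {
    return (sp + 15) & ~(uintptr_t)0xF;
}
```

For an already aligned value such as 0x7ffc1000 this returns the input unchanged; only a misaligned value such as 0x7ffc1001 is moved (to 0x7ffc1010).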
You can play with the code on godbolt. It is worth noting that clang and icc do the "expected thing", on x86 at least. With VLAs (as suggested in earlier comments), gcc and clang do fine while icc produces an abomination.
¹ Here, the assignment to psink is just to consume the result of alloca, since otherwise the compiler just omits it entirely.
This is a very old, normal-priority bug. The code works correctly. It's just that for most sizes, 16 more bytes are allocated than necessary: compared with the usual (size + 15) & ~0xF rounding, adding 30 before masking with -16 reserves an extra 16 bytes for every size whose remainder modulo 16 is not 1. So it's not a correctness bug, it's a minor efficiency bug.
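The over-allocation can be checked with a little arithmetic. The following is only a sketch: the function names round_up_16, gcc_adjust and wasted_bytes are invented for illustration, and gcc_adjust simply mirrors the add rdi, 30 / and rdi, -16 pair from the listing in the question rather than anything taken from gcc's source.

```c
#include <stddef.h>

/* Usual round-up to a multiple of 16: (size + 15) & ~0xF. */
size_t round_up_16(size_t size) {
    return (size + 15) & ~(size_t)0xF;
}

/* Hypothetical mirror of: add rdi, 30 ; and rdi, -16 */
size_t gcc_adjust(size_t size) {
    return (size + 30) & ~(size_t)0xF;
}

/* Extra bytes subtracted from rsp compared with the usual rounding. */
size_t wasted_bytes(size_t size) {
    return gcc_adjust(size) - round_up_16(size);
}
```

Under these assumptions, wasted_bytes(1) and wasted_bytes(17) are 0, while wasted_bytes(2) and wasted_bytes(16) are 16, so the cost is at most one extra 16-byte slot per alloca call.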