Passing rvalue to non-ref parameter, why can't the compiler elide the copy?
struct Big {
int a[8];
};
void foo(Big a);
Big getStuff();
void test1() {
foo(getStuff());
}
compiles (using clang 6.0.0 for x86_64 on Linux, so System V ABI; flags: -O3 -march=broadwell) to
test1(): # @test1()
sub rsp, 72
lea rdi, [rsp + 40]
call getStuff()
vmovups ymm0, ymmword ptr [rsp + 40]
vmovups ymmword ptr [rsp], ymm0
vzeroupper
call foo(Big)
add rsp, 72
ret
If I am reading this correctly, this is what is happening:
1. getStuff is passed a pointer into test1's stack frame (rsp + 40) to use for its return value, so after getStuff returns, rsp + 40 through rsp + 71 contains the result of getStuff.
2. This result is then copied to rsp through rsp + 31.
3. foo is then called, which will read its argument from rsp.
Why is the following code not totally equivalent (and why doesn't the compiler generate it instead)?
test1(): # @test1()
sub rsp, 32
mov rdi, rsp
call getStuff()
call foo(Big)
add rsp, 32
ret
The idea is: have getStuff write directly to the place in the stack that foo will read from.
Also: here is the result for the same code (with 12 ints instead of 8) compiled by VC++ on Windows for x64, which seems even worse because the Windows x64 ABI passes and returns by reference, so the copy is completely unused!
_TEXT SEGMENT
$T3 = 32
$T1 = 32
?bar@@YAHXZ PROC ; bar, COMDAT
$LN4:
sub rsp, 88 ; 00000058H
lea rcx, QWORD PTR $T1[rsp]
call ?getStuff@@YA?AUBig@@XZ ; getStuff
lea rcx, QWORD PTR $T3[rsp]
movups xmm0, XMMWORD PTR [rax]
movaps XMMWORD PTR $T3[rsp], xmm0
movups xmm1, XMMWORD PTR [rax+16]
movaps XMMWORD PTR $T3[rsp+16], xmm1
movups xmm0, XMMWORD PTR [rax+32]
movaps XMMWORD PTR $T3[rsp+32], xmm0
call ?foo@@YAHUBig@@@Z ; foo
add rsp, 88 ; 00000058H
ret 0
You're right; this looks like a missed optimization by the compiler. You can report this bug (https://bugs.llvm.org/) if there isn't already a duplicate.
Contrary to popular belief, compilers often don't make optimal code. It's often good enough, and modern CPUs are quite good at plowing through excess instructions when they don't lengthen dependency chains too much, especially the critical-path dependency chain if there is one.
x86-64 SysV passes large structs by value on the stack if they don't fit packed into two 64-bit integer registers, and returns them via a hidden pointer. The compiler can and should (but doesn't) plan ahead and reuse the return-value temporary as the stack args for the call to foo(Big).
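The hidden-pointer return and stack-argument passing described above can be imitated at the source level. This is only a rough sketch, not what any compiler literally does: getStuff_lowered and foo_lowered are hypothetical names, the struct contents are made up, and foo_lowered returns an int (the original foo returns void) purely so the result can be checked.

```cpp
#include <cassert>
#include <cstring>

struct Big {
    int a[8];
};

// Hypothetical lowering of `Big getStuff()`: the caller passes a hidden
// pointer to the storage that receives the return value.
void getStuff_lowered(Big* ret) {
    for (int i = 0; i < 8; ++i) ret->a[i] = i + 1;  // made-up contents
}

// Hypothetical lowering of `void foo(Big)`: the callee reads its argument
// from a slot the caller prepared (the outgoing stack-arg area).
// Returns the sum of the elements only to make the effect observable.
int foo_lowered(Big* arg_slot) {
    int sum = 0;
    for (int i = 0; i < 8; ++i) sum += arg_slot->a[i];
    return sum;
}

// Shape of the code the compilers emit today: a separate return-value
// temporary, followed by a copy into the argument slot.
int test1_with_copy() {
    Big ret_tmp;
    Big arg_slot;
    getStuff_lowered(&ret_tmp);
    std::memcpy(&arg_slot, &ret_tmp, sizeof(Big));  // the redundant copy
    return foo_lowered(&arg_slot);
}

// Shape of the missed optimization: getStuff writes straight into the
// slot that foo will read from, so no copy is needed.
int test1_no_copy() {
    Big arg_slot;
    getStuff_lowered(&arg_slot);
    return foo_lowered(&arg_slot);
}
```

Both variants hand foo_lowered the same bytes; the second simply needs one 32-byte slot instead of two.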
gcc7.3, ICC18, and MSVC CL19 also miss this optimization. :/ I put your code up on the Godbolt compiler explorer with gcc/clang/ICC/MSVC. gcc uses 4x push qword [rsp+24] to copy, while ICC uses extra instructions to align the stack by 32.
Using 1x 32-byte load/store instead of 2x 16-byte probably doesn't justify the cost of the vzeroupper for MSVC / ICC / clang, for a function this small. vzeroupper is cheap on mainstream Intel CPUs (only 4 uops), and I did use -march=haswell to tune for that, not for AMD or KNL where it's more expensive.
Related: x86-64 Windows passes large structs by hidden pointer, as well as returning them that way. The callee owns the pointed-to memory. (What happens at assembly level when you have functions with large inputs)
This optimization would still be available by simply reserving space for the temporary + shadow space before the first call to getStuff(), and allowing the callee to destroy the temporary because we don't need it later.
That's not actually what MSVC does here or in related cases, though, unfortunately.
See also @BeeOnRope's answer, and my comments on it, on Why isn't pass struct by reference a common optimization?. Making sure the copy constructor can always run at a sane place for non-trivially-copyable objects is problematic if you're trying to design a calling convention that avoids copying by passing by hidden const reference (caller owns the memory, callee can copy if needed).
But this is an example of a case where non-const reference (callee owns the memory) is best, because the caller wants to hand off the object to the callee.
There's a potential gotcha, though: if there are any pointers to this object, letting the callee use it directly could introduce bugs. Consider some other function that does global_pointer->a[4]=0;. If our callee calls that function, it will unexpectedly modify our callee's by-value arg.
So letting the callee destroy our copy of the object in the Windows x64 calling convention only works if escape analysis can prove that nothing else has a pointer to this object.
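The aliasing hazard can be reproduced with a small source-level imitation of passing by hidden pointer. This is a sketch under stated assumptions: callee, poke, with_copy, and without_copy are hypothetical names, and callee returns the change it observes in a[4] only so the effect can be checked.

```cpp
#include <cassert>

struct Big {
    int a[8];
};

// A pointer to the caller's object that has escaped (the case escape
// analysis would have to rule out).
Big* global_pointer = nullptr;

// Some other function that writes through the escaped pointer.
void poke() { global_pointer->a[4] = 0; }

// Imitates a callee under a pass-by-hidden-pointer convention: `arg`
// points at memory the callee believes it owns exclusively.
// Returns how much a[4] changed across the call to poke().
int callee(Big* arg) {
    int before = arg->a[4];
    poke();  // must not be able to touch a by-value argument
    return arg->a[4] - before;
}

// Caller makes a private copy first: the callee's argument is unaffected.
int with_copy() {
    Big obj{};
    obj.a[4] = 7;
    global_pointer = &obj;
    Big tmp = obj;  // the copy the calling convention normally requires
    return callee(&tmp);
}

// Caller hands over the original object: poke() now modifies the
// callee's "by-value" argument behind its back.
int without_copy() {
    Big obj{};
    obj.a[4] = 7;
    global_pointer = &obj;
    return callee(&obj);
}
```

Here with_copy() sees no change in its argument, while without_copy() sees a[4] drop from 7 to 0, which is exactly the bug that skipping the copy would introduce when another pointer to the object exists.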