C++ has a small-size struct calling convention optimization where the compiler passes a small-size struct in function parameters as efficiently as it passes a primitive type (say, via registers). For example:
class MyInt { int n; public: MyInt(int x) : n(x){} };
void foo(int);
void foo(MyInt);
void bar1() { foo(1); }
void bar2() { foo(MyInt(1)); }
bar1() and bar2() generate almost identical assembly code, except that they call foo(int) and foo(MyInt) respectively. Specifically, on x86_64 it looks like:
mov edi, 1
jmp foo(MyInt) ;tail-call optimization jmp instead of call ret
But if we test std::tuple<int>, the result is different:
#include <tuple>
void foo(std::tuple<int>);
void bar3() { foo(std::tuple<int>(1)); }
struct MyIntTuple : std::tuple<int> { using std::tuple<int>::tuple; };
void foo(MyIntTuple);
void bar4() { foo(MyIntTuple(1)); }
The generated assembly code looks totally different: the small-size struct (std::tuple<int>) is passed by pointer:
sub rsp, 24
lea rdi, [rsp+12]
mov DWORD PTR [rsp+12], 1
call foo(std::tuple<int>)
add rsp, 24
ret
I dug a bit deeper and tried to make my int wrapper a bit dirtier (this should be close to a naive, incomplete tuple implementation):
class Empty {};
class MyDirtyInt : protected Empty, MyInt {public: using MyInt::MyInt; };
void foo(MyDirtyInt);
void bar5() { foo(MyDirtyInt(1)); }
But the calling convention optimization is still applied:
mov edi, 1
jmp foo(MyDirtyInt)
I have tried GCC/Clang/MSVC, and they all showed the same behavior. (Godbolt link here) So I guess this must be something in the C++ standard? (I believe the C++ standard doesn't specify any ABI constraint, though?)
I'm aware that the compiler should be able to optimize these out, as long as the definition of foo(std::tuple<int>) is visible and not marked noinline. But I want to know which part of the standard or implementation prevents this optimization.
FYI, in case you're curious about what I'm doing with std::tuple: I want to create a wrapper class (i.e. a strong typedef) without declaring the comparison operators myself (operator<=> is not available prior to C++20) and without pulling in Boost, so I thought std::tuple was a good base class because everything was already there.
It seems to be a matter of ABI. For instance, the Itanium C++ ABI reads:
If the parameter type is non-trivial for the purposes of calls, the caller must allocate space for a temporary and pass that temporary by reference.
And, further:
A type is considered non-trivial for the purposes of calls if it has a non-trivial copy constructor, move constructor, or destructor, or all of its copy and move constructors are deleted.
The same requirement is in AMD64 ABI Draft 1.0.
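As a rough illustration of that rule, the standard type traits can approximate the check at compile time. This is only a sketch assuming C++17: looks_trivial_for_calls and WithUserMove are hypothetical names, and the trait is a conservative approximation (for example, it rejects a type whose copy constructor is deleted but whose move constructor is trivial, which the ABI would still accept).

```cpp
#include <type_traits>

// Hypothetical helper: a conservative approximation of the ABI notion
// "trivial for the purposes of calls". Not part of any ABI or library.
template <class T>
constexpr bool looks_trivial_for_calls =
    std::is_trivially_copy_constructible_v<T> &&
    std::is_trivially_move_constructible_v<T> &&
    std::is_trivially_destructible_v<T>;

// MyInt as in the question: implicit copy/move constructors and destructor.
class MyInt { int n; public: MyInt(int x) : n(x) {} };

// Hypothetical counterexample: same layout, but a user-provided move constructor.
struct WithUserMove {
    int n;
    WithUserMove(int x) : n(x) {}
    WithUserMove(const WithUserMove&) = default;
    WithUserMove(WithUserMove&& other) noexcept : n(other.n) {}
};

static_assert(looks_trivial_for_calls<MyInt>);         // eligible for register passing
static_assert(!looks_trivial_for_calls<WithUserMove>); // caller must pass a temporary by reference
```

Both types hold a single int, so size alone does not decide how the argument is passed; the triviality of the special member functions does.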
For instance, in libstdc++, std::tuple has a non-trivial move constructor: https://godbolt.org/z/4j8vds . The Standard prescribes both the copy and the move constructor as defaulted, which is satisfied here. However, at the same time, tuple inherits from _Tuple_impl, and _Tuple_impl has a user-defined move constructor. Consequently, the move constructor of tuple itself cannot be trivial.
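The effect can be reproduced without the standard library. The following is a minimal sketch assuming C++17; ImplLike and TupleLike are hypothetical stand-ins and do not reproduce the real libstdc++ definitions.

```cpp
#include <type_traits>

// Hypothetical stand-in for a base class with a user-provided move constructor.
struct ImplLike {
    int value;
    explicit ImplLike(int v) : value(v) {}
    ImplLike(const ImplLike&) = default;                          // trivial copy
    ImplLike(ImplLike&& other) noexcept : value(other.value) {}   // user-provided move
};

// Hypothetical stand-in for the derived class: both constructors are defaulted,
// as the Standard requires for std::tuple.
struct TupleLike : ImplLike {
    using ImplLike::ImplLike;
    TupleLike(const TupleLike&) = default;
    TupleLike(TupleLike&&) = default;
};

// The defaulted copy constructor stays trivial...
static_assert(std::is_trivially_copy_constructible_v<TupleLike>);
// ...but the defaulted move constructor has to call the base's user-provided one,
// so it is not trivial, and the type is non-trivial for the purposes of calls.
static_assert(!std::is_trivially_move_constructible_v<TupleLike>);
```

Defaulting a constructor in the derived class is therefore not enough: triviality also depends on every base and member.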
On the contrary, in libc++, both the copy and move constructors of std::tuple<int> are trivial. Therefore, the argument is passed in a register there: https://godbolt.org/z/WcTjM9 .
As for Microsoft STL, std::tuple<int> is neither trivially copy-constructible nor trivially move-constructible. It even seems to break the C++ Standard rules. std::tuple is defined recursively and, at the end of the recursion, the std::tuple<> specialization defines a non-defaulted copy constructor. There is a comment about this issue: // TRANSITION, ABI: should be defaulted. Since tuple<> has no move constructor, both the copy and move constructors of tuple<class...> are non-trivial.
As suggested by @StoryTeller, this behavior might be caused by a user-defined move constructor inside std::tuple.
See for example: https://godbolt.org/z/3M9KWo
Having a user-defined move constructor leads to the non-optimized assembly:
bar_my_tuple():
sub rsp, 24
lea rdi, [rsp+12]
mov DWORD PTR [rsp+12], 1
call foo(MyTuple<int>)
add rsp, 24
ret
In libcxx, for example, the copy and move constructors are declared as default both for tuple_leaf and for tuple, so you get the small-size struct calling convention optimization for std::tuple<int>, but not for std::tuple<std::string>, which holds a member that is not trivially movable and thus is naturally not trivially movable itself.
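A small compile-time check can make the difference visible on whichever standard library is in use. This is a sketch assuming C++17; tuple_int_trivial_move is a hypothetical name, and its value is deliberately not asserted because it depends on the implementation (per the answers here, true on libc++, false on libstdc++ and Microsoft STL).

```cpp
#include <string>
#include <tuple>
#include <type_traits>

// std::string has a non-trivial move constructor on every implementation,
// so a tuple holding one cannot be trivially move-constructible either.
static_assert(!std::is_trivially_move_constructible_v<std::string>);
static_assert(!std::is_trivially_move_constructible_v<std::tuple<std::string>>);

// Implementation-dependent: inspect this value rather than asserting on it.
constexpr bool tuple_int_trivial_move =
    std::is_trivially_move_constructible_v<std::tuple<int>>;
```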