Why does std::tuple break small-size struct calling convention optimization in C++?

Question

C++ has a small-size struct calling convention optimization where the compiler passes a small-size struct in function parameters as efficiently as it passes a primitive type (say, via registers). For example:

class MyInt { int n; public: MyInt(int x) : n(x){} };
void foo(int);
void foo(MyInt);
void bar1() { foo(1); }
void bar2() { foo(MyInt(1)); }

bar1() and bar2() generate almost identical assembly code except for calling foo(int) and foo(MyInt) respectively. Specifically on x86_64, it looks like:

        mov     edi, 1
        jmp     foo(MyInt) ;tail-call optimization jmp instead of call ret

But if we test std::tuple<int> , it will be different:

void foo(std::tuple<int>);
void bar3() { foo(std::tuple<int>(1)); }

struct MyIntTuple : std::tuple<int> { using std::tuple<int>::tuple; };
void foo(MyIntTuple);
void bar4() { foo(MyIntTuple(1)); }

The generated assembly code looks totally different, the small-size struct ( std::tuple<int> ) is passed by pointer:

        sub     rsp, 24
        lea     rdi, [rsp+12]
        mov     DWORD PTR [rsp+12], 1
        call    foo(std::tuple<int>)
        add     rsp, 24
        ret

I dug a bit deeper and tried to make my int a bit dirtier (this should be close to an incomplete, naive tuple implementation):

class Empty {};
class MyDirtyInt : protected Empty, MyInt {public: using MyInt::MyInt; };
void foo(MyDirtyInt);
void bar5() { foo(MyDirtyInt(1)); }

But here the calling convention optimization is still applied:

        mov     edi, 1
        jmp     foo(MyDirtyInt)

I have tried GCC/Clang/MSVC, and they all showed the same behavior. ( Godbolt link here ) So I guess this must be something in the C++ standard? (I believe the C++ standard doesn't specify any ABI constraint, though?)

I'm aware that the compiler should be able to optimize these out, as long as the definition of foo(std::tuple<int>) is visible and not marked noinline. But I want to know which part of the standard or implementation causes the invalidation of this optimization.

FYI, in case you're curious about what I'm doing with std::tuple: I want to create a wrapper class (i.e., a strong typedef) without declaring the comparison operators myself (before C++20's operator<=>) and without pulling in Boost, so I thought std::tuple would be a good base class because everything is already there.

Answer 1

It seems to be a matter of ABI. For instance, the Itanium C++ ABI reads:

If the parameter type is non-trivial for the purposes of calls, the caller must allocate space for a temporary and pass that temporary by reference.

And, further:

A type is considered non-trivial for the purposes of calls if it has a non-trivial copy constructor, move constructor, or destructor, or all of its copy and move constructors are deleted.

The same requirement is in AMD64 ABI Draft 1.0.
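To make the rule concrete, here is a rough sketch of how one might check it with type traits. The trait name likely_trivial_for_calls and the UserMove and UserDtor types are made up for illustration; only MyInt comes from the question. The trait is an approximation of the ABI wording, not the ABI's own test, and edge cases (for example types with some deleted constructors or several copy-constructor overloads) can differ.

```cpp
#include <type_traits>

// Hypothetical trait, not part of any ABI or library: a rough approximation
// of "trivial for the purposes of calls" built from standard type traits.
template <class T>
struct likely_trivial_for_calls
    : std::integral_constant<bool,
          std::is_trivially_copy_constructible<T>::value &&
          std::is_trivially_move_constructible<T>::value &&
          std::is_trivially_destructible<T>::value> {};

// MyInt from the question: implicit copy/move constructors and destructor.
class MyInt { int n; public: MyInt(int x) : n(x) {} };

// Illustrative type with a user-provided (hence non-trivial) move constructor.
struct UserMove {
    int n;
    explicit UserMove(int x) : n(x) {}
    UserMove(const UserMove&) = default;
    UserMove(UserMove&& other) noexcept : n(other.n) {}
};

// Illustrative type with a user-provided (hence non-trivial) destructor.
struct UserDtor {
    int n;
    ~UserDtor() {}
};

static_assert(likely_trivial_for_calls<MyInt>::value,
              "MyInt should be eligible for passing in a register");
static_assert(!likely_trivial_for_calls<UserMove>::value,
              "a user-provided move constructor makes the type non-trivial");
static_assert(!likely_trivial_for_calls<UserDtor>::value,
              "a user-provided destructor makes the type non-trivial");
```

All checks happen at compile time, so it is enough to compile this as a translation unit; if any static_assert fired, the corresponding claim would not hold for that compiler.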

For instance, in libstdc++, std::tuple has a non-trivial move constructor: https://godbolt.org/z/4j8vds . The Standard specifies both the copy and move constructors as defaulted, which is satisfied here. However, tuple inherits from _Tuple_impl, and _Tuple_impl has a user-defined move constructor. Consequently, the move constructor of tuple itself cannot be trivial, even though it is defaulted.
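The inheritance effect can be reproduced without any standard library internals. In this sketch ImplBase and Wrapper are hypothetical stand-ins (they are not the libstdc++ classes); the point is only that a defaulted move constructor in the derived class is non-trivial when it has to call a user-provided one in the base.

```cpp
#include <type_traits>

// Hypothetical stand-in for an implementation base class.
struct ImplBase {
    int value;
    explicit ImplBase(int v) : value(v) {}
    ImplBase(const ImplBase&) = default;                          // trivial
    ImplBase(ImplBase&& other) noexcept : value(other.value) {}   // user-provided
};

// Hypothetical stand-in for the public class: both special members are
// defaulted, as the Standard requires for std::tuple.
struct Wrapper : ImplBase {
    using ImplBase::ImplBase;
    Wrapper(const Wrapper&) = default;
    Wrapper(Wrapper&&) = default;  // defaulted, but must call ImplBase(ImplBase&&)
};

static_assert(std::is_trivially_copy_constructible<Wrapper>::value,
              "the copy constructor stays trivial");
static_assert(!std::is_trivially_move_constructible<Wrapper>::value,
              "the defaulted move constructor is non-trivial because of the base");
```

Since Wrapper has a non-trivial move constructor, the Itanium rule quoted above would treat it as non-trivial for the purposes of calls.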

By contrast, in libc++, both the copy and move constructors of std::tuple<int> are trivial. Therefore, the argument is passed in a register there: https://godbolt.org/z/WcTjM9 .

As for Microsoft STL, std::tuple<int> is neither trivially copy-constructible nor trivially move-constructible. It even seems to break the C++ Standard rules. std::tuple is defined recursively and, at the end of the recursion, the std::tuple<> specialization defines a non-defaulted copy constructor. There is a comment about this issue: // TRANSITION, ABI: should be defaulted . Since tuple<> has no move constructor, both the copy and move constructors of tuple<class...> are non-trivial.
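Since the result depends on which standard library is in use, it may be easiest to just ask the compiler. The following is a small probe; CallTraits, report and report_tuples are names invented for this sketch. No expected output is given because it differs between libstdc++, libc++ and Microsoft STL.

```cpp
#include <cstdio>
#include <tuple>
#include <type_traits>

// Hypothetical helper: the three properties the ABI rule looks at.
template <class T>
struct CallTraits {
    static constexpr bool trivial_copy() {
        return std::is_trivially_copy_constructible<T>::value;
    }
    static constexpr bool trivial_move() {
        return std::is_trivially_move_constructible<T>::value;
    }
    static constexpr bool trivial_dtor() {
        return std::is_trivially_destructible<T>::value;
    }
};

// Prints one line per type; 1 means trivial, 0 means non-trivial.
template <class T>
void report(const char* name) {
    std::printf("%s: copy=%d move=%d dtor=%d\n", name,
                int(CallTraits<T>::trivial_copy()),
                int(CallTraits<T>::trivial_move()),
                int(CallTraits<T>::trivial_dtor()));
}

// Call this from main() to see what the current standard library does.
inline void report_tuples() {
    report<std::tuple<>>("std::tuple<>");
    report<std::tuple<int>>("std::tuple<int>");
}
```

Calling report_tuples() from main() under each toolchain should show which of the three properties is responsible when std::tuple<int> is passed by pointer.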

Answer 2

As suggested by @StoryTeller, this behavior might be caused by a user-defined move constructor inside std::tuple.

See for example: https://godbolt.org/z/3M9KWo

Having a user-defined move constructor leads to the non-optimized assembly:

bar_my_tuple():
        sub     rsp, 24
        lea     rdi, [rsp+12]
        mov     DWORD PTR [rsp+12], 1
        call    foo(MyTuple<int>)
        add     rsp, 24
        ret

In libc++, for example, the copy and move constructors are declared as defaulted for both tuple_leaf and tuple, so you get the small-size struct calling convention optimization for std::tuple<int>. You do not get it for std::tuple<std::string>, which holds a member that is not trivially movable and therefore is not trivially movable itself.
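A minimal sketch of that design, assuming a single-element tuple for brevity. Leaf and MiniTuple are hypothetical names and are much simpler than the real libc++ classes; they only show that defaulting the constructors at every level lets the triviality of the element type propagate outwards.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical leaf: stores one element, copy/move constructors defaulted.
template <class T>
struct Leaf {
    T value;
    explicit Leaf(T v) : value(std::move(v)) {}
    Leaf(const Leaf&) = default;
    Leaf(Leaf&&) = default;
};

// Hypothetical one-element tuple built on the leaf, also fully defaulted.
template <class T>
struct MiniTuple : Leaf<T> {
    using Leaf<T>::Leaf;
    MiniTuple(const MiniTuple&) = default;
    MiniTuple(MiniTuple&&) = default;
};

// With a trivially movable element, the whole type stays trivially movable.
static_assert(std::is_trivially_copy_constructible<MiniTuple<int>>::value,
              "MiniTuple<int> copies trivially");
static_assert(std::is_trivially_move_constructible<MiniTuple<int>>::value,
              "MiniTuple<int> moves trivially");

// std::string has a non-trivial move constructor, so the wrapper does too.
static_assert(!std::is_trivially_move_constructible<MiniTuple<std::string>>::value,
              "MiniTuple<std::string> does not move trivially");
```

Under the ABI rule quoted in the other answer, MiniTuple<int> would be eligible for passing in a register, while MiniTuple<std::string> would be passed by reference to a temporary.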

2 answers

solution1
11 2020-09-03 12:31:48

solution2
4 2020-09-03 11:39:44
