
Why do gcc and clang produce very different code for member function pointer template parameters?

I am trying to understand what happens when a member function pointer is used as a template parameter. I always thought that function pointers (and member function pointers) were a run-time concept, so I was wondering what happens when they are used as template parameters. To find out, I looked at the output produced by this code:

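struct Foo { void foo(int i) { } };

// The member function pointer F is a non-type template parameter.
template <typename T, void (T::*F)(int)>
void callFunc(T& t) { (t.*F)(1); }

// Direct call, for comparison.
void callF(Foo& f) { f.foo(1); }

int main() {
    Foo f;
    callF(f);
    callFunc<Foo, &Foo::foo>(f);
}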
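The premise of the question can be checked directly: a member function pointer used as a non-type template argument is a compile-time constant, and each distinct pointer produces its own instantiation. That is what allows a compiler to resolve the call while compiling the template. A minimal sketch, assuming C++11 or later; `Caller`, `Foo::bar`, the `last` member and `demo` are hypothetical additions, not part of the original code:

```cpp
#include <cassert>
#include <type_traits>

struct Foo {
    int last = 0;
    void foo(int i) { last = i; }
    void bar(int i) { last = -i; }  // hypothetical second member, for comparison
};

// Hypothetical wrapper in the spirit of callFunc: the member pointer is a
// template argument, so it is fixed when the template is instantiated.
template <typename T, void (T::*F)(int)>
struct Caller {
    static void call(T& t, int v) { (t.*F)(v); }
};

// Each distinct pointer argument yields a distinct type, checked at compile time.
static_assert(!std::is_same<Caller<Foo, &Foo::foo>, Caller<Foo, &Foo::bar>>::value,
              "different member pointers give different instantiations");
static_assert(std::is_same<Caller<Foo, &Foo::foo>, Caller<Foo, &Foo::foo>>::value,
              "the same member pointer gives the same instantiation");

// Usage: each call goes to the member named in the template argument.
void demo() {
    Foo f;
    Caller<Foo, &Foo::foo>::call(f, 3);
    assert(f.last == 3);
    Caller<Foo, &Foo::bar>::call(f, 3);
    assert(f.last == -3);
}
```

Because `F` is part of the type, nothing about which function gets called is left to run time; only the object `t` is a run-time value.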

where callF() is there for comparison. gcc 6.2 produces exactly the same output for both functions:

callF(Foo&):  // void callFunc<Foo, &Foo::foo>(Foo&):
    push    rbp
    mov     rbp, rsp
    sub     rsp, 16
    mov     QWORD PTR [rbp-8], rdi
    mov     rax, QWORD PTR [rbp-8]
    mov     esi, 1
    mov     rdi, rax
    call    Foo::foo(int)
    nop
    leave
    ret

while clang 3.9 produces almost the same output for callF():

callF(Foo&):                          # @callF(Foo&)
    push    rbp
    mov     rbp, rsp
    sub     rsp, 16
    mov     esi, 1
    mov     qword ptr [rbp - 8], rdi
    mov     rdi, qword ptr [rbp - 8]
    call    Foo::foo(int)
    add     rsp, 16
    pop     rbp
    ret

but very different output for the template instantiation:

void callFunc<Foo, &Foo::foo>(Foo&): # @void callFunc<Foo, &Foo::foo>(Foo&)
    push    rbp
    mov     rbp, rsp
    sub     rsp, 32
    xor     eax, eax
    mov     cl, al
    mov     qword ptr [rbp - 8], rdi
    mov     rdi, qword ptr [rbp - 8]
    test    cl, 1
    mov     qword ptr [rbp - 16], rdi # 8-byte Spill
    jne     .LBB3_1
    jmp     .LBB3_2
.LBB3_1:
    movabs  rax, Foo::foo(int)
    sub     rax, 1
    mov     rcx, qword ptr [rbp - 16] # 8-byte Reload
    mov     rdx, qword ptr [rcx]
    mov     rax, qword ptr [rdx + rax]
    mov     qword ptr [rbp - 24], rax # 8-byte Spill
    jmp     .LBB3_3
.LBB3_2:
    movabs  rax, Foo::foo(int)
    mov     qword ptr [rbp - 24], rax # 8-byte Spill
    jmp     .LBB3_3
.LBB3_3:
    mov     rax, qword ptr [rbp - 24] # 8-byte Reload
    mov     esi, 1
    mov     rdi, qword ptr [rbp - 16] # 8-byte Reload
    call    rax
    add     rsp, 32
    pop     rbp
    ret

Why is that? Is gcc taking some (possibly non-standard) shortcut?

gcc was able to figure out what the template was doing and generated the simplest code possible; clang didn't. A compiler is permitted to perform any optimization as long as the observable behavior complies with the C++ specification. If that means optimizing away an intermediate function pointer, so be it. Nothing else in the code references the temporary function pointer, so it can be optimized away completely and the whole thing replaced with a simple function call.

gcc and clang are different compilers, written by different people, with different approaches and algorithms for compiling C++.

It is natural and expected to see different results from different compilers. In this case, gcc was able to figure things out better than clang. I'm sure there are other situations where clang will be able to figure things out better than gcc.

This test was done without any optimizations requested.

One compiler generated more verbose unoptimized code.

Unoptimized code is, quite simply, uninteresting. It is intended to be correct, easy to debug, and derived directly from an intermediate representation that is easy to optimize.
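The extra code clang emits at this optimization level is the generic call sequence for a pointer to member function, which might point at either a virtual or a non-virtual function. Under the Itanium C++ ABI (the ABI gcc and clang follow on x86-64 Linux), a member function pointer is a `{ptr, adj}` pair: if the low bit of `ptr` is set, the target is virtual and `ptr - 1` is the byte offset of its vtable slot; otherwise `ptr` is the function's address. In the listing, the tested flag is the constant 0 (`xor eax, eax` then `test cl, 1`), so `.LBB3_1` can never run and an optimizing build is free to drop it. The sketch below models that dispatch by hand; `Object`, `MemFnPtr`, `invoke`, the sample functions and `kVtable` are hypothetical, and unlike the real ABI it keeps the "virtual" flag in a separate field instead of the low bit of `ptr`:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of a polymorphic object and its vtable.
struct Object;
using RawFn = int (*)(Object* self, int arg);

struct Object {
    const RawFn* vptr;  // first word of a polymorphic object: its vtable
    int value;
};

// Hypothetical stand-in for the Itanium {ptr, adj} pair (adj assumed 0 here).
// The real ABI packs is_virtual into the low bit of ptr, which is why clang
// emits `test cl, 1` and `sub rax, 1`.
struct MemFnPtr {
    bool is_virtual;
    std::size_t slot;  // vtable index, meaningful only if is_virtual
    RawFn direct;      // function address, meaningful only if !is_virtual
};

int invoke(const MemFnPtr& mp, Object* self, int arg) {
    RawFn target;
    if (mp.is_virtual) {
        // Like .LBB3_1: load the vptr from the object, then the slot.
        target = self->vptr[mp.slot];
    } else {
        // Like .LBB3_2: the pointer already names the function.
        target = mp.direct;
    }
    // Like .LBB3_3: an indirect call with `self` as the first argument.
    return target(self, arg);
}

// Sample "member functions" and a one-slot vtable for the demo.
int add_value(Object* self, int arg) { return self->value + arg; }
int mul_value(Object* self, int arg) { return self->value * arg; }
const RawFn kVtable[] = {mul_value};

// Usage: the same call site handles both kinds of member pointer.
void demo() {
    Object o{kVtable, 10};
    assert(invoke(MemFnPtr{false, 0, add_value}, &o, 5) == 15);  // direct path
    assert(invoke(MemFnPtr{true, 0, nullptr}, &o, 5) == 50);     // vtable path
}
```

When the member pointer is a known constant, as in `callFunc<Foo, &Foo::foo>`, the branch condition is known too; gcc resolved it even without optimization, while clang at this level still emitted both arms.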

The details of optimized code are what matter, barring a ridiculous and widespread slowdown that makes debugging painful.

There is nothing of interest to see or explain here.
