How do pointers to member functions work?

I understand that a normal function pointer contains the start address of the function it points to, so calling through it is just a jump to the stored address. But what does a pointer to a member function contain?

Consider:

#include <iostream>

class A
{
public:
    int func1(int v) {
        std::cout << "fun1";
        return v;
    }
    virtual int func2(int v) {
        std::cout << "fun2";
        return v;
    }
};

int main(int argc, char** argv)
{
    A a;
    int (A::*pf)(int a) = argc > 2 ? &A::func1 : &A::func2;
    // Fails to compile wherever pf is wider than void*, e.g. g++ on x86_64 (two words).
    static_assert(sizeof(pf) == sizeof(void*), "Unexpected function size");
    return (a.*pf)(argc);
}

In the above program, the function pointer can take its value from either a virtual function (that needs to be accessed via the vtable) or a normal class member (that is implemented as a normal function with an implicit this as a first argument.)

So what is the value stored in my pointer to member function and how does the compiler get things to work as expected?

This of course depends on the compiler and the target architecture, and there is more than one way to do it. But I'll describe how it works on the system I use most, g++ for Linux x86_64.

g++ follows the Itanium C++ ABI, which specifies in detail one way that various C++ features, including virtual functions, can be implemented behind the scenes on most architectures.

The ABI says this about pointers to member functions, in section 2.3:

A pointer to member function is a pair as follows:

ptr:

For a non-virtual function, this field is a simple function pointer. ... For a virtual function, it is 1 plus the virtual table offset (in bytes) of the function, represented as a ptrdiff_t. The value zero represents a NULL pointer, independent of the adjustment field value below.

adj:

The required adjustment to this, represented as a ptrdiff_t.

It has the size, data size, and alignment of a class containing those two members, in that order.

The +1 to ptr for a virtual function helps detect whether or not the function is virtual, since for most platforms all function pointer values and vtable offsets are even. It also makes sure a null member function pointer has a distinct value from any valid member function pointer.
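To look at the pair yourself, a minimal sketch along these lines copies a member function pointer's bytes into a struct shaped like the ABI description. The names MemFnRep and decode are made up for illustration. The code assumes an Itanium-ABI compiler such as g++ on Linux x86_64; with a different representation the static_assert is expected to fail.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical mirror of the ABI's { ptr, adj } pair (names are illustrative).
struct MemFnRep {
    std::ptrdiff_t ptr;  // code address, or 1 + vtable byte offset
    std::ptrdiff_t adj;  // adjustment to apply to "this"
};

struct A {
    int func1(int v) { return v; }
    virtual int func2(int v) { return v; }
};

using AMemFn = int (A::*)(int);

// Assumption: the compiler uses the two-word Itanium layout.
static_assert(sizeof(AMemFn) == sizeof(MemFnRep),
              "this sketch assumes the Itanium C++ ABI layout");

// Copy the raw bytes of the member function pointer into the mirror struct.
MemFnRep decode(AMemFn pf) {
    MemFnRep rep;
    std::memcpy(&rep, &pf, sizeof rep);
    return rep;
}
```

With the g++ build discussed in this answer, decode(&A::func1) should give the address of A::func1 with adj 0, and decode(&A::func2) should give { 1, 0 }, since func2 occupies the first slot of the vtable.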

The vtable / vptr setup for your class A will work something like this C code:

struct A;  /* forward declaration so the function pointer type can name it */

struct A__virt_funcs {
    int (*func2)(struct A*, int);
};

struct A__vtable {
    ptrdiff_t offset_to_top;
    const std__typeinfo* typeinfo;
    struct A__virt_funcs funcs;
};

struct A {
    const struct A__virt_funcs* vptr;
};

int A__func1(struct A*, int v) {
    std__operator__ltlt(&std__cout, "fun1");
    return v;
}

int A__func2(struct A*, int v) {
    std__operator__ltlt(&std__cout, "fun2");
    return v;
}

extern const std__typeinfo A__typeinfo;

const struct A__vtable vt_for_A = { 0, &A__typeinfo, { &A__func2 } };

void A__initialize(struct A* a) {
    a->vptr = &vt_for_A.funcs;
}

(Yes, a real name mangling scheme would need to do something with function parameter types to allow for overloading, and more things since the operator<< involved is actually a function template specialization. But that's beside the point here.)

Now let's look at the assembly I get for your main() (with options -O0 -fno-stack-protector). My comments are added.

Dump of assembler code for function main:
     // Standard stack adjustment for function setup.
   0x00000000004007e6 <+0>: push   %rbp
   0x00000000004007e7 <+1>: mov    %rsp,%rbp
   0x00000000004007ea <+4>: push   %rbx
   0x00000000004007eb <+5>: sub    $0x38,%rsp
     // Put argc in the stack at %rbp-0x34.
   0x00000000004007ef <+9>: mov    %edi,-0x34(%rbp)
     // Put argv in the stack at %rbp-0x40.
   0x00000000004007f2 <+12>:    mov    %rsi,-0x40(%rbp)
     // Construct "a" on the stack at %rbp-0x20.
     // 0x4009c0 is &vt_for_A.funcs.
   0x00000000004007f6 <+16>:    mov    $0x4009c0,%esi
   0x00000000004007fb <+21>:    mov    %rsi,-0x20(%rbp)
     // Check if argc is more than 2.
     // In both cases, "pf" will be on the stack at %rbp-0x30.
   0x00000000004007ff <+25>:    cmpl   $0x2,-0x34(%rbp)
   0x0000000000400803 <+29>:    jle    0x400819 <main+51>
     // if (argc > 2) {
     //   Initialize pf to { &A__func1, 0 }.
   0x0000000000400805 <+31>:    mov    $0x4008ce,%ecx
   0x000000000040080a <+36>:    mov    $0x0,%ebx
   0x000000000040080f <+41>:    mov    %rcx,-0x30(%rbp)
   0x0000000000400813 <+45>:    mov    %rbx,-0x28(%rbp)
   0x0000000000400817 <+49>:    jmp    0x40082b <main+69>
     // } else { [argc <= 2]
     //   Initialize pf to { 1, 0 }.
   0x0000000000400819 <+51>:    mov    $0x1,%eax
   0x000000000040081e <+56>:    mov    $0x0,%edx
   0x0000000000400823 <+61>:    mov    %rax,-0x30(%rbp)
   0x0000000000400827 <+65>:    mov    %rdx,-0x28(%rbp)
     // }
     // Test whether pf.ptr is even or odd:
   0x000000000040082b <+69>:    mov    -0x30(%rbp),%rax
   0x000000000040082f <+73>:    and    $0x1,%eax
   0x0000000000400832 <+76>:    test   %rax,%rax
   0x0000000000400835 <+79>:    jne    0x40083d <main+87>
     // int (*funcaddr)(A*, int); [will be in %rax]
     // if (is_even(pf.ptr)) {
     //   Just do:
     //   funcaddr = pf.ptr;
   0x0000000000400837 <+81>:    mov    -0x30(%rbp),%rax
   0x000000000040083b <+85>:    jmp    0x40085c <main+118>
     // } else { [is_odd(pf.ptr)]
     //   Compute A* a2 = (A*)((char*)&a + pf.adj); [in %rax]
   0x000000000040083d <+87>:    mov    -0x28(%rbp),%rax
   0x0000000000400841 <+91>:    mov    %rax,%rdx
   0x0000000000400844 <+94>:    lea    -0x20(%rbp),%rax
   0x0000000000400848 <+98>:    add    %rdx,%rax
     //   Compute funcaddr =
     //     *(int(**)(A*,int)) ((char*)(a2->vptr) + (pf.ptr-1));
   0x000000000040084b <+101>:   mov    (%rax),%rax
   0x000000000040084e <+104>:   mov    -0x30(%rbp),%rdx
   0x0000000000400852 <+108>:   sub    $0x1,%rdx
   0x0000000000400856 <+112>:   add    %rdx,%rax
   0x0000000000400859 <+115>:   mov    (%rax),%rax
     // }
     // Compute A* a3 = (A*)((char*)&a + pf.adj); [in %rcx]
   0x000000000040085c <+118>:   mov    -0x28(%rbp),%rdx
   0x0000000000400860 <+122>:   mov    %rdx,%rcx
   0x0000000000400863 <+125>:   lea    -0x20(%rbp),%rdx
   0x0000000000400867 <+129>:   add    %rdx,%rcx
     // Call int r = (*funcaddr)(a3, argc);
   0x000000000040086a <+132>:   mov    -0x34(%rbp),%edx
   0x000000000040086d <+135>:   mov    %edx,%esi
   0x000000000040086f <+137>:   mov    %rcx,%rdi
   0x0000000000400872 <+140>:   callq  *%rax
     // Standard stack cleanup for function exit.
   0x0000000000400874 <+142>:   add    $0x38,%rsp
   0x0000000000400878 <+146>:   pop    %rbx
   0x0000000000400879 <+147>:   pop    %rbp
     // Return r.
   0x000000000040087a <+148>:   retq   
End of assembler dump.
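Restated as C++, the dispatch in that dump looks roughly like the sketch below. The names call_by_hand, MemFnRep and RawFn are invented for illustration, and func2 returns v + 100 here only so the two functions can be told apart. Calling a member function through a plain function pointer is undefined behavior as far as the standard is concerned; this only mirrors what g++ generates on x86_64 and is not something to use in real code.

```cpp
#include <cstddef>
#include <cstring>

struct A {
    int func1(int v) { return v; }
    virtual int func2(int v) { return v + 100; }  // differs from func1 for illustration
};

// Hypothetical mirror of the ABI's { ptr, adj } pair.
struct MemFnRep {
    std::ptrdiff_t ptr;
    std::ptrdiff_t adj;
};

using AMemFn = int (A::*)(int);
using RawFn = int (*)(A*, int);  // member function seen as taking "this" first

static_assert(sizeof(AMemFn) == sizeof(MemFnRep),
              "this sketch assumes the Itanium C++ ABI layout");
static_assert(sizeof(RawFn) == sizeof(std::ptrdiff_t),
              "this sketch assumes code addresses fit in ptrdiff_t");

// Hand-written equivalent of (obj.*pf)(arg), following the assembly above.
int call_by_hand(A& obj, AMemFn pf, int arg) {
    MemFnRep rep;
    std::memcpy(&rep, &pf, sizeof rep);

    // Adjust "this" by adj; done for both the virtual and non-virtual case.
    char* adjusted = reinterpret_cast<char*>(&obj) + rep.adj;

    RawFn fn;
    if ((rep.ptr & 1) == 0) {
        // Even: ptr is the code address itself.
        std::memcpy(&fn, &rep.ptr, sizeof fn);
    } else {
        // Odd: ptr - 1 is a byte offset into the vtable the vptr points at.
        char* vptr;
        std::memcpy(&vptr, adjusted, sizeof vptr);
        std::memcpy(&fn, vptr + (rep.ptr - 1), sizeof fn);
    }
    return fn(reinterpret_cast<A*>(adjusted), arg);
}
```

On the g++ / x86_64 setup described here, call_by_hand(a, &A::func1, 5) and call_by_hand(a, &A::func2, 5) should take the even and odd branches respectively and return the same results as (a.*pf)(5).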

But then what's the deal with the member function pointer's adj value? The assembly added it to the address of a before doing the vtable lookup and also before calling the function, whether the function was virtual or not. But both cases in main set it to zero, so we haven't really seen it in action.

The adj value comes in when we have multiple inheritance. So now suppose we have:

class B
{
public:
    virtual void func3() {}
    int n;
};

class C : public B, public A
{
public:
    int func4(int v) { return v; }
    int func2(int v) override { return v; }
};

The layout of an object of type C contains a B subobject (which contains another vptr and an int) and then an A subobject. So the address of the A contained in a C is not the same as the address of the C itself.

As you might be aware, any time code implicitly or explicitly converts a (non-null) C* pointer to an A* pointer, the C++ compiler accounts for this difference by adding the correct offset to the address value. C++ also allows converting from a pointer to member function of A to a pointer to member function of C (since any member of A is also a member of C), and when that happens (for a non-null member function pointer), a similar offset adjustment needs to be made. So if we have:

int (A::*pf1)(int) = &A::func1;
int (C::*pf2)(int) = pf1;

the values within the member function pointers under the hood would be pf1 = { &A__func1, 0 } and pf2 = { &A__func1, offset_A_in_C }.

And then if we have

C c;
int n = (c.*pf2)(3);

the compiler will implement the call to the member function pointer by adding the offset pf2.adj to the address &c to find the implicit "this" parameter, which is good because then it will be a valid A* value as A__func1 expects.
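The effect of the conversion on adj can be observed with a sketch like the following. It again assumes the Itanium-ABI layout used by g++ on x86_64, where adj holds the plain byte offset. MemFnRep, decode, offset_A_in_C and adj_tracks_subobject_offset are hypothetical helper names, not part of any library.

```cpp
#include <cstddef>
#include <cstring>

struct A {
    int func1(int v) { return v; }
    virtual int func2(int v) { return v; }
};
struct B {
    virtual void func3() {}
    int n = 0;
};
struct C : B, A {
    int func4(int v) { return v; }
    int func2(int v) override { return v; }
};

// Hypothetical mirror of the ABI's { ptr, adj } pair.
struct MemFnRep {
    std::ptrdiff_t ptr;
    std::ptrdiff_t adj;
};

template <class PMF>
MemFnRep decode(PMF pf) {
    static_assert(sizeof(PMF) == sizeof(MemFnRep),
                  "this sketch assumes the Itanium C++ ABI layout");
    MemFnRep rep;
    std::memcpy(&rep, &pf, sizeof rep);
    return rep;
}

// Byte distance from the start of a C object to its A subobject.
std::ptrdiff_t offset_A_in_C(C& c) {
    return reinterpret_cast<char*>(static_cast<A*>(&c)) -
           reinterpret_cast<char*>(&c);
}

// True if converting A::* to C::* keeps ptr and sets adj to the A offset.
bool adj_tracks_subobject_offset() {
    C c;
    int (A::*pf1)(int) = &A::func1;
    int (C::*pf2)(int) = pf1;  // implicit base-to-derived conversion
    return decode(pf1).adj == 0 &&
           decode(pf2).adj == offset_A_in_C(c) &&
           decode(pf2).ptr == decode(pf1).ptr;
}
```

The call (c.*pf2)(3) itself is ordinary, portable C++; only the peek at the representation depends on the ABI.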

The same thing goes for a virtual function call, except that as the disassembly dump showed, the offset is needed both to find the implicit "this" parameter and to find the vptr which contains the actual function code address.

There's an added twist to the virtual case, but it's one which is needed for both ordinary virtual calls and calls using a pointer to member function: The virtual function func2 will be called with an A* "this" parameter since that's where the original overridden declaration is, and the compiler won't in general be able to know if the "this" argument is actually of any other type. But the definition of override C::func2 expects a C* "this" parameter. So when the most derived type is C, the vptr within the A subobject will point at a vtable which has an entry pointing not at the code for C::func2 itself, but at a tiny "thunk" function, which does nothing but subtract offset_A_in_C from the "this" parameter and then pass control to the actual C::func2.
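The visible result of the thunk can be checked without looking at any representation. In the sketch below, C::func2 records its own "this" and uses the member n inherited from B. Both the seen_this probe and the v + n return value are additions for illustration and are not in the classes above. The check that the A subobject sits at a different address than the C object reflects the usual layout, which the standard does not strictly promise.

```cpp
struct A {
    int func1(int v) { return v; }
    virtual int func2(int v) { return v; }
};
struct B {
    virtual void func3() {}
    int n = 0;
};
struct C : B, A {
    const void* seen_this = nullptr;  // hypothetical probe for illustration
    int func4(int v) { return v; }
    int func2(int v) override {
        seen_this = this;  // the "this" that C::func2 actually receives
        return v + n;      // n lives in the B subobject, so "this" must be right
    }
};

// True if a call made through an A* ends up in C::func2 with a C* "this".
bool thunk_restores_this() {
    C c;
    c.n = 10;
    A* pa = &c;                     // points at the A subobject inside c
    int (A::*pf)(int) = &A::func2;  // pointer to the virtual member
    int r = (pa->*pf)(1);           // dispatches through the A subobject's vptr
    return r == 11 &&
           c.seen_this == static_cast<const void*>(&c) &&
           static_cast<const void*>(pa) != static_cast<const void*>(&c);
}
```

The call site only ever has an A*, yet C::func2 sees the address of the whole C object, which is the adjustment the thunk exists to perform.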

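GCC's documentation says that pointers to member functions (PMFs) are implemented as structures that know how to calculate the value of this and do any vtable lookups.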
