
Why is a C++ virtual call not much slower than a non-virtual one?

In my understanding, a C++ virtual call needs to:

  1. Get the type of the object from the symbol table
  2. Get the v-table from the type table
  3. Search the function using the function signature in the v-table
  4. Call the function.

While for a non-virtual (such as in C) call, only #4 is required.

I think that #3 should be the most time-consuming step. Given that overriding in C++ is resolved at run time, I could not see much potential for compile-time optimization of the above steps. Thus, for a complex class hierarchy with long function signatures, a C++ virtual call should be much slower than a non-virtual call.

But every claim I have seen says the opposite. Why?

  1. Get the type of the object from the symbol table
  2. Get the v-table from the type table
  3. Search the function using the function signature in the v-table
  4. Call the function.

This is a poor understanding of how v-table-based dispatch works. It's much simpler:

  1. Get the v-table from the object pointer. Pick the right v-table for the function in question (if multiple base classes are used).
  2. Add a specific offset, compile-time determined, to this v-table pointer, thus fetching a specific function pointer.
  3. Call that function pointer.

Each object has a v-table pointer, which points at the v-table for that object's original type. So there's no need to fetch the type from a "symbol table". No searching of the v-table is necessary. It's compile-time determinable exactly which pointer in the v-table needs to be accessed, based on the function signature provided at compile time. It's all about how the compiler indexes each virtual function in a class. It can determine a specific order for each virtual function, and thus when the compiler goes to call it, it can determine which function to call.

So it's quite fast overall.

It's a bit more complex when dealing with virtual base classes, but the general idea is still the same.

The overhead of a virtual function call over a normal function call is two extra fetch operations (one to get the value of the v-pointer, a second to get the address of the method).
In most situations this overhead is not significant enough to show up in performance profiling.

Also, in some cases, if the virtual function to be called can be determined at compile time, a smart compiler will call it directly rather than dispatching at runtime.

1 & 2) It does not need to retrieve the type of the object from any "symbol table". The v-table is typically pointed to by a hidden field in the object. So retrieving the v-table is basically one pointer indirection.

3) The v-table is not "searched". Each virtual function has a fixed index/offset within the v-table, determined at compile-time. So this is basically a fetch from an offset from a pointer.

So, while it is slower than a direct C-style call, it is not as arduous as you suggest. It is similar to something like this in C:

struct MyObject_vtable {
    int (*foo)();
    void (*bar)(const char *arg);
};

struct MyObject {
    int m_instanceVariable1;
    int m_instanceVariable2;
    struct MyObject_vtable *__vtable;
};

struct MyObject * obj = /* ... construct a MyObject instance */;

// int result = obj->foo();
int result = (*(obj->__vtable->foo))();

// obj->bar("Hello");
(*(obj->__vtable->bar))("Hello");

Also, while this may be a little beyond the scope of the question, it is worth noting that often the compiler can determine the function to be called at compile time, and in such cases, it can call the function directly, without going through the virtual-call machinery. For example:

MyObject obj1;
int result1 = obj1.foo();

MyObject *obj2 = getAMyObject();
int result2 = obj2->foo();

In this case, it is known at compile time which foo() to call for the first call, so it can be called directly. For the second call, it is possible that getAMyObject() returns an object of a class derived from MyObject which has overridden foo(), so the virtual-call mechanism must be used.

It is, actually, a matter of where the bottleneck lies...


... but let's first revise your assumptions, with a diagram (64-bit). While the object model is implementation-specific, the idea of a virtual table as used in the Itanium ABI (gcc, clang, icc, ...) is relatively pervasive in C++.

class Base { public: virtual void foo(); int i; };

+-------+---+---+
| v-ptr | i |pad|
+-------+---+---+

class Derived: public Base { public: virtual void foo(); int j; };

+-------+---+---+
| v-ptr | i | j |
+-------+---+---+

In the case of a single (non-virtual) base class, the v-ptr is the first member of the object. Obtaining the v-ptr is therefore easy. From there, the offset is known (at compile time), and thus this is just some pointer arithmetic followed by a function call through a pointer dereference.

Let's see it live thanks to LLVM:

%class.Base = type { i32 (...)**, i32 }
                     ~~~~~~~~~~^  ^~~
                     v-ptr          i

%class.Derived = type { [12 x i8], i32 }
                        ~~~~~~~~^  ^~~
                        Base         j

define void @_Z3fooR4Base(%class.Base* %b) uwtable {
  %1 = bitcast %class.Base* %b to void (%class.Base*)***
  %2 = load void (%class.Base*)*** %1, align 8
  %3 = load void (%class.Base*)** %2, align 8
  tail call void %3(%class.Base* %b)
  ret void
}
  • %1 : the object pointer reinterpreted as a pointer to its v-ptr (a bitcast, which is free CPU-wise)
  • %2 : the v-ptr itself, loaded from the object (first read)
  • %3 : the pointer to Derived::foo, loaded from the first slot of the v-table (second read)

It's basically two reads (one to get vtable ptr from object instance, and one to get function pointer from vtable) and a function call. The memory is often rather hot and stays in cache, and because there isn't any branching, CPUs can pipeline this extremely well to hide a lot of the expense.

Maybe an example of dynamic polymorphism in C might help illustrate the steps. Say you have these classes in C++:

struct Base {
  int someValue;
  virtual void bar();
  virtual int foo();
  void foobar();
};

struct Derived : Base {
  double someOtherValue;
  virtual void bar();
};

Well, in C, you could implement the same hierarchy this way:

typedef struct Base {
  void** vtable;
  int someValue;
} Base;

void Base_foobar(Base* p);
void Base_bar_impl(Base* p);
int Base_foo_impl(Base* p);

void* Base_vtable[] = {(void*)&Base_bar_impl, (void*)&Base_foo_impl};

void Base_construct(Base* p) {
  p->vtable = Base_vtable;
  p->someValue = 0;
}

void Base_bar(Base* p) {
  ((void(*)(Base*))p->vtable[0])(p);  // this is the virtual dispatch code for "bar".
}

int Base_foo(Base* p) {
  return ((int(*)(Base*))p->vtable[1])(p);  // this is the virtual dispatch code for "foo".
}


typedef struct Derived {
  Base base;
  double someOtherValue;
} Derived;

void Derived_bar_impl(Base* p);

void* Derived_vtable[] = {(void*)&Derived_bar_impl, (void*)&Base_foo_impl};

void Derived_construct(Derived* p) {
  Base_construct(&(p->base));
  p->base.vtable = Derived_vtable;  // set up the new vtable as part of the derived-class constructor.
  p->someOtherValue = 0.0;
}

Obviously, the syntax is a lot simpler in C++ (duh!), but as you can see, there is nothing complex about dynamic dispatch: just a simple lookup in a (static) table of function pointers, through a vtable pointer that is set at construction of the object. Nothing in the above is difficult for a compiler to do automatically; that is, a compiler can easily take the C++ code above and generate the corresponding C code.

In the case of multiple inheritance, it is just as easy: each base class has its own vtable pointer, the derived class must set those pointers for each of its base classes, and that's it. The only sticky point is that you now need to apply a pointer offset when casting up or down the hierarchy (hence the importance of using C++-style casting operators!).

By and large, when serious people discuss the overhead of virtual functions, they are not talking about the "complicated" steps required to make the call (because that part is fairly trivial and sometimes optimized away). They are most likely talking about cache-related problems, such as throwing off the prefetcher (with hard-to-predict indirect calls) and preventing the compiler from placing functions close to (or even inlining them at) their call sites in the final executable (or DLL). These problems are by far the main overhead of virtual functions, and even they are not that significant, and some compilers are smart enough to mitigate them quite well.
