
C++ virtual function vs member function pointer (performance comparison)

Virtual function calls can be slow because each call requires an extra indexed dereference through the v-table, which can result in a data cache miss as well as an instruction cache miss. That is not good for performance-critical applications.

So I have been thinking of a way to overcome this performance issue while still keeping some of the functionality that virtual functions provide.

I am confident that this has been done before, but I devised a simple test that lets the base class store a member function pointer that any derived class can set. When I call Foo() on any derived class, it calls the appropriate member function without having to traverse the v-table...

I am just wondering whether this method is a viable replacement for the virtual-call paradigm. If so, why is it not more ubiquitous?

Thanks in advance for your time! :)

#include <cstdio>

class BaseClass
{
protected:

    // member function pointer
    typedef void(BaseClass::*FooMemFuncPtr)();
    FooMemFuncPtr m_memfn_ptr_Foo;

    void FooBaseClass() 
    {
        printf("FooBaseClass() \n");
    }

public:

    BaseClass()
    {
        m_memfn_ptr_Foo = &BaseClass::FooBaseClass;
    }

    void Foo()
    {
        ((*this).*m_memfn_ptr_Foo)();
    }
};

class DerivedClass : public BaseClass
{
protected:

    void FooDeriveddClass()
    {
        printf("FooDeriveddClass() \n");
    }

public:

    DerivedClass() : BaseClass()
    {
        m_memfn_ptr_Foo = (FooMemFuncPtr)&DerivedClass::FooDeriveddClass;
    }
};

int main()
{
    DerivedClass derived_inst;
    derived_inst.Foo(); // "FooDeriveddClass()"

    BaseClass base_inst;
    base_inst.Foo(); // "FooBaseClass()"

    BaseClass * derived_heap_inst = new DerivedClass;
    derived_heap_inst->Foo(); // "FooDeriveddClass()"

    return 0;
}

I did a test, and the version using virtual function calls was faster on my system with optimization enabled.

$ time ./main 1
Using member pointer

real    0m3.343s
user    0m3.340s
sys     0m0.002s

$ time ./main 2
Using virtual function call

real    0m2.227s
user    0m2.219s
sys     0m0.006s

Here is the code:

#include <cstdlib>
#include <iostream>

struct BaseClass
{
    typedef void(BaseClass::*FooMemFuncPtr)();
    FooMemFuncPtr m_memfn_ptr_Foo;

    void FooBaseClass() { }

    BaseClass()
    {
        m_memfn_ptr_Foo = &BaseClass::FooBaseClass;
    }

    void Foo()
    {
        ((*this).*m_memfn_ptr_Foo)();
    }
};

struct DerivedClass : public BaseClass
{
    void FooDerivedClass() { }

    DerivedClass() : BaseClass()
    {
        m_memfn_ptr_Foo = (FooMemFuncPtr)&DerivedClass::FooDerivedClass;
    }
};

struct VBaseClass {
  virtual void Foo() = 0;
};

struct VDerivedClass : VBaseClass {
  virtual void Foo() { }
};

static const size_t count = 1000000000;

static void f1(BaseClass* bp)
{
  for (size_t i=0; i!=count; ++i) {
    bp->Foo();
  }
}

static void f2(VBaseClass* bp)
{
  for (size_t i=0; i!=count; ++i) {
    bp->Foo();
  }
}

int main(int argc, char** argv)
{
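    // Guard against a missing argument before reading argv[1].
    if (argc < 2) {
        std::cerr << "usage: main 1|2\n";
        return 1;
    }
    int test = atoi(argv[1]);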
    switch (test) {
        case 1:
        {
            std::cerr << "Using member pointer\n";
            DerivedClass d;
            f1(&d);
            break;
        }
        case 2:
        {
            std::cerr << "Using virtual function call\n";
            VDerivedClass d;
            f2(&d);
            break;
        }
    }

    return 0;
}

Compiled using:

g++ -O2    main.cpp   -o main

with g++ 4.7.2.

"Virtual function calls can be slow due to virtual calls having to traverse the v-table,"

That's not quite correct. In typical implementations the vtable is built by the compiler and each object's vtable pointer is set during construction, with each entry pointing to the most specialized version of the function in the hierarchy. Calling a virtual function does not iterate over pointers; it does something like (*(vtbl_address + 8))(args), which takes constant time.
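To make the constant-time claim concrete, here is a minimal hand-written sketch of what a compiler typically generates for a virtual call. Everything in it (Animal, AnimalVTable, make_dog, call_loudness) is a hypothetical name made up for illustration, and real ABIs differ in detail:

```cpp
// Hypothetical hand-rolled equivalent of a compiler-generated vtable.
struct Animal;

struct AnimalVTable {
    int (*legs)(const Animal*);      // slot 0
    int (*loudness)(const Animal*);  // slot 1
};

struct Animal {
    const AnimalVTable* vptr;  // set once, when the object is created
};

static int dog_legs(const Animal*) { return 4; }
static int dog_loudness(const Animal*) { return 7; }

// One table per type, shared by every object of that type.
static const AnimalVTable dog_vtable = { dog_legs, dog_loudness };

inline Animal make_dog() { return Animal{ &dog_vtable }; }

// A "virtual call": load vptr, load a fixed slot, call indirectly.
// Nothing is searched or traversed.
inline int call_loudness(const Animal* a) {
    return a->vptr->loudness(a);
}
```

Calling call_loudness on an object returned by make_dog() performs two loads and one indirect call, regardless of how many slots the table has.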

"which can result in a data cache miss as well as an instruction cache miss... Not good for performance critical applications."

Your solution is not good for performance-critical applications (in general) either, because it is generic.

As a rule, performance-critical applications are optimized on a per-case basis: measure, pick the code with the worst performance problems within a module, and optimize it.

With this per-case approach, you will probably never have a case where your code is slow because the compiler has to traverse a vtbl. If you do, the slowness would probably come from calling functions through pointers instead of directly (i.e. the problem would be solved by inlining, not by adding an extra pointer in the base class).

All this is academic anyway, until you have a concrete case to optimize (and you have measured that your worst offender is virtual function calls).

Edit:

"I am just wondering if this method is a viable replacement for the virtual-call paradigm, if so, why is it not more ubiquitous?"

Because it looks like a generic solution (applying it ubiquitously would decrease performance rather than improve it) to a non-existent problem (applications are generally not slowed down by virtual function calls).

Virtual functions do not "traverse" the table; they just fetch a single pointer from a known location and call that address. It is as if you had a manual implementation with a pointer-to-function and used that for the call instead of a direct one.

So your work is only good for obfuscation, and it sabotages the cases where the compiler can issue a non-virtual direct call.
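As an illustration of the direct-call case, the sketch below contrasts a call where the dynamic type is unknown with one where the compiler knows it. Filter, Doubler, run_dynamic and run_static are hypothetical names, and the code assumes C++11 for final and override. Whether a given compiler actually devirtualizes is an optimization decision, not a guarantee.

```cpp
struct Filter {
    virtual int apply(int x) const { return x; }
    virtual ~Filter() = default;
};

// 'final' tells the compiler that no class can override apply() further.
struct Doubler final : Filter {
    int apply(int x) const override { return 2 * x; }
};

// Dynamic type unknown: the compiler generally has to emit an indirect call.
inline int run_dynamic(const Filter& f, int x) { return f.apply(x); }

// Static type is a final class: the compiler is free to call
// Doubler::apply directly and inline it.
inline int run_static(const Doubler& d, int x) { return d.apply(x); }
```

A member function pointer stored in the base class hides the target from the compiler in both situations, so the second kind of call loses its chance of being inlined.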

Using a pointer-to-member-function is probably even worse than a plain pointer-to-function (PTF): it will likely use the same VMT structure for a similar offset access, just with a variable offset instead of a fixed one.
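One reason a member function pointer cannot be cheaper in general: it must also be able to refer to a virtual function, and a call through it still dispatches to the final overrider. A minimal sketch, with hypothetical names Shape, Triangle and call_through_member_pointer:

```cpp
struct Shape {
    virtual int sides() const { return 0; }
    virtual ~Shape() {}
};

struct Triangle : Shape {
    virtual int sides() const { return 3; }
};

// The pointer names Shape::sides, but the call is still resolved
// against the dynamic type of 's', as with an ordinary virtual call.
inline int call_through_member_pointer(const Shape& s) {
    int (Shape::*pmf)() const = &Shape::sides;
    return (s.*pmf)();
}
```

Because the pointer value is only known at run time, the call site has to handle both the virtual and the non-virtual case, which is the "variable instead of fixed" access described above.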

Mostly because it doesn't work. Most modern CPUs are better at branch prediction and speculative execution than you think. However, I have yet to see a CPU that does speculative execution beyond a non-static branch.

Furthermore, on a modern CPU you are more likely to get a cache miss because a context switch happened just before the call and another program took over the cache than because of a v-table, and even that scenario is a very remote possibility.

Actually, some compilers may use thunks, which themselves translate to ordinary function pointers, so basically the compiler already does for you what you are trying to do manually (and you would probably confuse the hell out of people).

Also, with a pointer to the virtual function table, the per-object space cost of virtual functions is O(1) (just the pointer). On the other hand, if you store function pointers within the class, the cost is O(N): your class now contains as many pointers as there are "virtual" functions. If there are many functions, you pay a toll for that. When pre-fetching your object, you load all the pointers into the cache line instead of just a single pointer and the first few members you are likely to need. That sounds like a waste.
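The size difference is easy to check. The sketch below uses two hypothetical classes with four dispatchable functions each. The exact byte counts depend on the ABI, so treat any concrete numbers as an assumption to verify on your own compiler:

```cpp
#include <cstddef>

// Four virtual functions: typically one hidden vtable pointer per object.
struct WithVirtuals {
    virtual void a() {}
    virtual void b() {}
    virtual void c() {}
    virtual void d() {}
};

// Four "manual virtual" functions: one member function pointer each,
// stored in every object.
struct WithMemberPointers {
    typedef void (WithMemberPointers::*Fn)();
    Fn a, b, c, d;

    void noop() {}

    WithMemberPointers()
        : a(&WithMemberPointers::noop), b(&WithMemberPointers::noop),
          c(&WithMemberPointers::noop), d(&WithMemberPointers::noop) {}
};

inline std::size_t size_with_virtuals() { return sizeof(WithVirtuals); }
inline std::size_t size_with_member_pointers() { return sizeof(WithMemberPointers); }
```

On common 64-bit GCC or Clang builds the first is usually a single pointer wide, while the second grows with every function added, since member function pointers there are typically two words each.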

The virtual function table, on the other hand, sits in one place for all objects of a given type and is likely never pushed out of the cache while your code calls some short virtual functions in a loop (which is presumably the situation in which virtual function cost would become the bottleneck).

As for branch prediction, in some cases a simple decision tree over the object type, with inlined functions for each particular type, gives good performance (you then store type information instead of a pointer). This is not applicable to all types of problems and would mostly be a premature optimization.
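A minimal sketch of that decision-tree idea, assuming C++11 and using hypothetical names (Kind, Shape, area): the object stores a type tag, and a switch selects among small functions the compiler can inline.

```cpp
// The object carries a type tag instead of a function pointer.
enum class Kind { Square, Rectangle };

struct Shape {
    Kind kind;
    int w;
    int h;
};

inline int square_area(const Shape& s) { return s.w * s.w; }
inline int rectangle_area(const Shape& s) { return s.w * s.h; }

// Decision tree over the tag. Each branch is a direct call that can be
// inlined, and the branch itself is an ordinary predictable jump.
inline int area(const Shape& s) {
    switch (s.kind) {
        case Kind::Square:    return square_area(s);
        case Kind::Rectangle: return rectangle_area(s);
    }
    return 0;  // not reached for valid tags
}
```

The trade-off is that every new type means touching the switch, which is part of why this only pays off in measured hot spots.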

As a rule of thumb, don't worry about language constructs just because they seem unfamiliar. Worry and optimize only after you have measured and identified where the bottleneck really is.


 