Using virtual functions vs. static_cast from base to derived

Question

I am trying to understand which of the implementations below is "faster". Assume the code is compiled once with and once without the -DVIRTUAL flag.

I assume that the version compiled without -DVIRTUAL will be faster because:

a] No vtable is used.

b] The compiler might be able to optimize the generated instructions because it "knows" exactly which call will be made for each of the options (there is only a finite number of them).

My question is PURELY about speed, not about pretty code.

a] Is my analysis above correct?

b] Will the branch predictor / compiler combination be intelligent enough to optimize for a given branch of the switch statement? Note that "type" is a const int.

c] Are there any other factors that I am missing?

Thanks!

#include <iostream>

class Base
{
public:
    Base(int t) : type(t) {}
    ~Base() {}

    const int type;
#ifdef VIRTUAL
    virtual void fn1()=0;
#else
    void fn2();
#endif
};

class Derived1 : public Base
{
public:
    Derived1() : Base(1) { }
    ~Derived1() {}
    void fn1() { std::cout << "in Derived1()" << std::endl; }
};

class Derived2 : public  Base
{
public:
    Derived2() : Base(2) {  }
    ~Derived2() { }
    void fn1() { std::cout << "in Derived2()" << std::endl; }
};


#ifndef VIRTUAL
void Base::fn2()
{
    switch(type)
    {
    case 1:
        (static_cast<Derived1* const>(this))->fn1();
        break;
    case 2:
        (static_cast<Derived2* const>(this))->fn1();
        break;
    default:
        break;
    }
}
#endif


int main()
{
    Base *test = new Derived1();
#ifdef VIRTUAL
    test->fn1();
#else
    test->fn2();
#endif
    return 0;
}

Answer 1

I think you misunderstand the vtable. The vtable is simply a jump table (in most implementations; AFAIK the standard does not guarantee this). In fact, you could go as far as saying it's a giant switch statement. As such, I'd wager the speed would be exactly the same with both of your methods.

If anything, I'd imagine the vtable method would be slightly faster, as the compiler can make better decisions to optimise for cache alignment and so forth...
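
To make the jump-table comparison concrete, here is a minimal sketch of a hand-rolled dispatch table next to an equivalent switch. All names (Object, handle_first, kJumpTable and so on) are hypothetical, and the table only mimics what a typical vtable does; no particular compiler's layout is implied.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical object: 'type' plays the role of the hidden vtable pointer.
struct Object {
    std::size_t type;  // 0 or 1 in this sketch
};

int handle_first(const Object&)  { return 10; }
int handle_second(const Object&) { return 20; }

using Handler = int (*)(const Object&);

constexpr std::size_t kTypeCount = 2;

// The hand-rolled "vtable": one function address per concrete type.
const Handler kJumpTable[kTypeCount] = { handle_first, handle_second };

// Table dispatch: load an address from the table and call through it.
int dispatch_table(const Object& o) {
    assert(o.type < kTypeCount);
    return kJumpTable[o.type](o);
}

// Switch dispatch: the compiler is free to emit compare-and-branch code
// or its own jump table for this.
int dispatch_switch(const Object& o) {
    switch (o.type) {
    case 0:  return handle_first(o);
    case 1:  return handle_second(o);
    default: return -1;
    }
}
```

Both functions select the same handler; which one ends up cheaper depends on what the compiler generates for the switch.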

Answer 2

Have you measured the performance to see if there's any difference at all?

I suppose not, because then you wouldn't be asking here. Measuring is the only reasonable response, though.
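
A rough sketch of what such a measurement could look like, using std::chrono::steady_clock. The class names (VBase, SBase and friends) and the time_calls helper are made up for illustration and only loosely mirror the question's two variants. A trustworthy benchmark would also need objects whose type the optimizer cannot see, many repetitions, and a look at the generated code.

```cpp
#include <chrono>
#include <cstdint>

// Variant 1: ordinary virtual dispatch.
struct VBase {
    virtual ~VBase() = default;
    virtual std::uint64_t value() const = 0;
};
struct VOne : VBase { std::uint64_t value() const override { return 1; } };
struct VTwo : VBase { std::uint64_t value() const override { return 2; } };

// Variant 2: type tag + switch + static_cast, as in the question.
struct SBase {
    explicit SBase(int t) : type(t) {}
    const int type;
    std::uint64_t value() const;
};
struct SOne : SBase { SOne() : SBase(1) {} std::uint64_t impl() const { return 1; } };
struct STwo : SBase { STwo() : SBase(2) {} std::uint64_t impl() const { return 2; } };

inline std::uint64_t SBase::value() const {
    switch (type) {
    case 1:  return static_cast<const SOne*>(this)->impl();
    case 2:  return static_cast<const STwo*>(this)->impl();
    default: return 0;
    }
}

struct Timing {
    std::uint64_t sum;                      // keeps the loop from being discarded
    std::chrono::nanoseconds::rep elapsed;  // wall time in nanoseconds
};

// Calls obj.value() 'iterations' times and reports the elapsed time.
template <typename T>
Timing time_calls(const T& obj, std::uint64_t iterations) {
    const auto start = std::chrono::steady_clock::now();
    std::uint64_t sum = 0;
    for (std::uint64_t i = 0; i < iterations; ++i) {
        sum += obj.value();
    }
    const auto stop = std::chrono::steady_clock::now();
    const auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
    return Timing{sum, ns.count()};
}
```

Calling time_calls once through a const VBase& and once through a const SBase& with the same iteration count gives two numbers to compare; with optimization enabled the compiler may collapse either loop, so the figures are only meaningful alongside the assembly.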

Answer 3

It's impossible to answer without specifying the compiler and compiler options.

I see no particular reason why your non-virtual code should necessarily be any faster to make the call than the virtual code. In fact, the switch might well be slower than a vtable, since a call using a vtable will load an address and jump to it, whereas the switch will load an integer and do a little bit of thinking. Either one of them could be faster. For obvious reasons, the standard does not specify a virtual call to be "slower than any other thing you invent to replace it".

I think it's reasonably unlikely that a randomly chosen compiler will actually inline the call in the virtual case, but it's certainly allowed to (under the as-if rule), since the dynamic type of *test could be determined by data-flow analysis or similar. I think it's reasonably likely that, with optimization enabled, a randomly chosen compiler will inline everything in the non-virtual case. But then, you've given a small example with very short functions all in one TU, so inlining is especially easy.
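
The difference between a call whose dynamic type is visible to the compiler and one where it is not can be sketched as follows. Animal, Dog and both functions are hypothetical names; whether a given compiler actually inlines the first call is an assumption that only the generated assembly can confirm.

```cpp
struct Animal {
    virtual ~Animal() = default;
    virtual int legs() const = 0;
};

struct Dog : Animal {
    int legs() const override { return 4; }
};

// Dynamic type visible: the object is created and used in the same
// function, so data-flow analysis can prove the target is Dog::legs
// and the compiler is allowed to call it directly or inline it.
int legs_known_type() {
    Dog d;
    const Animal& a = d;
    return a.legs();
}

// Dynamic type not visible: any Animal may be passed in, so without
// further information an indirect call through the vtable is the
// typical outcome.
int legs_unknown_type(const Animal& a) {
    return a.legs();
}
```

The question's main() resembles the first function, which is why such a small example says little about code where objects arrive from elsewhere.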

Answer 4

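Assuming you're not micro-optimizing prematurely, and you have profiled your code and found this to be a problem that needs solving, the best way to find the answer is to compile in release mode with full optimizations and examine the generated machine code.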

Answer 5

It's not necessarily true that avoiding the vtable is faster. To be sure, you should measure it yourself.

Answer 6

Note that:

The static_cast version may introduce a branch (though likely not, if it gets optimized to a jump table).
The vtable version, on all implementations I know of, will result in a jump table.

See a pattern here?

Generally, you'd prefer a constant-time lookup over branching code, so the virtual function method seems to be better.

Answer 7

It depends on the platform and the compiler. A switch statement can be implemented as a test and branch or as a jump table (i.e., an indirect branch). A virtual function is usually implemented as an indirect branch. If your compiler turns the switch statement into a jump table, the two approaches differ by one additional dereference. If that is the case and this particular usage happens infrequently enough (or thrashes the cache enough), then you might see a difference due to an extra cache miss.

On the other hand, if the switch statement is simply a test and branch, you might see a much bigger performance difference on some in-order CPUs that flush the instruction cache on an indirect branch (or require a high latency between setting the destination of an indirect branch and jumping to it).

If you are really concerned with the overhead of virtual function dispatch, say, for an inner loop over a heterogeneous collection of objects, you might want to reconsider where you perform the dynamic dispatch. It doesn't have to be per object; it could also be per known grouping of objects with the same type.
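
One way to read "dispatch per grouping" is to store each concrete type in its own container, so the choice of code path is made once per group rather than once per object. The sketch below assumes that reading; ItemA, ItemB, Grouped and the helper functions are hypothetical names.

```cpp
#include <vector>

// Two unrelated concrete types with the same (non-virtual) interface.
struct ItemA { int payload; int process() const { return payload + 1; } };
struct ItemB { int payload; int process() const { return payload * 2; } };

// One homogeneous container per type instead of a single
// heterogeneous container of base-class pointers.
struct Grouped {
    std::vector<ItemA> as;
    std::vector<ItemB> bs;
};

// The type is fixed for the whole loop, so every call inside it is a
// direct call that the compiler can inline.
template <typename T>
long long process_group(const std::vector<T>& group) {
    long long total = 0;
    for (const T& item : group) {
        total += item.process();
    }
    return total;
}

// The only "dispatch" left is picking the loop for each group.
long long process_all(const Grouped& g) {
    return process_group(g.as) + process_group(g.bs);
}
```

The trade-off is that the original ordering across types is lost, so this only fits when the objects can be processed group by group.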

7 answers

Answer 1
1 2010-11-29 17:22:16

Answer 2
1 2010-11-29 17:23:15

Answer 3
1 2010-11-29 17:23:27

Answer 4
1 2010-11-29 17:24:45

Answer 5
0 2010-11-29 17:23:06

Answer 6
0 2010-11-29 17:32:02

Answer 7
0 accepted 2010-11-29 17:42:59
