Am I missing something, or are virtual calls not as bad as people make them out to be?

Question

I have been developing a simple framework for embedded environments, and I came to a design decision on whether to use virtual calls, CRTP, or maybe a switch statement. I have been told that vtables perform poorly in embedded systems.
Following up on the question "vftable performance penalty vs. switch statement", I decided to run my own test. I tried three different ways to call a member function:

1. Using the ETL library's etl::function. ETL is a library meant to mimic the STL, but for embedded environments (no dynamic allocation).
2. Using a master switch statement that calls an object's member function based on the object's integer ID.
3. Using a pure virtual call through a base-class pointer.

I never tried this with a basic CRTP pattern, but etl::function was supposed to be a variation on it, with CRTP as the underlying mechanism. The times I got on MSVC were as follows (performance on an ARM Cortex-M4 was similar):

etl:     400 million nanoseconds (400 ms)
switch:  420 million nanoseconds (420 ms)
virtual: 290 million nanoseconds (290 ms)
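Assuming each timer covers only its own loop, every approach made 2,000,000 × 4 = 8,000,000 calls, so the totals work out to the per-call averages below. These averages still include loop overhead and one `rand()` call per four dispatches.

```latex
\begin{aligned}
N &= 2\,000\,000 \times 4 = 8\,000\,000 \ \text{calls per approach} \\
\text{etl:} \quad & \frac{400 \times 10^{6}\,\text{ns}}{8 \times 10^{6}} = 50\ \text{ns per call} \\
\text{switch:} \quad & \frac{420 \times 10^{6}\,\text{ns}}{8 \times 10^{6}} = 52.5\ \text{ns per call} \\
\text{virtual:} \quad & \frac{290 \times 10^{6}\,\text{ns}}{8 \times 10^{6}} = 36.25\ \text{ns per call}
\end{aligned}
```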

The pure virtual calls are significantly faster. Am I missing something, or are virtual calls just not as bad as people make them out to be? Here is the code used for the tests:

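#include <cstdint>        // uint32_t, uint16_t
#include <cstdlib>        // rand, srand
#include <ctime>          // time
#include "etl/function.h" // etl::ifunction, etl::function_imv (ETL)

// StartTimer, StopTimer and PrintAllTimerDuration are the author's own
// timing macros; their definitions are not shown in the question.

class testetlFunc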
{
public:
    uint32_t a;

    testetlFunc() { a = 0; };

    void foo();
};

class testetlFunc2
{
public:
    uint32_t a;

    testetlFunc2() { a = 0; };

    virtual void foo() = 0;
};

void testetlFunc::foo()
{
    a++; 
}


class testetlFuncDerived : public testetlFunc2
{
public:
    testetlFuncDerived(); 

    void foo() override;
};

testetlFuncDerived::testetlFuncDerived()
{ 
}

void testetlFuncDerived::foo()
{
    a++; 
}


etl::ifunction<void>* timer1_callback1;
etl::ifunction<void>* timer1_callback2;
etl::ifunction<void>* timer1_callback3;
etl::ifunction<void>* timer1_callback4;
etl::ifunction<void>* etlcallbacks[4];

testetlFunc ttt;
testetlFunc ttt2;
testetlFunc ttt3;
testetlFunc ttt4;
testetlFuncDerived tttd1;
testetlFuncDerived tttd2;
testetlFuncDerived tttd3;
testetlFuncDerived tttd4;
testetlFunc2* tttarr[4];

static void MasterCallingFunction(uint16_t ID) {
    switch (ID)
    {
    case 1:
        ttt.foo();
        break;
    case 2:
        ttt2.foo();
        break;
    case 3:
        ttt3.foo();
        break;
    case 4:
        ttt4.foo();
        break;
    default:
        break;
    }
};






int main()
{

    // Derived* converts implicitly to Base*; no cast is needed.
    tttarr[0] = &tttd1;
    tttarr[1] = &tttd2;
    tttarr[2] = &tttd3;
    tttarr[3] = &tttd4;

    etl::function_imv<testetlFunc, ttt, &testetlFunc::foo> k;
    timer1_callback1 = &k;
    etl::function_imv<testetlFunc, ttt2, &testetlFunc::foo> k2;
    timer1_callback2 = &k2;
    etl::function_imv<testetlFunc, ttt3, &testetlFunc::foo> k3;
    timer1_callback3 = &k3;
    etl::function_imv<testetlFunc, ttt4, &testetlFunc::foo> k4;
    timer1_callback4 = &k4;
    etlcallbacks[0] = timer1_callback1;
    etlcallbacks[1] = timer1_callback2;
    etlcallbacks[2] = timer1_callback3;
    etlcallbacks[3] = timer1_callback4;

    //results for etl::function --------------
    int rng;
    srand(time(0));
    StartTimer(1)
    for (uint32_t i = 0; i < 2000000; i++)
    {
        rng = rand() % 4 + 0;
        for (uint16_t j= 0; j < 4; j++)
        {
            (*etlcallbacks[rng])();
        }
    }
    StopTimer(1)

    //results for switch --------------
    StartTimer(2)
    for (uint32_t i = 0; i < 2000000; i++)
    {
        rng = rand() % 4 + 0;
        for (uint16_t j = 0; j < 4; j++)
        {
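            // Note: rng is 0..3 but the cases are 1..4, so rng == 0 hits
            // default (no call) and ttt4 is never called.
            MasterCallingFunction(rng);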
        }
    }
    StopTimer(2)
    //results for virtual vtable --------------
    StartTimer(3)
    for (uint32_t i = 0; i < 2000000; i++)
    {
        rng = rand() % 4 + 0;
        for (uint16_t j = 0; j < 4; j++)
        {
            tttarr[rng]->foo();
            //ttt.foo();
        }
    }
    StopTimer(3)
PrintAllTimerDuration
}

Answer 1

If what you really need is virtual dispatch, C++'s virtual calls are probably the most performant implementation you can get, and you should use them. Scores of compiler engineers have worked on making them as fast as possible.

In my experience, the advice to avoid virtual methods is meant for cases where you do not need them. Avoid the virtual keyword on methods that can be statically dispatched, and in hot spots of your code.

Every time you call an object's virtual method, the object's vtable is accessed (which can hurt memory locality and evict a cache line or two), a pointer is dereferenced to get the actual function address, and only then does the actual function call happen. Each call is only slightly slower, but if you pay that cost enough times in a loop, it suddenly makes a difference.

When you call a statically dispatched (non-virtual) method, none of those steps happen: the actual function call just happens. If the caller and the callee are close to each other in memory, all caches can stay the way they are.

So avoid virtual dispatch inside tight loops in high-performance or low-CPU-power situations. You can, for example, switch on a member variable once and call a method that contains the entire loop instead.

But as the saying goes, "premature optimization is the root of all evil." Measure performance first. "Embedded" CPUs have become much faster and more powerful than they were a few years ago. Compilers for popular CPUs are better optimized than those only recently adapted to a new or exotic CPU. It may simply be that your compiler's optimizer alleviates any problems, or that your CPU is similar enough to a common desktop CPU to reap the benefits of work done for more popular CPUs.

Or you may simply have more RAM etc. than the people who told you to avoid virtual calls.

So profile, and if the profiler says it's fine, it's fine. Also make sure your tests are representative. For example, a network request might have preempted the switch-statement run and made it seem slower than it really was, or the virtual method calls might have benefited from caches warmed by the earlier non-virtual runs.

Answer 1: score 8, accepted, 2019-07-21 17:02:34
