Performance of dynamic_cast

I previously asked the question "Why is dynamic_cast evil or not?" The answers prompted me to write the code below to measure the performance of dynamic_cast. I compiled and tested it, and the version with dynamic_cast takes only slightly longer than the version without it. I see no evidence that dynamic_cast is expensive. Did I write the benchmark correctly?

The code is:

#include <cstdlib>   // atoi
#include <iostream>  // std::cout
#include <string>    // std::string

class Animal
{
public:
    virtual ~Animal() {}
};

class Cat : public Animal
{
public:
    std::string param1;
    std::string param2;
    std::string param3;
    std::string param4;
    std::string param5;
    int param6;
    int param7;
};

bool _process(Cat* cat)
{
    cat->param1 = "abcde";
    cat->param2 = "abcde";
    cat->param3 = "abcde";
    cat->param4 = "abcde";
    cat->param5 = "abcde";
    cat->param6 = 1;
    cat->param7 = 2;
    return true;
}

bool process(Animal *ptr)
{
    Cat *cat = dynamic_cast<Cat*>(ptr);
    if (cat == NULL)
    {
        return false;
    } 
    _process(cat);
    return true;
}
int main(int argc, char* argv[])
{
    /*
    argv[1] : object num
    */

    if (argc != 2)
    {
        std::cout << "Error: invalid argc " << std::endl;
        return -1;
    }

    int obj_num = atoi(argv[1]);
    if (obj_num <= 0)
    {
        std::cout << "Error: object num" << std::endl;
        return -2;  // stop here instead of falling through with an invalid count
    }

    int c = 0;
    for (; c < obj_num; c++)
    {
        Cat cat;
        #ifdef _USE_CAST
        if (!process(&cat))
        {
            std::cout << "Error: failed to process " << std::endl;
            return -3;
        }
        #else
        if (!_process(&cat))
        {
            std::cout << "Error: failed to process " << std::endl;
            return -3;
        }

        #endif
    }

    return 0;
}

Compile it using:

g++ -D_USE_CAST -o dynamic_cast_test  dynamic_cast_benchmark.c
g++ -o dynamic_cast_no_test dynamic_cast_benchmark.c

Execute them with num set to 1, 10, 100, ...:

$time ./dynamic_cast_test num
$time ./dynamic_cast_no_test num

The result:

                 dynamic_cast               non_dynamic_cast
num  10,000   
                real    0m0.010s            real    0m0.008s
                user    0m0.006s            user    0m0.006s
                sys     0m0.001s            sys     0m0.001s

     100,000 
                real    0m0.059s            real    0m0.056s
                user    0m0.054s            user    0m0.054s
                sys     0m0.001s            sys     0m0.001s

     1,000,000
                real    0m0.523s            real    0m0.519s
                user    0m0.517s            user    0m0.511s
                sys     0m0.001s            sys     0m0.004s

     10,000,000
                real    0m6.050s            real    0m5.126s
                user    0m5.641s            user    0m4.986s
                sys     0m0.036s            sys     0m0.019s

     100,000,000
                real    0m52.962s           real    0m51.178s
                user    0m51.697s           user    0m50.226s
                sys     0m0.173s            sys     0m0.092s

Hardware and OS:

OS:Linux
CPU:Intel(R) Xeon(R) CPU E5607  @ 2.27GHz  (4 cores)

You did write the right code, although I would not have hard-coded the type to be a Cat. To be on the safe side, you could use a command-line argument to decide whether to build a Cat or, say, a Dog (which you would also need to implement). Also try disabling optimization, to see whether it plays a significant role.

Finally, a word of caution is in order. Profiling is not as simple as taking a measurement on your computer, so be aware that what you are doing only takes you so far. It gives you an idea, but don't think you are getting an all-encompassing answer.

I will reformulate my post.

Your code is correct, and it compiles well.

Since virtual methods and the dynamic_cast operator are related topics, here is some information from the wiki; I hope it is useful.

wiki:

A virtual call requires at least an extra indexed dereference, and sometimes a "fixup" addition, compared to a non-virtual call, which is simply a jump to a compiled-in pointer. Therefore, calling virtual functions is inherently slower than calling non-virtual functions. An experiment done in 1996 indicates that approximately 6–13% of execution time is spent simply dispatching to the correct function, though the overhead can be as high as 50%.[4] The cost of virtual functions may not be so high on modern CPU architectures due to much larger caches and better branch prediction.

Furthermore, in environments where JIT compilation is not in use, virtual function calls usually cannot be inlined. While a compiler could replace the lookup and indirect call with, for instance, a conditional execution of each inlined body, such optimizations are not common.

To avoid this overhead, compilers usually avoid using vtables whenever the call can be resolved at compile time.

Thus, the call to f1 above may not require a vtable lookup because the compiler may be able to tell that d can only hold a D at this point, and D does not override f1. Or the compiler (or optimizer) may be able to detect that there are no subclasses of B1 anywhere in the program that override f1. The call to B1::f1 or B2::f2 will probably not require a vtable lookup because the implementation is specified explicitly (although it does still require the 'this'-pointer fixup).

Also, as you probably know, when you declare a virtual method in a class, the compiler will (depending on the implementation, but almost always) implicitly add a hidden pointer to the class's virtual method table as a new member of every instance, so each instance occupies more memory. Try sizeof on a class with a virtual method and on the same class without one.
