
Performance of dynamic_cast?

Before reading the question:
This question is not about how useful dynamic_cast is. It's just about its performance.

I've recently developed a design where dynamic_cast is used a lot.
When I discuss it with co-workers, almost everyone says that dynamic_cast shouldn't be used because of its bad performance. (These co-workers have different backgrounds and in some cases do not know each other; I work in a huge company.)

I decided to test the performance of this method instead of just believing them.

The following code was used:

ptime firstValue( microsec_clock::local_time() );

ChildObject* castedObject = dynamic_cast<ChildObject*>(parentObject);

ptime secondValue( microsec_clock::local_time() );
time_duration diff = secondValue - firstValue;
std::cout << "Cast1 lasts:\t" << diff.fractional_seconds() << " microsec" << std::endl;

The above code uses methods from boost::date_time on Linux to get usable values.
I did 3 dynamic_casts in one execution; the code for measuring each of them is the same.

The results of one execution were the following:
Cast1 lasts: 74 microsec
Cast2 lasts: 2 microsec
Cast3 lasts: 1 microsec

The first cast always took 74-111 microsec; the following casts in the same execution took 1-3 microsec.

So finally my questions:
Does dynamic_cast really perform badly?
According to the test results it does not. Is my test code correct?
Why do so many developers think that it is slow if it isn't?

Firstly, you need to measure the performance over a lot more than just a few iterations, as your results will otherwise be dominated by the resolution of the timer. Try e.g. 1 million+ iterations in order to build up a representative picture. Also, this result is meaningless unless you compare it against something, i.e. doing the equivalent work but without the dynamic cast.

Secondly, you need to ensure the compiler isn't giving you false results by optimising away multiple dynamic casts on the same pointer (so use a loop, but use a different input pointer each time).
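A minimal sketch of a measurement loop along these lines, assuming C++14 and std::chrono in place of the boost::date_time calls from the question. The types Parent and Child and the function benchDynamicCast are hypothetical names made up for this illustration. It walks a pool of distinct objects many times and reports an average per cast instead of timing a single call:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical types, used only for this illustration.
struct Parent { virtual ~Parent() = default; };
struct Child : Parent {};

struct BenchResult {
    std::size_t hits;     // number of casts that succeeded
    double nanosPerCast;  // average wall-clock time per cast
};

// Times `rounds` passes over a pool that alternates Child and Parent objects,
// so every cast in a pass sees a different input pointer.
BenchResult benchDynamicCast(std::size_t poolSize, std::size_t rounds) {
    assert(poolSize > 0 && rounds > 0);

    std::vector<std::unique_ptr<Parent>> pool;
    pool.reserve(poolSize);
    for (std::size_t i = 0; i < poolSize; ++i) {
        if (i % 2 == 0) pool.push_back(std::make_unique<Child>());
        else            pool.push_back(std::make_unique<Parent>());
    }

    std::size_t hits = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t r = 0; r < rounds; ++r) {
        for (const auto& p : pool) {
            // The result feeds `hits`, which is returned, so the cast is not dead code.
            if (dynamic_cast<Child*>(p.get()) != nullptr) ++hits;
        }
    }
    const auto stop = std::chrono::steady_clock::now();

    const double totalNanos =
        std::chrono::duration<double, std::nano>(stop - start).count();
    return {hits, totalNanos / static_cast<double>(poolSize * rounds)};
}
```

Calling benchDynamicCast(1000, 1000) performs one million casts. The figure only means something next to a baseline, for example the same loop with the cast replaced by a plain null check.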

Dynamic casting will be slower, because it needs to access the RTTI (run-time type information) table for the object and check that the cast is valid. Then, in order to use it properly, you will need to add error-handling code that checks whether the returned pointer is NULL. All of this takes up cycles.
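As a small illustration of that error-handling step, here is a sketch with made-up types (Shape, Circle, Square) and a hypothetical helper radiusOrNegative. The pointer form of dynamic_cast yields a null pointer on failure, so the check can be folded into the if statement:

```cpp
#include <cassert>

// Hypothetical hierarchy, used only for this illustration.
struct Shape { virtual ~Shape() = default; };
struct Circle : Shape { double radius = 1.0; };
struct Square : Shape { double side = 2.0; };

// Returns the radius if `s` points to a Circle, or -1.0 otherwise
// (including when `s` itself is null).
double radiusOrNegative(Shape* s) {
    if (Circle* c = dynamic_cast<Circle*>(s)) {
        return c->radius;  // cast succeeded, `c` is safe to use
    }
    return -1.0;           // cast failed: the result was a null pointer
}
```

The branch on the result is part of the cost of using the cast correctly, which is why a fair comparison should include the equivalent check in the alternative as well.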

I know you didn't want to talk about this, but "a design where dynamic_cast is used a lot" is probably an indicator that you're doing something wrong...

Performance is meaningless without comparing equivalent functionality. Most people say dynamic_cast is slow without comparing it to equivalent behavior. Call them out on this. Put another way:

If 'works' isn't a requirement, I can write code that fails faster than yours.

There are various ways to implement dynamic_cast, and some are faster than others. Stroustrup published a paper about using primes to improve dynamic_cast, for example. Unfortunately it's unusual to be able to control how your compiler implements the cast, but if performance really matters to you, then you do have control over which compiler you use.

However, not using dynamic_cast will always be faster than using it. If you don't actually need dynamic_cast, then don't use it! If you do need dynamic lookup, then there will be some overhead, and you can then compare various strategies.

Here are a few benchmarks:
http://tinodidriksen.com/2010/04/14/cpp-dynamic-cast-performance/
http://www.nerdblog.com/2006/12/how-slow-is-dynamiccast.html

According to them, dynamic_cast is 5-30 times slower than reinterpret_cast, and the best alternative performs almost the same as reinterpret_cast.

I'll quote the conclusion from the first article:

  • dynamic_cast is slow for anything but casting to the base type; that particular cast is optimized out
  • the inheritance level has a big impact on dynamic_cast
  • member variable + reinterpret_cast is the fastest reliable way to
    determine type; however, that has a lot higher maintenance overhead
    when coding
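The "member variable" approach from the last bullet might look roughly like the sketch below. It is not the code from the linked article: the names Kind, Node, DerivedNode and asDerived are invented here, and static_cast is swapped in for reinterpret_cast because it is the safer choice for a base-to-derived pointer conversion.

```cpp
#include <cassert>

// Hypothetical type tag; every class in the hierarchy needs its own value.
enum class Kind { Base, Derived };

struct Node {
    explicit Node(Kind k = Kind::Base) : kind(k) {}
    virtual ~Node() = default;
    const Kind kind;  // set once by the constructor of the most derived class
};

struct DerivedNode : Node {
    DerivedNode() : Node(Kind::Derived) {}
    int payload = 42;
};

// Checks the tag, then converts without consulting RTTI.
// Only an exact tag match succeeds in this sketch.
DerivedNode* asDerived(Node* n) {
    if (n != nullptr && n->kind == Kind::Derived) {
        return static_cast<DerivedNode*>(n);
    }
    return nullptr;
}
```

The maintenance overhead mentioned in the quote shows up here: each new class needs a new tag value, and a class deriving from DerivedNode would need extra handling in asDerived, which dynamic_cast does automatically.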

Absolute numbers are on the order of 100 ns for a single cast. Values like 74 microseconds don't seem close to reality.

Sorry to say this, but your test is virtually useless for determining whether the cast is slow or not. Microsecond resolution is nowhere near good enough. We're talking about an operation that, even in the worst case, shouldn't take more than, say, 100 clock ticks, or less than 50 nanoseconds on a typical PC.

There's no doubt that the dynamic cast will be slower than a static cast or a reinterpret cast, because, at the assembly level, the latter two amount to an assignment (really fast, on the order of 1 clock tick), while the dynamic cast requires the code to go and inspect the object to determine its real type.

I can't say off-hand how slow it really is. That would probably vary from compiler to compiler, and I'd need to see the assembly code generated for that line of code. But, like I said, 50 nanoseconds per call is the upper limit of what I'd expect to be reasonable.

Your mileage may vary, to understate the situation.

The performance of dynamic_cast depends a great deal on what you are doing, and can even depend on what the names of the classes are. (Comparing time relative to reinterpret_cast seems odd, since in most cases that takes zero instructions for practical purposes, as does e.g. a cast from unsigned to int.)

I've been looking into how it works in clang/g++. Assuming that you are dynamic_casting from a B* to a D*, where B is a (direct or indirect) base of D, and disregarding multiple-base-class complications, it seems to work by calling a library function which does something like this:

for dynamic_cast<D*>(  p  )   where p is B*

type_info const * curr_typ = &typeid( *p );
while(1) {
     if( *curr_typ == typeid(D)) { return static_cast<D*>(p); } // success;
     if( *curr_typ == typeid(B)) return nullptr;   //failed
     curr_typ = get_direct_base_type_of(*curr_typ); // magic internal operation
}

So, yes, it's pretty fast when *p is actually a D; just one successful type_info compare. The worst case is when the cast fails and there are a lot of steps from D to B; in this case there are a lot of failed type comparisons.
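To make the best and worst cases concrete, here is a toy model of that walk. It is not the real library code: TypeNode is a made-up stand-in for the compiler's type records, with a single direct base per type, and countComparisons simply counts how many type comparisons the loop above would perform.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a type_info record with one direct base.
struct TypeNode {
    const char* name;
    const TypeNode* base;  // direct base, or nullptr at the root
};

// Same shape as the comparison described in the answer:
// pointer identity first, then a string compare of the names.
bool sameType(const TypeNode& a, const TypeNode& b) {
    return &a == &b || std::strcmp(a.name, b.name) == 0;
}

// Walks from the object's dynamic type towards `staticBase`, looking for `target`.
// Returns the number of type comparisons made and reports success in `found`.
int countComparisons(const TypeNode* dynamicType, const TypeNode& target,
                     const TypeNode& staticBase, bool& found) {
    int comparisons = 0;
    for (const TypeNode* cur = dynamicType; cur != nullptr; cur = cur->base) {
        ++comparisons;
        if (sameType(*cur, target)) { found = true; return comparisons; }
        ++comparisons;
        if (sameType(*cur, staticBase)) break;  // reached B without seeing D
    }
    found = false;
    return comparisons;
}
```

With a chain B, M1, M2, D, an object whose dynamic type is D is found after 1 comparison. An object of a sibling type E (also derived from M2) fails only after 8 comparisons in this model, two per level on the way down to B.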

How long does a type comparison take? On clang/g++, it does this:

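// Sketch of the comparison; needs <cstring> and <typeinfo>.
bool compare_eq( std::type_info const & a, std::type_info const & b ){
   if( &a == &b) return true;   // same object
   return std::strcmp( a.name(), b.name())==0;   // different objects, same type name
}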

The strcmp is needed since it's possible to have two different type_info objects representing the same type (although I'm pretty sure this only happens when one is in a shared library and the other is not in that library). But, in most cases, when types are actually equal, they reference the same type_info; thus most successful type comparisons are very fast.

The name() method just returns a pointer to a fixed string containing the mangled name of the class. So there's another factor: if many of the classes on the way from D to B have names starting with MyAppNameSpace::AbstractSyntaxNode< , then the failing compares are going to take longer than usual; the strcmp won't fail until it reaches a difference in the mangled type names.
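A rough way to see this effect is to count how many characters a string comparison has to scan before it can report a mismatch. The helper firstDifference below is hypothetical, and the strings in the usage note are readable stand-ins: real mangled names from name() are implementation-specific and look different, but the shared-prefix argument is the same.

```cpp
#include <cassert>
#include <cstddef>

// Returns the index of the first character at which `a` and `b` differ,
// or the length of the strings if they are identical. This is the number
// of matching characters a strcmp-style scan must pass over first.
std::size_t firstDifference(const char* a, const char* b) {
    std::size_t i = 0;
    while (a[i] != '\0' && a[i] == b[i]) {
        ++i;
    }
    return i;
}
```

For "MyAppNameSpace::AbstractSyntaxNode<Foo>" against "MyAppNameSpace::AbstractSyntaxNode<Bar>" the scan passes 35 matching characters before it can fail, whereas "Foo" against "Bar" fails on the very first one.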

And, of course, since the operation as a whole is traversing a bunch of linked data structures representing the type hierarchy, the time will depend on whether those things are fresh in the cache or not. So the same cast done repeatedly is likely to show an average time which doesn't necessarily represent the typical performance for that cast.

The question doesn't mention the alternative. Prior to RTTI being widely available, or simply to avoid using RTTI, the traditional method is to use a virtual method to check the type of the class, and then static_cast as appropriate. This has the disadvantage that it doesn't work for multiple inheritance, but has the advantage that it doesn't have to spend time checking a multiple inheritance hierarchy either!

In my tests:

  • dynamic_cast runs at about 14.4953 nanoseconds.
  • Checking a virtual method and static_casting runs at about twice the speed, 6.55936 nanoseconds.

This is for testing with a 1:1 ratio of valid:invalid casts, using the following code with optimisations disabled. I used Windows for performance checking.

#include <iostream>
#include <windows.h>

struct BaseClass {
    virtual int GetClass() volatile { return 0; }
};

struct DerivedClass final : public BaseClass {
    virtual int GetClass() volatile final override { return 1; }
};

volatile DerivedClass *ManualCast(volatile BaseClass *lp) {
    if (lp->GetClass() == 1) {
        return static_cast<volatile DerivedClass *>(lp);
    }
    return nullptr;
}

LARGE_INTEGER perfFreq;
LARGE_INTEGER startTime;
LARGE_INTEGER endTime;

void PrintTime() {
    float seconds = static_cast<float>(endTime.LowPart - startTime.LowPart) / static_cast<float>(perfFreq.LowPart);
    std::cout << "T=" << seconds << std::endl;
}

BaseClass *Make() { return new BaseClass(); }
BaseClass *Make2() { return new DerivedClass(); }

int main() {
    volatile BaseClass *base = Make();
    volatile BaseClass *derived = Make2();
    int unused = 0;
    const int t = 1000000000;

    QueryPerformanceFrequency(&perfFreq);

    QueryPerformanceCounter(&startTime);
    for (int n = 0; n < t; ++n) {
        volatile DerivedClass *alpha = dynamic_cast<volatile DerivedClass *>(base);
        volatile DerivedClass *beta = dynamic_cast<volatile DerivedClass *>(derived);
        unused += alpha ? 1 : 0;
        unused += beta ? 1 : 0;
    }
    QueryPerformanceCounter(&endTime);
    PrintTime();

    QueryPerformanceCounter(&startTime);
    for (int n = 0; n < t; ++n) {
        volatile DerivedClass *alpha = ManualCast(base);
        volatile DerivedClass *beta = ManualCast(derived);
        unused += alpha ? 1 : 0;
        unused += beta ? 1 : 0;
    }
    QueryPerformanceCounter(&endTime);
    PrintTime();

    std::cout << unused;
    delete base;
    delete derived;
}
