
Speed of virtual call in C# vs C++

I seem to recall reading somewhere that the cost of a virtual call in C# is not as high, relatively speaking, as in C++. Is this true? If so, why?

A C# virtual call has to check for “this” being null, and a C++ virtual call does not. So I can't see why, in general, a C# virtual call would be faster. In special cases the C# compiler (or JIT compiler) may be able to inline the virtual call better than a C++ compiler, as a C# compiler has access to better type information. The call instruction may sometimes be slower in C++, as the C# JIT may be able to use a quicker instruction that only copes with a small offset, because it knows more about the runtime memory layout and processor model than a C++ compiler does.
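To make the null-check difference concrete, here is a minimal C++ sketch (not from the answer) that adds by hand the check the CLR performs implicitly for callvirt. Shape, Triangle and both functions are hypothetical names, and std::invalid_argument merely stands in for .NET's NullReferenceException.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical hierarchy used only for illustration.
struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const = 0;
};

struct Triangle : Shape {
    int sides() const override { return 3; }
};

// Plain C++ virtual call: calling through a null pointer is undefined
// behaviour, so the compiler emits no check at all.
int sides_unchecked(const Shape* s) {
    return s->sides();
}

// Roughly what the CLR guarantees: test "this" for null first and raise an
// error instead of dispatching. The exception type is a stand-in.
int sides_checked(const Shape* s) {
    if (s == nullptr) {
        throw std::invalid_argument("null 'this' in virtual call");
    }
    return s->sides();
}
```

The extra work is one compare and one (almost always not-taken) branch, which is the “handful of instructions” discussed below.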

However, we are talking about a handful of processor instructions at most here. On a modern superscalar processor, it is very possible that the “null check” instruction runs at the same time as the “call method” instruction and therefore adds no extra time.

It is also very likely that all the processor instructions will already be in the level 1 cache if the call is made in a loop. But the data is less likely to be cached, and the cost of reading a data value from main memory these days is about the same as running hundreds of instructions from the level 1 cache. Therefore it is unlikely that, in real applications, the cost of a virtual call is even measurable in more than a very few places.

The fact that the C# code uses a few more instructions will of course reduce the amount of code that can fit in the cache; the effect of this is impossible to predict.

(If the C++ class uses multiple inheritance then the cost is higher, due to having to patch up the “this” pointer. Likewise, interfaces in C# add another level of indirection.)
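A small C++ sketch of why multiple inheritance forces a “this” adjustment: two non-empty base subobjects cannot share an address, so at least one base pointer differs from the start of the object. Reader, Writer and File are hypothetical names; how the compiler implements the adjustment (thunk or stored offset) is ABI-specific and not shown.

```cpp
#include <cassert>

// Two independent polymorphic bases (hypothetical names).
struct Reader {
    virtual ~Reader() = default;
    virtual int read() const { return 1; }
};

struct Writer {
    virtual ~Writer() = default;
    virtual int write() const { return 2; }
};

// Both bases carry a vtable pointer, so both subobjects are non-empty and
// must live at different addresses inside File.
struct File : Reader, Writer {
    int read() const override { return 10; }
    int write() const override { return 20; }
};

// True when the two base subobjects occupy different addresses, meaning at
// least one of the File* -> Base* conversions changes the pointer value.
bool bases_have_distinct_addresses(const File& f) {
    const void* as_reader = static_cast<const Reader*>(&f);
    const void* as_writer = static_cast<const Writer*>(&f);
    return as_reader != as_writer;
}

// Called through Writer&, File::write still needs to recover the File object,
// so the call path includes an adjustment of "this" by the subobject offset.
int call_through_writer(const Writer& w) {
    return w.write();
}
```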

For JIT-compiled languages (I don't know whether the CLR does this or not; Sun's JVM does), it's a common optimisation to convert a virtual call which has only two or three implementations into a sequence of tests on the type followed by direct or inlined calls.
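The type-test transformation can be sketched by hand in C++. This is only an approximation under stated assumptions: a JIT would compare the object's type pointer directly, whereas typeid is the closest portable stand-in here, and Animal, Dog, Bird and legs_guarded are hypothetical names.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical hierarchy with only two implementations.
struct Animal {
    virtual ~Animal() = default;
    virtual int legs() const = 0;
};

struct Dog final : Animal {
    int legs() const override { return 4; }
};

struct Bird final : Animal {
    int legs() const override { return 2; }
};

// Hand-written equivalent of "test the type, then call directly": a qualified
// call such as Dog::legs() is not virtual, so the compiler may inline it.
// Any other type falls back to ordinary virtual dispatch.
int legs_guarded(const Animal& a) {
    if (typeid(a) == typeid(Dog)) {
        return static_cast<const Dog&>(a).Dog::legs();   // direct call
    }
    if (typeid(a) == typeid(Bird)) {
        return static_cast<const Bird&>(a).Bird::legs(); // direct call
    }
    return a.legs();                                     // generic fallback
}
```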

The advantage of this is that modern pipelined CPUs can use branch prediction and prefetching on direct calls, but an indirect call (represented by a function pointer in high-level languages) often results in the pipeline stalling.

In the limiting case, where there is only one implementation of the virtual call and the body of the call is small enough, the virtual call is reduced to purely inline code. This technique was used in the Self language runtime, which the JVM evolved from.

Most C++ compilers don't perform the whole-program analysis required for this optimisation, but projects such as LLVM are looking at whole-program optimisations like this.
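Without whole-program analysis, C++11's final specifier (not mentioned in the answer) is one way to hand the compiler the “only one implementation” fact locally. A minimal sketch, assuming hypothetical names Codec and AddOne; whether a given compiler actually devirtualizes these calls is implementation-dependent.

```cpp
#include <cassert>

struct Codec {
    virtual ~Codec() = default;
    virtual int encode(int x) const = 0;
};

// final promises nothing derives from AddOne, so a call through an AddOne&
// has exactly one possible target and may be turned into a direct,
// inlinable call without seeing the rest of the program.
struct AddOne final : Codec {
    int encode(int x) const override { return x + 1; }
};

// Static type is the final class: a candidate for devirtualization.
int encode_known(const AddOne& c, int x) { return c.encode(x); }

// Static type is the base: a genuine virtual call unless the optimizer can
// prove the dynamic type some other way (e.g. whole-program analysis).
int encode_any(const Codec& c, int x) { return c.encode(x); }
```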

The original question says:

I seem to recall reading somewhere that the cost of a virtual call in C# is not as high, relatively speaking, as in C++.

Note the emphasis on “relatively speaking”. In other words, the question might be rephrased as:

I seem to recall reading somewhere that in C#, virtual and non-virtual calls are equally slow, whereas in C++ a virtual call is slower than a non-virtual call...

So the questioner is not claiming that C# is faster than C++ under any circumstances.

Possibly a useless diversion, but this sparked my curiosity about C++ compiled with /clr:pure, using no C++/CLI extensions. The compiler produces IL that gets converted to native code by the JIT, even though the source is pure C++. So here we have a way of seeing what a standard C++ implementation does when running on the same platform as C#.

With a non-virtual method:

struct Plain
{
    void Bar() { System::Console::WriteLine("hi"); }
};

This code:

Plain *p = new Plain();
p->Bar();

... causes the call opcode to be emitted with the specific method name, passing Bar an implicit this argument.

call void <Module>::Plain.Bar(valuetype Plain*)

Compare with an inheritance hierarchy:

struct Base
{
    virtual void Bar() = 0;
};

struct Derived : Base
{
    void Bar() { System::Console::WriteLine("hi"); }
};

Now if we do:

Base *b = new Derived();
b->Bar();

That emits the calli opcode instead, which jumps to a computed address, so there's a lot of IL before the call. By turning it back into C# we can see what is going on:

**(*((int*) b))(b);

In other words, cast b (the address of the object) to a pointer to int (which, on a 32-bit build, happens to be the same size as a pointer) and take the value at that location, which is the address of the vtable; then take the first item in the vtable, which is the address to jump to, dereference it and call it, passing it the implicit this argument.
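The same traversal can be written out with an explicit, hand-rolled object model. This is a sketch, not the actual MSVC layout (real vtables also hold RTTI and destructor entries); Object, VTable, Method, derived_bar and call_bar are hypothetical names.

```cpp
#include <cassert>

// Hand-rolled object model: the first member of every object points at a
// table of function pointers, mimicking a compiler-generated vptr.
struct Object;
using Method = int (*)(const Object* self);

struct VTable {
    Method bar; // slot 0, the only virtual method in this sketch
};

struct Object {
    const VTable* vptr; // hidden first member in a real C++ object
    int value;
};

// "Derived::Bar" as a free function taking an explicit this.
int derived_bar(const Object* self) { return self->value * 2; }

const VTable derived_vtable = { &derived_bar };

// Equivalent of b->Bar(): load the vtable pointer from the object, load
// slot 0 from the vtable, then call it indirectly, passing the object.
int call_bar(const Object* b) {
    const VTable* table = b->vptr; // object -> vtable
    Method target = table->bar;    // vtable -> function address
    return target(b);              // indirect call with implicit this
}
```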

We can tweak the virtual example to use C++/CLI extensions:

ref struct Base
{
    virtual void Bar() = 0;
};

ref struct Derived : Base
{
    virtual void Bar() override { System::Console::WriteLine("hi"); }
};

Base ^b = gcnew Derived();
b->Bar();

This generates the callvirt opcode, exactly as it would in C#:

callvirt instance void Base::Bar()

So when compiling to target the CLR, Microsoft's current C++ compiler doesn't have the same possibilities for optimization as C# does when using the standard features of each language; for a standard C++ class hierarchy, the C++ compiler generates code that contains hard-coded logic for traversing the vtable, whereas for a ref class it leaves it to the JIT to figure out the optimal implementation.

I guess this assumption is based on the JIT compiler, meaning that C# probably converts a virtual call into a simple method call shortly before it is actually used.

But it's essentially theoretical and I would not bet on it!

The cost of a virtual call in C++ is that of a function call through a pointer (the vtbl). I doubt that C# can do that any faster and still be able to determine the object type at runtime...

Edit: As Pete Kirkham pointed out, a good JIT might be able to inline the C# call, avoiding a pipeline stall; something most C++ compilers cannot do (yet). On the other hand, Ian Ringrose mentioned the impact on cache usage. Add to that the cost of the JIT itself running, and (strictly personally) I wouldn't really bother unless profiling on the target machine under realistic workloads has proven one to be faster than the other. It's micro-optimization at best.
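If you want a rough first number before profiling the real application, a micro-benchmark along these lines is a common starting point. It is only a sketch and no substitute for the realistic-workload profiling recommended above: Counter, AddThree, add_three_direct and compare_calls are hypothetical, and an optimizer that can see the concrete type may devirtualize or fold the loops, so only the checksums are reliable, not the timings.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

struct Counter {
    virtual ~Counter() = default;
    virtual std::uint64_t step(std::uint64_t x) const = 0;
};

struct AddThree final : Counter {
    std::uint64_t step(std::uint64_t x) const override { return x + 3; }
};

// Non-virtual twin with the same body, used as the direct-call baseline.
inline std::uint64_t add_three_direct(std::uint64_t x) { return x + 3; }

struct Timing {
    std::uint64_t virtual_sum;
    std::uint64_t direct_sum;
    double virtual_ms;
    double direct_ms;
};

// Times n calls each way. Inspect the generated code before trusting the
// numbers: the compiler may have removed the dispatch or the loop.
Timing compare_calls(const Counter& c, std::uint64_t n) {
    using clock = std::chrono::steady_clock;

    auto t0 = clock::now();
    std::uint64_t v = 0;
    for (std::uint64_t i = 0; i < n; ++i) v = c.step(v);
    auto t1 = clock::now();

    std::uint64_t d = 0;
    for (std::uint64_t i = 0; i < n; ++i) d = add_three_direct(d);
    auto t2 = clock::now();

    std::chrono::duration<double, std::milli> vm = t1 - t0;
    std::chrono::duration<double, std::milli> dm = t2 - t1;
    return {v, d, vm.count(), dm.count()};
}
```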

Not sure about the full framework, but in the Compact Framework it will be slower because CF has no virtual call tables, although it does cache the result. This means that a virtual call in CF will be slower the first time it is called, as it has to do a manual lookup. It may be slow every time it is called if the app is low on memory, as the cached lookup may be pitched.
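The answer does not describe the Compact Framework's internals, so the following is only a generic C++ sketch of the idea it names: resolving a method by lookup on the first call, caching the result, and redoing the lookup once the cache is discarded. TypeInfo, Fn, CachedDispatcher and its members are all hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical method signature; a real runtime would also pass "this".
using Fn = int (*)();

// Hypothetical type descriptor for a runtime without vtables: each type lists
// only the methods it declares itself, plus a link to its parent type.
struct TypeInfo {
    const TypeInfo* parent;
    std::map<std::string, Fn> methods;
};

class CachedDispatcher {
public:
    // Resolve a method by name, walking up the parent chain on a cache miss.
    Fn resolve(const TypeInfo* type, const std::string& name) {
        auto& per_type = cache_[type];
        const auto hit = per_type.find(name);
        if (hit != per_type.end()) {
            return hit->second;               // fast path: cached result
        }
        ++slow_lookups_;                      // slow path: manual lookup
        for (const TypeInfo* t = type; t != nullptr; t = t->parent) {
            const auto m = t->methods.find(name);
            if (m != t->methods.end()) {
                per_type[name] = m->second;   // remember for next time
                return m->second;
            }
        }
        return nullptr;                       // no such method
    }

    // Models the cached lookups being discarded under memory pressure.
    void flush() { cache_.clear(); }

    int slow_lookups() const { return slow_lookups_; }

private:
    std::map<const TypeInfo*, std::map<std::string, Fn>> cache_;
    int slow_lookups_ = 0;
};
```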

In C# it might be possible to convert a virtual call into a non-virtual one by analysing the code. In practice it won't happen often enough to make much difference.

C# flattens the vtable and inlines ancestor calls, so you don't chain up the inheritance hierarchy to resolve anything.
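A flattened table can be sketched with explicit function-pointer tables in C++ (typical C++ compilers lay out single-inheritance vtables the same way). Obj, Table and the functions are hypothetical names: the derived table copies the inherited entry, so every call is one fixed-slot load plus one indirect call, however deep the hierarchy is.

```cpp
#include <cassert>

struct Obj; // hypothetical object carrying a hidden table pointer
using Fn = int (*)(const Obj*);

// One slot per virtual method, for the whole hierarchy.
struct Table {
    Fn speak; // slot 0: declared in Base, overridden in Derived
    Fn move;  // slot 1: declared in Base, inherited unchanged by Derived
};

struct Obj { const Table* vptr; };

int base_speak(const Obj*) { return 1; }
int base_move(const Obj*) { return 2; }
int derived_speak(const Obj*) { return 3; }

const Table base_table = { &base_speak, &base_move };

// Derived's table is flattened: the inherited base_move entry is copied in,
// so no call ever needs to consult base_table at run time.
const Table derived_table = { &derived_speak, &base_move };

int call_speak(const Obj* o) { return o->vptr->speak(o); }
int call_move(const Obj* o) { return o->vptr->move(o); }
```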

It may not be exactly the answer to your question, but although the .NET JIT optimizes virtual calls as everyone said before, profile-guided optimization in Visual Studio 2005 and 2008 does virtual call speculation by inserting a direct call to the most likely target function and inlining that call, so the weight may be the same.
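The speculation idea can be sketched by hand. Standard C++ gives no portable access to a vtable slot, so this sketch models the virtual call as a plain function pointer; likely_handler, rare_handler and dispatch_speculative are hypothetical names, and the details of what the Visual Studio PGO implementation emits are not shown in the answer.

```cpp
#include <cassert>

using Handler = int (*)(int);

int likely_handler(int x) { return x + 1; }   // hot target seen while profiling
int rare_handler(int x) { return x * 100; }   // cold target

// Compare the call target against the expected function; on a match run an
// inlined copy of its body (a predictable branch), otherwise fall back to
// the ordinary indirect call.
int dispatch_speculative(Handler h, int x) {
    if (h == &likely_handler) {
        return x + 1;   // inlined body of likely_handler
    }
    return h(x);        // indirect call for everything else
}
```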

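Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.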
