Performance of "direct" virtual call vs. interface call in C#

Question

This benchmark appears to show that calling a virtual method directly on an object reference is faster than calling it on a reference to the interface this object implements.

In other words:

interface IFoo {
    void Bar();
}

class Foo : IFoo {
    public virtual void Bar() {}
}

void Benchmark() {
    Foo f = new Foo();
    IFoo f2 = f;
    f.Bar(); // This is faster.
    f2.Bar();    
}

Coming from the C++ world, I would have expected both of these calls to be implemented identically (as a simple virtual table lookup) and to have the same performance. How does C# implement virtual calls, and what is this "extra" work that apparently gets done when calling through an interface?

--- EDIT ---

OK, the answers/comments I have gotten so far imply that there is a double pointer dereference for a virtual call through an interface versus just one dereference for a virtual call through an object.

So could somebody please explain why that is necessary? What is the structure of the virtual table in C#? Is it "flat" (as is typical for C++) or not? What design tradeoffs in the C# language design led to this? I'm not saying this is a "bad" design, I'm simply curious as to why it was necessary.

In a nutshell, I'd like to understand what my tool does under the hood so I can use it more effectively. And I would appreciate it if I didn't get any more "you shouldn't know that" or "use another language" types of answers.

--- EDIT 2 ---

Just to make it clear that we are not dealing with some compiler or JIT optimization here that removes the dynamic dispatch: I modified the benchmark mentioned in the original question to instantiate one class or the other randomly at run-time. Since the instantiation happens after compilation and after assembly loading/JITing, there is no way to avoid dynamic dispatch in either case:

using System;
using System.Diagnostics;

interface IFoo {
    void Bar();
}

class Foo : IFoo {
    public virtual void Bar() {
    }
}

class Foo2 : Foo {
    public override void Bar() {
    }
}

class Program {

    static Foo GetFoo() {
        if ((new Random()).Next(2) % 2 == 0)
            return new Foo();
        return new Foo2();
    }

    static void Main(string[] args) {

        var f = GetFoo();
        IFoo f2 = f;

        Console.WriteLine(f.GetType());

        // JIT warm-up
        f.Bar();
        f2.Bar();

        int N = 10000000;
        Stopwatch sw = new Stopwatch();

        sw.Start();
        for (int i = 0; i < N; i++) {
            f.Bar();
        }
        sw.Stop();
        Console.WriteLine("Direct call: {0:F2}", sw.Elapsed.TotalMilliseconds);

        sw.Reset();
        sw.Start();
        for (int i = 0; i < N; i++) {
            f2.Bar();
        }
        sw.Stop();
        Console.WriteLine("Through interface: {0:F2}", sw.Elapsed.TotalMilliseconds);

        // Results:
        // Direct call: 24.19
        // Through interface: 40.18

    }

}

--- EDIT 3 ---

If anyone is interested, here is how my Visual C++ 2010 lays out an instance of a class that multiply inherits from other classes:

Code:

#include <iostream>

class IA {
public:
    virtual void a() = 0;
};

class IB {
public:
    virtual void b() = 0;
};

class C : public IA, public IB {
public:
    virtual void a() override {
        std::cout << "a" << std::endl;
    }
    virtual void b() override {
        std::cout << "b" << std::endl;
    }
};

Debugger:

c   {...}   C
    IA  {...}   IA
        __vfptr 0x00157754 const C::`vftable'{for `IA'} *
            [0] 0x00151163 C::a(void)   *
    IB  {...}   IB
        __vfptr 0x00157748 const C::`vftable'{for `IB'} *
            [0] 0x0015121c C::b(void)   *

Multiple virtual table pointers are clearly visible, and sizeof(C) == 8 (in a 32-bit build).

This...

C c;
std::cout << static_cast<IA*>(&c) << std::endl;
std::cout << static_cast<IB*>(&c) << std::endl;

...prints...

0027F778
0027F77C

...indicating that pointers to different interfaces within the same object actually point to different parts of that object (i.e. they contain different physical addresses).
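The pointer adjustment described above can be checked in code. This is a minimal sketch that reuses the shapes of IA, IB and C with empty method bodies; the helper names `subobject_distance` and `round_trips` are made up for illustration, and the exact distance between the subobjects is ABI-specific, so the code only reports it. By contrast, in C# a reference converted to an interface type still refers to the same object (`object.ReferenceEquals(f, f2)` is true for the question's `f` and `f2`), so this kind of per-interface pointer into the object is presumably not available there.

```cpp
#include <cassert>
#include <cstddef>

// Same shapes as IA, IB and C above, with empty bodies.
struct IA { virtual void a() = 0; };
struct IB { virtual void b() = 0; };
struct C : IA, IB {
    void a() override {}
    void b() override {}
};

// Byte distance between the IA and IB subobjects of one C instance.
// Hypothetical helper; the value depends on the compiler's object layout.
std::ptrdiff_t subobject_distance(C& c) {
    const char* pa = reinterpret_cast<const char*>(static_cast<IA*>(&c));
    const char* pb = reinterpret_cast<const char*>(static_cast<IB*>(&c));
    return pb - pa;
}

// dynamic_cast<void*> yields the address of the most-derived object,
// so it undoes whatever adjustment the conversion to IB* applied.
bool round_trips(C& c) {
    IB* pb = &c;
    assert(pb != nullptr);
    return dynamic_cast<void*>(pb) == static_cast<void*>(&c);
}
```

For a `C c;`, `subobject_distance(c)` should be non-zero (4 would match the two addresses printed above), and `round_trips(c)` should be true.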

Answer 1

I think the article "Drill Into .NET Framework Internals to See How the CLR Creates Runtime Objects" will answer your questions. In particular, see the section "Interface Vtable Map and Interface Map" and the following section on Virtual Dispatch.

It's probably possible for the JIT compiler to figure things out and optimize the code for your simple case, but not in the general case. Suppose you have:

IFoo f2 = GetAFoo();

If GetAFoo is defined as returning an IFoo, then the JIT compiler wouldn't be able to optimize the call.

Answer 2

Here is what the disassembly looks like (Hans is correct):

            f.Bar(); // This is faster.
00000062  mov         rax,qword ptr [rsp+20h]
00000067  mov         rax,qword ptr [rax]
0000006a  mov         rcx,qword ptr [rsp+20h]
0000006f  call        qword ptr [rax+60h]
            f2.Bar();
00000072  mov         r11,7FF000400A0h
0000007c  mov         qword ptr [rsp+38h],r11
00000081  mov         rax,qword ptr [rsp+28h]
00000086  cmp         byte ptr [rax],0
00000089  mov         rcx,qword ptr [rsp+28h]
0000008e  mov         r11,qword ptr [rsp+38h]
00000093  mov         rax,qword ptr [rsp+38h]
00000098  call        qword ptr [rax]

Answer 3

I tried your test, and on my machine, in a particular context, the result is actually the other way around.

I am running Windows 7 x64 and I created a Visual Studio 2010 Console Application project into which I copied your code. If I compile the project in Debug mode with the platform target set to x86, the output is the following:

Direct call: 48.38
Through interface: 42.43

Actually, every run of the application gives slightly different results, but the interface calls are always faster. I assume that since the application is compiled as x86, it is run by the OS through WoW64.

For completeness, below are the results for the rest of the compilation configuration and target combinations.

Release mode and x86 target
Direct call: 23.02
Through interface: 32.73

Debug mode and x64 target
Direct call: 49.49
Through interface: 56.97

Release mode and x64 target
Direct call: 19.60
Through interface: 26.45

All of the above tests were made with .NET 4.0 as the target platform for the compiler. When switching to 3.5 and repeating the above tests, the calls through the interface always took longer than the direct calls.

So the above tests rather complicate things, since it seems that the behavior you spotted does not always happen.

In the end, at the risk of upsetting you, I would like to add a few thoughts. Many people commented that the performance differences are quite small and that in real-world programming you should not care about them, and I agree with this point of view. There are two main reasons for it.

The first and most advertised one is that .NET was built at a higher level in order to let developers focus on the higher levels of their applications. A database or external service call is thousands or sometimes millions of times slower than a virtual method call. Having a good high-level architecture and focusing on the big performance consumers will always bring better results in modern applications than avoiding double pointer dereferences.

The second and more obscure one is that the .NET team, by building the framework at a higher level, has actually introduced a series of abstraction levels which the just-in-time compiler can use for optimizations on different platforms. The more access they gave to the lower layers, the more developers would be able to optimize for a specific platform, but the less the runtime compiler would be able to do for the others. That is the theory at least, and that is why things are not as well documented as in C++ regarding this particular matter.

Answer 4

The general rule is: Classes are fast. Interfaces are slow.

That's one of the reasons for the recommendation "Build hierarchies with classes and use interfaces for intra-hierarchy behavior".

For virtual methods, the difference might be slight (like 10%). But for non-virtual methods and fields the difference is huge. Consider this program.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace InterfaceFieldConsoleApplication
{
    class Program
    {
        public abstract class A
        {
            public int Counter;
        }

        public interface IA
        {
            int Counter { get; set; }
        }

        public class B : A, IA
        {
            public new int Counter { get { return base.Counter; } set { base.Counter = value; } }
        }

        static void Main(string[] args)
        {
            var b = new B();
            A a = b;
            IA ia = b;
            const long LoopCount = (int) (100*10e6);
            var stopWatch = new Stopwatch();
            stopWatch.Start();
            for (int i = 0; i < LoopCount; i++)
                a.Counter = i;
            stopWatch.Stop();
            Console.WriteLine("a.Counter: {0}", stopWatch.ElapsedMilliseconds);
            stopWatch.Reset();
            stopWatch.Start();
            for (int i = 0; i < LoopCount; i++)
                ia.Counter = i;
            stopWatch.Stop();
            Console.WriteLine("ia.Counter: {0}", stopWatch.ElapsedMilliseconds);
            Console.ReadKey();
        }
    }
}

Output:

a.Counter: 1560
ia.Counter: 4587

Answer 5

I think the pure virtual function case can use a simple virtual function table, as any class derived from Foo that implements Bar would just change the virtual function pointer for Bar.

On the other hand, calling an interface function IFoo.Bar can't do a lookup in something like IFoo's virtual function table, because an implementation of IFoo does not necessarily implement the other functions or interfaces that Foo does. So the virtual function table entry position for Bar in another class Fubar : IFoo need not match the virtual function table entry position of Bar in class Foo : IFoo.

Thus a pure virtual function call can rely on the same index of the function pointer inside the virtual function table in every derived class, while the interface call has to look up this index first.
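To illustrate the difference, here is a simplified model in C++ (used only because the asker comes from that world). It is a sketch of the idea, not the CLR's actual data structures: `TypeInfo`, `interface_map`, `kIFooId` and the two call helpers are all hypothetical names, and a real runtime would add caching on top of the lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Method = std::string (*)();

std::string foo_other() { return "Foo.Other"; }
std::string foo_bar()   { return "Foo.Bar"; }
std::string fubar_bar() { return "Fubar.Bar"; }

// Hypothetical per-type metadata: one flat slot table plus a map that
// records where each implemented interface's methods start in it.
struct TypeInfo {
    std::vector<Method> slots;
    std::vector<std::pair<int, std::size_t>> interface_map;  // id -> first slot
};

struct Object { const TypeInfo* type; };

constexpr int kIFooId = 1;           // made-up id for IFoo
constexpr std::size_t kIFooBar = 0;  // Bar is method 0 within IFoo

// Class-typed virtual call: the slot index is the same for every class
// derived from the static type, so it is a single indexed load.
std::string call_virtual(const Object& o, std::size_t slot) {
    return o.type->slots[slot]();
}

// Interface call: first find where this concrete type placed the
// interface's methods, then index relative to that position.
std::string call_interface(const Object& o, int iface, std::size_t method) {
    for (const auto& entry : o.type->interface_map) {
        if (entry.first == iface) {
            return o.type->slots[entry.second + method]();
        }
    }
    assert(false && "type does not implement the interface");
    return std::string();
}

// Foo has another virtual method first, so Bar lands in slot 1;
// Fubar only has Bar, so it lands in slot 0.
const TypeInfo kFoo{{foo_other, foo_bar}, {{kIFooId, 1}}};
const TypeInfo kFubar{{fubar_bar}, {{kIFooId, 0}}};
```

With `Object foo{&kFoo}` and `Object fubar{&kFubar}`, `call_virtual(foo, 1)` and `call_interface(foo, kIFooId, kIFooBar)` both reach Foo's Bar, while `call_interface(fubar, kIFooId, kIFooBar)` reaches Fubar's Bar at a different slot. The extra search in `call_interface` is the "look up this index first" step.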

5 answers

Answer 1: score 27, accepted, 2011-09-27 17:14:18
Answer 2: score 21, 2011-08-29 01:54:05
Answer 3: score 12, 2011-10-03 22:39:37
Answer 4: score 4, 2013-09-29 14:18:14
Answer 5: score 1, 2011-10-04 10:40:29
