C# Compiler Optimizations

I'm wondering if someone can explain what exactly the compiler might be doing to produce such extreme differences in performance for a simple method.

public static uint CalculateCheckSum(string str) {
    char[] charArray = str.ToCharArray();
    uint checkSum = 0;
    foreach (char c in charArray) {
        checkSum += c;
    }
    return checkSum % 256;
}

I'm working with a colleague doing some benchmarking/optimizations for a message processing application. Doing 10 million iterations of this function using the same input string took about 25 seconds in Visual Studio 2012; however, when the project was built with the "Optimize Code" option turned on, the same code executed in 7 seconds for the same 10 million iterations.

I'm very interested to understand what the compiler is doing behind the scenes for us to be able to see a greater than 3x performance increase for a seemingly innocent block of code such as this.

As requested, here is a complete Console application that demonstrates what I am seeing.

using System;
using System.Diagnostics;

class Program
{
    public static uint CalculateCheckSum(string str)
    {
        char[] charArray = str.ToCharArray();
        uint checkSum = 0;
        foreach (char c in charArray)
        {
            checkSum += c;
        }
        return checkSum % 256;
    }

    static void Main(string[] args)
    {
        string stringToCount = "8=FIX.4.29=15135=D49=SFS56=TOMW34=11752=20101201-03:03:03.2321=DEMO=DG00121=155=IBM54=138=10040=160=20101201-03:03:03.23244=10.059=0100=ARCA10=246";
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < 10000000; i++)
        {
            CalculateCheckSum(stringToCount);
        }
        stopwatch.Stop();
        Console.WriteLine(stopwatch.Elapsed);
    }
}

Running in Debug with optimization off I see 13 seconds; with it on I get 2 seconds.

Running in Release with optimization off I see 3.1 seconds; with it on I get 2.3 seconds.

To look at what the C# compiler does for you, you need to look at the IL. If you want to see how that affects the JITted code, you'll need to look at the native code as described by Scott Chamberlain. Be aware that the JITted code will vary based on processor architecture, CLR version, how the process was launched, and possibly other things.

I would usually start with the IL, and then potentially look at the JITted code.

Comparing the IL using ildasm can be slightly tricky, as it includes a label for each instruction. Here are two versions of your method compiled with and without optimization (using the C# 5 compiler), with extraneous labels (and nop instructions) removed to make them as easy to compare as possible:

Optimized

  .method public hidebysig static uint32 
          CalculateCheckSum(string str) cil managed
  {
    // Code size       46 (0x2e)
    .maxstack  2
    .locals init (char[] V_0,
             uint32 V_1,
             char V_2,
             char[] V_3,
             int32 V_4)
    ldarg.0
    callvirt   instance char[] [mscorlib]System.String::ToCharArray()
    stloc.0
    ldc.i4.0
    stloc.1
    ldloc.0
    stloc.3
    ldc.i4.0
    stloc.s    V_4
    br.s       loopcheck
  loopstart:
    ldloc.3
    ldloc.s    V_4
    ldelem.u2
    stloc.2
    ldloc.1
    ldloc.2
    add
    stloc.1
    ldloc.s    V_4
    ldc.i4.1
    add
    stloc.s    V_4
  loopcheck:
    ldloc.s    V_4
    ldloc.3
    ldlen
    conv.i4
    blt.s      loopstart
    ldloc.1
    ldc.i4     0x100
    rem.un
    ret
  } // end of method Program::CalculateCheckSum

Unoptimized

  .method public hidebysig static uint32 
          CalculateCheckSum(string str) cil managed
  {
    // Code size       63 (0x3f)
    .maxstack  2
    .locals init (char[] V_0,
             uint32 V_1,
             char V_2,
             uint32 V_3,
             char[] V_4,
             int32 V_5,
             bool V_6)
    ldarg.0
    callvirt   instance char[] [mscorlib]System.String::ToCharArray()
    stloc.0
    ldc.i4.0
    stloc.1
    ldloc.0
    stloc.s    V_4
    ldc.i4.0
    stloc.s    V_5
    br.s       loopcheck

  loopstart:
    ldloc.s    V_4
    ldloc.s    V_5
    ldelem.u2
    stloc.2
    ldloc.1
    ldloc.2
    add
    stloc.1
    ldloc.s    V_5
    ldc.i4.1
    add
    stloc.s    V_5
  loopcheck:
    ldloc.s    V_5
    ldloc.s    V_4
    ldlen
    conv.i4
    clt
    stloc.s    V_6
    ldloc.s    V_6
    brtrue.s   loopstart

    ldloc.1
    ldc.i4     0x100
    rem.un
    stloc.3
    br.s       methodend

  methodend:
    ldloc.3
    ret
  }

Points to note:

  • The optimized version uses fewer locals. This may allow the JIT to use registers more effectively.
  • The optimized version uses blt.s rather than clt followed by brtrue.s when checking whether or not to go round the loop again (this is the reason for one of the extra locals).
  • The unoptimized version uses an additional local to store the return value before returning, presumably to make debugging easier.
  • The unoptimized version has an unconditional branch just before it returns.
  • The optimized version is shorter, but I doubt that it's short enough to be inlined, so I suspect that's irrelevant.
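For readers less used to IL, the loop in both listings corresponds roughly to the C# below. This is a hand-written sketch, not decompiler output, and the names LoweredLoop, WithForeach, Lowered, copy and index are made up for illustration. It shows that foreach over an array becomes an indexed loop over a second reference to the array, which is why both listings carry an extra char[] local and an int32 local.

```csharp
using System;

static class LoweredLoop
{
    // The method as written in the question.
    public static uint WithForeach(string str)
    {
        char[] charArray = str.ToCharArray();
        uint checkSum = 0;
        foreach (char c in charArray)
        {
            checkSum += c;
        }
        return checkSum % 256;
    }

    // Approximately what the IL does (hypothetical names, for illustration only).
    public static uint Lowered(string str)
    {
        char[] charArray = str.ToCharArray();
        uint checkSum = 0;
        char[] copy = charArray;      // the second char[] local in the IL
        int index = 0;                // the int32 local in the IL
        while (index < copy.Length)   // ldlen / conv.i4, then blt.s or clt + brtrue.s
        {
            char c = copy[index];     // ldelem.u2
            checkSum += c;
            index++;
        }
        return checkSum % 256;        // rem.un
    }

    static void Main()
    {
        string input = "8=FIX.4.2";
        // Both forms compute the same checksum for the same input.
        Console.WriteLine(WithForeach(input) == Lowered(input));
    }
}
```

The two builds differ only in how this loop is encoded (extra temporaries and branches), not in what it computes.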

To get a good understanding, you should look at the generated IL code.

Compile the assembly, make a copy of it, then compile again with optimizations enabled. Then open both assemblies in .NET Reflector and compare the compiled IL.

Update: .NET Reflector is available at http://www.red-gate.com/products/dotnet-development/reflector/

Update 2: ILSpy seems like a good open source alternative. http://ilspy.net/

Open Source Alternatives to Reflector?
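If you only want a quick numeric comparison before opening a decompiler, reflection can report the IL size and local count of a method. The sketch below is one possible way to do that; the class name IlStats and the helper Describe are hypothetical, and the numbers it prints depend on the compiler version and build settings, so treat them as relative rather than absolute.

```csharp
using System;
using System.Reflection;

static class IlStats
{
    // Copy of the method from the question, so the sample is self-contained.
    public static uint CalculateCheckSum(string str)
    {
        char[] charArray = str.ToCharArray();
        uint checkSum = 0;
        foreach (char c in charArray)
        {
            checkSum += c;
        }
        return checkSum % 256;
    }

    // Summarizes the compiled IL of a method: byte size, local count, max stack.
    public static string Describe(MethodInfo method)
    {
        MethodBody body = method.GetMethodBody();
        return string.Format("{0}: {1} bytes of IL, {2} locals, maxstack {3}",
            method.Name,
            body.GetILAsByteArray().Length,
            body.LocalVariables.Count,
            body.MaxStackSize);
    }

    static void Main()
    {
        MethodInfo method = typeof(IlStats).GetMethod("CalculateCheckSum");
        Console.WriteLine(Describe(method));
    }
}
```

Building this once with optimization on and once with it off, then running both, gives the same kind of "code size" and locals comparison that ildasm shows in its method header.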

I don't know what optimizations it is doing, but I can show you how you can find out for yourself.

First build your code optimized and start it without the debugger attached (the JIT compiler will generate different code if the debugger is attached). Run your code so that you know the section was entered at least once, so the JIT compiler had a chance to process it, and in Visual Studio go to Debug->Attach To Process... . From the new menu choose your running application.

Put a breakpoint in the spot you are wondering about and let the program stop; once stopped, go to Debug->Windows->Disassembly. That will show you the compiled code the JIT created and you will be able to inspect what it is doing.
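Before timing anything or reading the disassembly, it can help to confirm that the process really is in the state described above. The following is a minimal sketch, assuming a console program; the class name JitEnvironment and the helper IsJitOptimizerDisabled are hypothetical. It reports whether a debugger is attached and whether the assembly was compiled with the JIT optimizer disabled, which is what a build without "Optimize Code" normally records in its DebuggableAttribute.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

static class JitEnvironment
{
    // True when the assembly carries a DebuggableAttribute that asks the
    // runtime to disable JIT optimizations (typical for unoptimized builds).
    public static bool IsJitOptimizerDisabled(Assembly assembly)
    {
        DebuggableAttribute attribute = assembly.GetCustomAttribute<DebuggableAttribute>();
        return attribute != null && attribute.IsJITOptimizerDisabled;
    }

    static void Main()
    {
        Console.WriteLine("Debugger attached: " + Debugger.IsAttached);
        Console.WriteLine("JIT optimizer disabled: "
            + IsJitOptimizerDisabled(typeof(JitEnvironment).Assembly));
    }
}
```

If either line prints True, the native code you see in the Disassembly window is probably not the fully optimized code.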
