
Can the C# compiler or JIT optimize away a method call in a lambda expression?

I'm asking this question after a discussion that started (in the comments) on another Stack Overflow question, and I'm curious to know the answer. Consider the following expression:

var objects = RequestObjects.Where(r => r.RequestDate > ListOfDates.Max());

In this case, is there any (performance) advantage to moving the evaluation of ListOfDates.Max() out of the Where clause, or will (1) the compiler or (2) the JIT optimize it away?

I believe C# only does constant folding at compile time, and it could be argued that ListOfDates.Max() cannot be known at compile time unless ListOfDates itself is somehow constant.
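A small sketch of the distinction drawn here (all names are hypothetical): the C# compiler folds a constant expression into a literal, but a method call is never a constant expression, even when all of its arguments are constants.

```csharp
using System;

public static class FoldingDemo
{
    // A constant expression: the C# compiler folds this to the literal 86400
    // and embeds that value wherever SecondsPerDay is used.
    public const int SecondsPerDay = 24 * 60 * 60;

    // A method call is never a constant expression, so this cannot be declared
    // 'const' (that would be compile error CS0133). It is evaluated at run time,
    // once, when the type is initialized.
    public static readonly int Larger = Math.Max(3, 4);
}
```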

Perhaps there is another compiler (or JIT) optimization that ensures this is evaluated only once?
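One way to check how often Max() runs is to route it through a counting wrapper. This is a sketch: Request, InlineMaxDemo and CountingMax are hypothetical stand-ins for the question's types. Because the counter is a side effect, no optimizer may drop these calls, so this demonstrates the language semantics (the lambda body, including the Max() call, runs once per element) rather than measuring what the JIT does with the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type standing in for the items of RequestObjects.
public sealed class Request
{
    public Request(DateTime requestDate) { RequestDate = requestDate; }
    public DateTime RequestDate { get; }
}

public static class InlineMaxDemo
{
    // Number of times the maximum has been computed.
    public static int MaxCalls;

    // Wraps Enumerable.Max so each evaluation can be counted.
    static DateTime CountingMax(IEnumerable<DateTime> dates)
    {
        MaxCalls++;
        return dates.Max();
    }

    // Same shape as the question: the maximum is computed inside the lambda.
    public static List<Request> Filter(List<Request> requestObjects, List<DateTime> listOfDates)
    {
        MaxCalls = 0;
        // ToList() forces the lazy Where to run the predicate for every element.
        return requestObjects
            .Where(r => r.RequestDate > CountingMax(listOfDates))
            .ToList();
    }
}
```

With three requests, MaxCalls ends up at 3: one full scan of listOfDates per element.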

Well, it's a bit of a complex answer.

There are two things involved here: (1) the compiler and (2) the JIT.

The compiler

Simply put, the compiler just translates your C# code to IL code. For most cases it's a fairly straightforward translation, and one of the core ideas of .NET is that each function is compiled as an autonomous block of IL code.
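To see why the C# compiler can't help here, it helps to look at roughly what it does with the lambda. Assuming ListOfDates is a local variable, the compiler moves it into a generated closure class and turns the lambda body into an ordinary method on that class. The hand-written sketch below approximates that; Closure, Predicate and LoweringDemo are hypothetical names (the real generated names are compiler-internal), and if ListOfDates were a field, the lambda would capture `this` instead. Either way, Max() simply stays inside the method body.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type standing in for the items of RequestObjects.
public sealed class Request
{
    public Request(DateTime requestDate) { RequestDate = requestDate; }
    public DateTime RequestDate { get; }
}

// Approximation of the compiler-generated closure for a lambda that captures a local.
sealed class Closure
{
    // The captured local becomes a field.
    public List<DateTime> ListOfDates = new List<DateTime>();

    // The lambda body becomes an instance method; Max() is still evaluated on every call.
    public bool Predicate(Request r) => r.RequestDate > ListOfDates.Max();
}

public static class LoweringDemo
{
    // Roughly what 'requests.Where(r => r.RequestDate > listOfDates.Max())' compiles to.
    public static List<Request> Filter(List<Request> requests, List<DateTime> listOfDates)
    {
        var closure = new Closure { ListOfDates = listOfDates };
        return requests.Where(new Func<Request, bool>(closure.Predicate)).ToList();
    }
}
```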

So, don't expect too much from the C# -> IL compiler.

The JIT

That's... a bit more complicated.

The JIT compiler basically translates your IL code to machine code (assembly). The JIT compiler also contains an SSA-based optimizer. However, there's a time limit, because we don't want to wait too long before our code starts to run. Basically, this means the JIT compiler doesn't do all the super cool stuff that would make your code go extremely fast, simply because that would cost too much time.
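On current .NET (Core 3.0 and later), this trade-off shows up as tiered compilation: a method is typically compiled quickly with few optimizations first and may be recompiled with full optimizations once it is called often. As a sketch, MethodImplOptions.AggressiveOptimization opts a single method out of that quick first tier. JitHints and SumUpTo are hypothetical names, and the attribute does not change anything about the delegate question discussed here.

```csharp
using System.Runtime.CompilerServices;

public static class JitHints
{
    // Hypothetical example. AggressiveOptimization (.NET Core 3.0+) asks the JIT to
    // compile this method fully optimized on first use instead of starting with a
    // quick, lightly optimized version.
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public static long SumUpTo(int count)
    {
        long total = 0;
        for (int i = 0; i < count; i++)
        {
            total += i;
        }
        return total;
    }
}
```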

But hey, what's the fun in only having theory? Let's put it to the test. :) First, make sure Visual Studio lets the JIT optimize while you run under the debugger (Tools > Options > Debugging: uncheck "Suppress JIT optimization on module load" and "Enable Just My Code"), compile in x64 Release mode, set a breakpoint, and see what happens when you switch to the Disassembly view.

using System;

static class Program
{
    static bool Foo(Func<int, int, int> foo, int a, int b)
    {
        return foo(a, b) > 0;  // put breakpoint on this line.
    }

    public static void Test()
    {
        int n = 2;
        int m = 2;
        if (Foo((a, b) => a + b, n, m))
        {
            Console.WriteLine("yeah");
        }
    }
}

The first thing you should notice is that the breakpoint is hit. This already tells you that the method isn't inlined; if it were, you wouldn't hit the breakpoint at all.

Next, if you look at the assembler output, you'll notice a 'call' instruction using an address. That's your function. On closer inspection, you'll notice that it's calling the delegate.

Basically, this means the call is not inlined, and therefore is not optimized to match the local (method) context. In other words, not using delegates and putting the code directly in your method is probably faster than using delegates.
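The two styles being compared look like this (hypothetical names): the first passes the logic in as a delegate, the second writes the same logic directly in the method body.

```csharp
using System;

public static class DelegateVsDirect
{
    // Goes through a delegate: in the experiment above, the JIT emitted an indirect
    // call here and did not inline the lambda's body into this method.
    public static bool ViaDelegate(Func<int, int, int> op, int a, int b)
    {
        return op(a, b) > 0;
    }

    // The same logic written directly: plain arithmetic that the JIT can
    // optimize together with the surrounding code.
    public static bool Direct(int a, int b)
    {
        return a + b > 0;
    }
}
```

For example, `DelegateVsDirect.ViaDelegate((a, b) => a + b, 2, 2)` and `DelegateVsDirect.Direct(2, 2)` both return true; only the first pays for the delegate invocation.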

On the other hand, the call is pretty efficient. Basically, the function pointer is simply passed and called. There's no vtable lookup, just a simple call. This means it probably beats calling a member (e.g. IL callvirt). Still, static calls (IL call) should be even faster, since they are predictable at compile time. Again, let's test, shall we?
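A delegate is essentially an object that holds a method pointer plus an optional target object, and invoking it calls through that pointer. The public Delegate.Target and Delegate.Method properties make this visible. A small sketch with hypothetical names: a delegate bound to a static method has a null Target, while a lambda that captures a local is bound to a compiler-generated closure object.

```csharp
using System;

public static class DelegateAnatomy
{
    static int Sum(int a, int b) => a + b;

    // A delegate bound to a static method carries no target object.
    public static Func<int, int, int> StaticDelegate() => Sum;

    // A lambda that captures 'offset' is bound to a compiler-generated
    // closure object, which becomes the delegate's Target.
    public static Func<int, int> CapturingDelegate(int offset) => x => x + offset;
}
```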

using System;
using System.Diagnostics;

interface ISummer
{
    int Sum(int a, int b);
}

class Summer : ISummer
{
    public int Sum(int a, int b) { return a + b; }
}

static class Benchmark
{
    static int Sum(int a, int b) { return a + b; }

    public static void Test()
    {
        ISummer summer = new Summer();
        Stopwatch sw = Stopwatch.StartNew();
        int n = 0;
        for (int i = 0; i < 1000000000; ++i)
        {
            n = summer.Sum(n, i);   // interface (vtable) call
        }
        Console.WriteLine("Vtable call took {0} ms, result = {1}", sw.ElapsedMilliseconds, n);

        Summer summer2 = new Summer();
        sw = Stopwatch.StartNew();
        n = 0;
        for (int i = 0; i < 1000000000; ++i)
        {
            n = summer2.Sum(n, i);  // call through the concrete class
        }
        Console.WriteLine("Non-vtable call took {0} ms, result = {1}", sw.ElapsedMilliseconds, n);

        Func<int, int, int> sumdel = (a, b) => a + b;
        sw = Stopwatch.StartNew();
        n = 0;
        for (int i = 0; i < 1000000000; ++i)
        {
            n = sumdel(n, i);       // delegate call
        }
        Console.WriteLine("Delegate call took {0} ms, result = {1}", sw.ElapsedMilliseconds, n);

        sw = Stopwatch.StartNew();
        n = 0;
        for (int i = 0; i < 1000000000; ++i)
        {
            n = Sum(n, i);          // static call (IL call)
        }
        Console.WriteLine("Static call took {0} ms, result = {1}", sw.ElapsedMilliseconds, n);
    }
}

Results:

Vtable call took 2714 ms, result = -1243309312
Non-vtable call took 2558 ms, result = -1243309312
Delegate call took 1904 ms, result = -1243309312
Static call took 324 ms, result = -1243309312

What's interesting here is actually the last test result. Remember that static calls (IL call) are completely deterministic. That means they're relatively easy for the compiler to optimize. If you inspect the assembler output, you'll find that the call to Sum is actually inlined. This makes sense. In fact, if you test it, putting the code directly in the method is just as fast as the static call.
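If you want to confirm that inlining is what makes the static case fast, one option is to run the same loop against an identical copy of Sum marked [MethodImpl(MethodImplOptions.NoInlining)] and compare the two with Stopwatch as above. A sketch with hypothetical names; no timings are claimed here.

```csharp
using System.Runtime.CompilerServices;

public static class InliningExperiment
{
    // Small static method: the JIT is free to inline it into callers.
    static int Sum(int a, int b) => a + b;

    // Identical logic, but inlining is forbidden, so every call stays a real call.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static int SumNoInline(int a, int b) => a + b;

    // Same loop shape as the benchmark above (int overflow wraps, as in the original).
    public static int LoopInlinable(int iterations)
    {
        int n = 0;
        for (int i = 0; i < iterations; ++i)
        {
            n = Sum(n, i);
        }
        return n;
    }

    // Identical loop calling the non-inlinable copy.
    public static int LoopNoInline(int iterations)
    {
        int n = 0;
        for (int i = 0; i < iterations; ++i)
        {
            n = SumNoInline(n, i);
        }
        return n;
    }
}
```

Both loops compute the same value, so any difference in their timings comes from the call itself.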

A small remark about Equals

If you measure the performance of hash tables, something seems fishy about my explanation. It appears as if IEquatable<T> makes things go faster.

Well, that's actually true. :-) Hash containers use IEquatable<T> to call Equals. Now, as we all know, all objects implement Equals(object o). So the containers can call either Equals(object) or Equals(T). The performance of the call itself is the same.

However, if you also implement IEquatable<T>, the Equals(object) override usually looks like this:

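public override bool Equals(object o)
{
    // Reference-type version: 'as' yields null when o is not a MyType.
    var obj = o as MyType;
    return obj != null && this.Equals(obj);  // forwards to Equals(MyType)
}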

Furthermore, if MyType is a struct, the runtime also needs to apply boxing and unboxing. If the container just calls IEquatable<T>.Equals, none of these steps are necessary. So, even though Equals(object) appears slower, this has nothing to do with the call itself.
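A sketch of this pattern for a struct key (Point and its counters are hypothetical). The counters make it visible which overload a hash container actually uses: a Dictionary<Point, ...> lookup goes through EqualityComparer<Point>.Default, which calls Equals(Point) when the type implements IEquatable<Point>, so Equals(object) and its boxing are never involved.

```csharp
using System;

// Hypothetical value type used as a dictionary key.
public readonly struct Point : IEquatable<Point>
{
    // Counters showing which Equals overload was used.
    public static int TypedEqualsCalls;
    public static int ObjectEqualsCalls;

    public Point(int x, int y) { X = x; Y = y; }
    public int X { get; }
    public int Y { get; }

    // Strongly typed overload: no type test and no boxing.
    public bool Equals(Point other)
    {
        TypedEqualsCalls++;
        return X == other.X && Y == other.Y;
    }

    // object overload: a Point argument arrives boxed and must be type-tested
    // and unboxed before forwarding to Equals(Point).
    public override bool Equals(object obj)
    {
        ObjectEqualsCalls++;
        return obj is Point other && Equals(other);
    }

    public override int GetHashCode() => HashCode.Combine(X, Y);
}
```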

Your questions

In this case, is there any (performance) advantage to moving the evaluation of ListOfDates.Max() out of the Where clause, or will (1) the compiler or (2) the JIT optimize it away?

Yes, there will be an advantage. The compiler / JIT won't optimize it away.
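The manual fix is simply to hoist the call out of the lambda. A minimal sketch, with a hypothetical Request type standing in for the question's elements:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type standing in for the items of RequestObjects.
public sealed class Request
{
    public Request(DateTime requestDate) { RequestDate = requestDate; }
    public DateTime RequestDate { get; }
}

public static class HoistedMaxDemo
{
    public static List<Request> Filter(List<Request> requestObjects, List<DateTime> listOfDates)
    {
        // Evaluate the maximum once, up front...
        DateTime maxDate = listOfDates.Max();
        // ...and let the lambda capture the result instead of recomputing it per element.
        return requestObjects.Where(r => r.RequestDate > maxDate).ToList();
    }
}
```

One behavioral difference to keep in mind: Enumerable.Max throws InvalidOperationException on an empty sequence, so the hoisted version throws when listOfDates is empty even if requestObjects is empty too, whereas the inline version never calls Max() in that case.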

I believe C# only does constant folding at compile time, and it could be argued that ListOfDates.Max() cannot be known at compile time unless ListOfDates itself is somehow constant.

Actually, if you change the static call to n = 2 + Sum(n, 2), you'll notice that the assembler output contains a 4. This proves that the JIT optimizer does do constant folding. (That's actually quite obvious if you know how SSA optimizers work: constant folding and simplification passes are run several times.)
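Written out, the experiment looks like this (FoldAfterInline and Compute are hypothetical names). After Sum is inlined, 2 + Sum(n, 2) becomes 2 + (n + 2), which the optimizer can simplify to n + 4; that combined constant is the 4 to look for in the disassembly.

```csharp
public static class FoldAfterInline
{
    static int Sum(int a, int b) => a + b;

    // Once Sum is inlined this is 2 + (n + 2); the two constants can be
    // folded into a single 4, leaving just n + 4.
    public static int Compute(int n) => 2 + Sum(n, 2);
}
```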

The function pointer itself isn't optimized, though it might be in the future.

Perhaps there is another compiler (or JIT) optimization that ensures this is evaluated only once?

As for 'another compiler': if you're willing to add 'another language', you can use C++. In C++, these kinds of calls are sometimes optimized away.

More interestingly, Clang is based on LLVM, and there are a few C# compilers for LLVM as well. I believe Mono has an option to use LLVM as its code generator, and CoreCLR was working on LLILC. While I haven't tested this, LLVM can definitely do these kinds of optimizations.

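Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.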

Related questions
Can the C# compiler optimize this cast away?
Is there a way to get the .Net JIT or C# compiler to optimize away empty for-loops?
Noop method to avoid c# compiler optimize away a temp variable
Will the C# compiler optimize variable away?
Does C# JIT compiler optimize null-check?
At what level C# compiler or JIT optimize the application code?
Does the C# compiler treat a lambda expression as a public or private method?
Can the compiler/JIT optimize away short-circuit evaluation if there are no side-effects?
Can a VS2010 C++/C# compiler optimize away variables declared inside of the loop?
Can a conforming C# compiler optimize away a local (but unused) variable if it is the only strong reference to an object?

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM