LINQ query is slow

While profiling my application, I found that a function that checks for a pattern match is very slow. It is written using LINQ, and simply replacing the LINQ expression with a loop makes a huge difference. Why is that? Is LINQ really that slow, or am I misunderstanding something?

Version 1, with a plain loop:

private static bool PatternMatch1(byte[] buffer, int position, string pattern)
{
    int i = 0;

    foreach (char c in pattern)
    {
        if (buffer[position + i++] != c)
        {
            return false;
        }
    }

    return true;
}    

Version 2, with LINQ (suggested by ReSharper):

private static bool PatternMatch2(byte[] buffer, int position, string pattern)
{
    int i = 0;
    return pattern.All(c => buffer[position + i++] == c);
}

Version 3, with LINQ:

private static bool PatternMatch3(byte[] buffer, int position, string pattern)
{
    return !pattern.Where((t, i) => buffer[position + i] != t).Any();
}

Version 4, using a lambda:

private static bool PatternMatch4(byte[] buffer, int position, string pattern, Func<char, byte, bool> predicate)
{
    int i = 0;

    foreach (char c in pattern)
    {
        if (predicate(c, buffer[position + i++]))
        {
            return false;
        }
    }

    return true;
}

And here is the usage with a large buffer:

const int SIZE = 1024 * 1024 * 50;

byte[] buffer = new byte[SIZE];

for (int i = 0; i < SIZE - 3; ++i)
{
    if (PatternMatch1(buffer, i, "xxx"))
    {
        Console.WriteLine(i);
    }
}

Calling PatternMatch2 or PatternMatch3 is phenomenally slow. It takes about 8 seconds for PatternMatch3 and about 4 seconds for PatternMatch2, while calling PatternMatch1 takes about 0.6 seconds. As far as I understand, it is the same code, and I see no difference. Any ideas?
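For anyone who wants to reproduce the comparison, a harness roughly like the one below could be used. It is only a sketch: MatchBenchmark, LoopMatch, LinqMatch and Time are made-up names for illustration, and the absolute timings will depend on the machine, the runtime and the build configuration (a Release build run without a debugger attached is assumed). Note that Time passes the matcher as a delegate, which adds the same per-position call overhead to both versions.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class MatchBenchmark
{
    // Explicit loop, same logic as PatternMatch1.
    public static bool LoopMatch(byte[] buffer, int position, string pattern)
    {
        int i = 0;
        foreach (char c in pattern)
        {
            if (buffer[position + i++] != c)
            {
                return false;
            }
        }
        return true;
    }

    // LINQ version, same logic as PatternMatch2.
    public static bool LinqMatch(byte[] buffer, int position, string pattern)
    {
        int i = 0;
        return pattern.All(c => buffer[position + i++] == c);
    }

    // Hypothetical helper: scans every valid start position with the given
    // matcher and reports the match count together with the elapsed time.
    public static (int Matches, TimeSpan Elapsed) Time(
        Func<byte[], int, string, bool> matcher, byte[] buffer, string pattern)
    {
        int matches = 0;
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i <= buffer.Length - pattern.Length; ++i)
        {
            if (matcher(buffer, i, pattern))
            {
                ++matches;
            }
        }
        watch.Stop();
        return (matches, watch.Elapsed);
    }
}
```

Calling MatchBenchmark.Time(MatchBenchmark.LoopMatch, buffer, "xxx") and then the same with MatchBenchmark.LinqMatch gives two elapsed times to compare. Both calls should report the same number of matches.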

Mark Byers and Marco Mp are right about the overhead of the extra calls. However, there is another reason here: many object allocations, caused by a closure.

The compiler creates a new object on each call to PatternMatch2 (that is, on each iteration of the outer loop), storing the current values of buffer, position and i so that the predicate can use them. Here is a dotTrace snapshot of PatternMatch2 showing this. <>c_DisplayClass1 is the compiler-generated class.

(Note that the numbers are huge because I'm using tracing profiling, which adds overhead. However, it's the same overhead per call, so the overall percentages are correct.)

(Image: dotTrace snapshot of PatternMatch2)
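Roughly speaking, PatternMatch2 is rewritten by the compiler along the following lines. This is a hand-written approximation, not actual compiler output: the real generated names (such as <>c_DisplayClass1) are not valid C# identifiers, so ClosureSketch, DisplayClass, Predicate and PatternMatch2Lowered are stand-ins.

```csharp
using System;
using System.Linq;

public static class ClosureSketch
{
    // Stand-in for the compiler-generated closure class. The captured
    // parameters and the local i become fields of a heap object.
    private sealed class DisplayClass
    {
        public byte[] buffer;
        public int position;
        public int i;

        // The lambda body becomes an instance method on the closure object.
        public bool Predicate(char c)
        {
            return buffer[position + i++] == c;
        }
    }

    // Approximately what PatternMatch2 turns into: every call allocates a
    // closure object and a delegate before All starts enumerating.
    public static bool PatternMatch2Lowered(byte[] buffer, int position, string pattern)
    {
        DisplayClass closure = new DisplayClass();
        closure.buffer = buffer;
        closure.position = position;
        closure.i = 0;

        Func<char, bool> predicate = new Func<char, bool>(closure.Predicate);
        return pattern.All(predicate);
    }
}
```

With the 50 MB buffer from the question, that means tens of millions of short-lived closure objects and delegates, which the explicit loop never creates.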

The difference is that the LINQ version has extra function calls. Internally, LINQ calls your lambda function in a loop.

While the extra call might be optimized away by the JIT compiler, that is not guaranteed, and it can add a small overhead. In most cases the extra overhead won't matter, but because your function is very simple and is called an extremely large number of times, even a small overhead quickly adds up. In my experience, LINQ code is generally a little slower than the equivalent for loop. That's the performance price you often pay for a higher-level syntax.

In this situation, I would recommend sticking with the explicit loop.

While you're optimizing this section of code, you might also want to consider a more efficient search algorithm. Your algorithm is O(n*m) in the worst case, but better algorithms exist.
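As one example of such an algorithm, here is a sketch of a Knuth-Morris-Pratt search, which runs in O(n + m) in the worst case. This is my own illustration rather than something from the question: KmpSearch, BuildFailureTable and FindAll are hypothetical names, and the sketch assumes the pattern consists of single-byte characters, as in the original code.

```csharp
using System;
using System.Collections.Generic;

public static class KmpSearch
{
    // Builds the KMP failure table: failure[k] is the length of the longest
    // proper prefix of pattern[0..k] that is also a suffix of it.
    private static int[] BuildFailureTable(string pattern)
    {
        int[] failure = new int[pattern.Length];
        int matched = 0;
        for (int k = 1; k < pattern.Length; ++k)
        {
            while (matched > 0 && pattern[k] != pattern[matched])
            {
                matched = failure[matched - 1];
            }
            if (pattern[k] == pattern[matched])
            {
                ++matched;
            }
            failure[k] = matched;
        }
        return failure;
    }

    // Returns the start index of every occurrence, including overlapping
    // ones, visiting each byte of the buffer exactly once.
    public static List<int> FindAll(byte[] buffer, string pattern)
    {
        List<int> positions = new List<int>();
        if (pattern.Length == 0)
        {
            return positions;
        }

        int[] failure = BuildFailureTable(pattern);
        int matched = 0;
        for (int i = 0; i < buffer.Length; ++i)
        {
            // On a mismatch, fall back to the longest prefix that still fits.
            while (matched > 0 && buffer[i] != pattern[matched])
            {
                matched = failure[matched - 1];
            }
            if (buffer[i] == pattern[matched])
            {
                ++matched;
            }
            if (matched == pattern.Length)
            {
                positions.Add(i - pattern.Length + 1);
                matched = failure[matched - 1];
            }
        }
        return positions;
    }
}
```

For a three-character pattern the gain over the naive loop is likely small. The difference matters more for long patterns with repeated prefixes.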

Well, let's take the Where operator.

Its implementation is almost (*) like:

public static IEnumerable<T> Where<T>(this IEnumerable<T> input, Func<T, bool> fn)
{
    foreach(T i in input) 
        if (fn(i))
            yield return i;
}

This means that each time you loop over the IEnumerable, an iterator object is created. Note that you have SIZE - n of these allocations, because you run that many LINQ queries.

Then, for each character in the pattern, you have:

  1. A function call to the enumerator
  2. A function call to the predicate

The second is a call to a delegate, which costs roughly double a typical virtual call (whereas in the first version you have no extra calls apart from the array indexing).

If you really want raw performance, you probably want code that is as "old-style" as possible. In this case I would even replace the foreach in method 1 with a plain for (at the very least, even if it does not help performance, it makes the code more readable, since you are keeping track of the index anyway).
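A plain for version of method 1 might look like the sketch below. PlainLoop and PatternMatchFor are made-up names. Like the original, it does no bounds checking, so position + pattern.Length must not exceed buffer.Length.

```csharp
using System;

public static class PlainLoop
{
    // Same comparison as PatternMatch1, with the index as the loop variable
    // instead of a separately incremented counter.
    public static bool PatternMatchFor(byte[] buffer, int position, string pattern)
    {
        for (int i = 0; i < pattern.Length; ++i)
        {
            if (buffer[position + i] != pattern[i])
            {
                return false;
            }
        }
        return true;
    }
}
```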

It's also more readable in this case, and it shows that ReSharper suggestions are sometimes debatable.

(*) I said "almost" because it uses a proxy method to check that the input enumerable is not null and to throw the exception earlier than the time the collection is enumerated. It's a small detail that does not invalidate what I wrote before; I'm mentioning it here for completeness.
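The proxy-method pattern described in the footnote can be sketched as follows. This is an illustration of the pattern, not the actual framework source: WhereSketch, MyWhere and WhereIterator are made-up names (MyWhere is used to avoid clashing with Enumerable.Where), and the real implementation may contain further details not shown here.

```csharp
using System;
using System.Collections.Generic;

public static class WhereSketch
{
    // Non-iterator wrapper: its body runs immediately, so a null argument
    // throws at the call site rather than when enumeration begins.
    public static IEnumerable<T> MyWhere<T>(this IEnumerable<T> input, Func<T, bool> fn)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        if (fn == null) throw new ArgumentNullException(nameof(fn));
        return WhereIterator(input, fn);
    }

    // Iterator method: the compiler turns this into a state-machine class,
    // and each call allocates one instance of it.
    private static IEnumerable<T> WhereIterator<T>(IEnumerable<T> input, Func<T, bool> fn)
    {
        foreach (T item in input)
        {
            if (fn(item))
            {
                yield return item;
            }
        }
    }
}
```

If the null checks lived inside the iterator method itself, they would not run until the first MoveNext call, because the body of an iterator method is deferred.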

LINQ's main goal when applied to collections is simplicity. If you want performance, you should avoid LINQ entirely. Also, to enumerate an array you just increment an index variable, while LINQ needs to set up an entire enumerator object.
