
LINQ Lambda vs Query Syntax Performance

Today I saw a LINQ query written in query syntax in my project. It counts the items in a List that match a specific condition, like this:

int temp = (from A in pTasks 
            where A.StatusID == (int)BusinessRule.TaskStatus.Pending     
            select A).ToList().Count();

I thought of refactoring it to use Count(Func) to make it more readable. I assumed it would also perform better, so I wrote:

int UnassignedCount = pTasks.Count(x => x.StatusID == (int)BusinessRule.TaskStatus.Pending);

But when I measure with Stopwatch, the lambda expression always takes longer than the query syntax:

Stopwatch s = new Stopwatch();
s.Start();
int UnassignedCount = pTasks.Count(x => x.StatusID == (int)BusinessRule.TaskStatus.Pending);
s.Stop();
Stopwatch s2 = new Stopwatch();
s2.Start();
int temp = (from A in pTasks 
            where A.StatusID == (int)BusinessRule.TaskStatus.Pending
            select A).ToList().Count();
s2.Stop();

Can somebody explain why this is so?

I have simulated your situation, and yes, there is a difference between the execution times of these queries. But the reason for the difference is not the syntax of the query. It doesn't matter whether you use method syntax or query syntax. Both yield the same result, because query expressions are translated into their lambda (method-call) equivalents before they are compiled.

But if you look closely, the two queries are not the same at all. Your second query is translated to its lambda syntax before it is compiled (you can remove ToList() from the query, because it is redundant):

pTasks.Where(x => x.StatusID == (int)BusinessRule.TaskStatus.Pending).Count();

Now we have two LINQ queries in lambda syntax: the one I stated above, and this one:

pTasks.Count(x => x.StatusID == (int)BusinessRule.TaskStatus.Pending);
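
To check that the three forms agree on the result, here is a minimal sketch. It uses a plain List<int> of status IDs rather than the Task objects from the question; the class name CountForms and its method names are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CountForms
{
    // Query syntax; the compiler rewrites this into a Where(...) call.
    public static int QuerySyntax(List<int> statusIds, int pending) =>
        (from id in statusIds where id == pending select id).Count();

    // The method-syntax form that the query above is translated to.
    public static int WhereThenCount(List<int> statusIds, int pending) =>
        statusIds.Where(id => id == pending).Count();

    // The single-call form from the refactoring in the question.
    public static int CountWithPredicate(List<int> statusIds, int pending) =>
        statusIds.Count(id => id == pending);

    public static void Main()
    {
        var statusIds = new List<int> { 0, 1, 2, 0, 1, 0 };

        Console.WriteLine(QuerySyntax(statusIds, 0));        // prints 3
        Console.WriteLine(WhereThenCount(statusIds, 0));     // prints 3
        Console.WriteLine(CountWithPredicate(statusIds, 0)); // prints 3
    }
}
```

All three return the same count; they differ only in how the sequence is traversed internally.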

Now, the question is:
Why is there a difference between the execution times of these two queries?

Let's find the answer:
We can understand the reason for this difference by reviewing these:
- .Where(this IEnumerable<TSource> source, Func<TSource, bool> predicate).Count(this IEnumerable<TSource> source)
and
- Count(this IEnumerable<TSource> source, Func<TSource, bool> predicate)

Here is the implementation of Count(this IEnumerable<TSource> source, Func<TSource, bool> predicate):

public static int Count<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
    if (source == null) throw Error.ArgumentNull("source");
    if (predicate == null) throw Error.ArgumentNull("predicate");
    int count = 0;
    foreach (TSource element in source) {
        checked {
            if (predicate(element)) count++;
        }
    }
    return count;
}

And here is Where(this IEnumerable<TSource> source, Func<TSource, bool> predicate):

public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
    if (source == null) 
        throw Error.ArgumentNull("source");
    if (predicate == null) 
        throw Error.ArgumentNull("predicate");
    if (source is Iterator<TSource>) 
        return ((Iterator<TSource>)source).Where(predicate);
    if (source is TSource[]) 
        return new WhereArrayIterator<TSource>((TSource[])source, predicate);
    if (source is List<TSource>) 
        return new WhereListIterator<TSource>((List<TSource>)source, predicate);
    return new WhereEnumerableIterator<TSource>(source, predicate);
}

Let's pay attention to the Where() implementation. If your collection is a List, it returns a WhereListIterator, whereas Count(predicate) just iterates over the source. In my opinion, they have made some speed-ups in the implementation of WhereListIterator. After that, we call the Count() overload that takes no predicate and only iterates over the already filtered collection.


Regarding that speed-up in the implementation of WhereListIterator:

I found this question on SO: LINQ performance Count vs Where and Count. You can read Matthew Watson's answer there, which explains the performance difference between these two queries. The upshot is that the Where iterator avoids an indirect virtual-table call and instead calls the iterator methods directly. As that answer shows, a call instruction is emitted instead of callvirt, and callvirt is slower than call:

From the book CLR via C#:

When the callvirt IL instruction is used to call a virtual instance method, the CLR discovers the actual type of the object being used to make the call and then calls the method polymorphically. In order to determine the type, the variable being used to make the call must not be null. In other words, when compiling this call, the JIT compiler generates code that verifies that the variable's value is not null. If it is null, the callvirt instruction causes the CLR to throw a NullReferenceException. This additional check means that the callvirt IL instruction executes slightly more slowly than the call instruction.

As Farhad said, the implementations of Where(x).Count() and Count(x) differ. The first one instantiates an additional iterator, which on my PC costs about 30,000 ticks (regardless of the collection size).

Also, ToList is not free. It allocates memory, and that costs time. On my PC, it roughly doubles the execution time (so the cost depends linearly on the collection size).
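
A rough way to see that allocation is sketched below. It assumes a runtime that provides GC.GetAllocatedBytesForCurrentThread (for example .NET Core 3.0 or later); the ToListCost class, the AllocatedBytes helper, and the test data are made up for this illustration and are not part of the measurements above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ToListCost
{
    // Returns the bytes allocated on the current thread while running `action`.
    public static long AllocatedBytes(Action action)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    public static void Main()
    {
        // Hypothetical data: one million status IDs cycling through 0, 1, 2.
        List<int> statusIds = Enumerable.Range(0, 1_000_000).Select(i => i % 3).ToList();

        // ToList() copies every matching item into a new list before counting.
        long viaToList = AllocatedBytes(() => statusIds.Where(id => id == 0).ToList().Count());

        // Count(predicate) only keeps a running total.
        long viaCount = AllocatedBytes(() => statusIds.Count(id => id == 0));

        Console.WriteLine($"Where + ToList + Count: {viaToList} bytes allocated");
        Console.WriteLine($"Count(predicate):       {viaCount} bytes allocated");
    }
}
```

The exact numbers depend on the runtime, but the ToList variant should allocate far more, because it builds an intermediate list that is thrown away right after counting.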

Also, debugging requires spin-up time, so it is difficult to measure performance accurately in a single run. I'd recommend a loop like the one in this example. Then ignore the first set of results.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
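            // Note: GetTasks() is a lazy iterator, so every query below re-creates all 10,000,000 Task objects.
            var pTasks = Task.GetTasks();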
            for (int i = 0; i < 5; i++)
            {

                var s1 = Stopwatch.StartNew();
                var count1 = pTasks.Count(x => x.StatusID == (int) BusinessRule.TaskStatus.Pending);
                s1.Stop();
                Console.WriteLine(s1.ElapsedTicks);

                var s2 = Stopwatch.StartNew();
                var count2 =
                    (
                        from A in pTasks
                        where A.StatusID == (int) BusinessRule.TaskStatus.Pending
                        select A
                        ).ToList().Count();
                s2.Stop();
                Console.WriteLine(s2.ElapsedTicks);

                var s3 = Stopwatch.StartNew();
                var count3 = pTasks.Where(x => x.StatusID == (int) BusinessRule.TaskStatus.Pending).Count();
                s3.Stop();
                Console.WriteLine(s3.ElapsedTicks);


                var s4 = Stopwatch.StartNew();
                var count4 =
                    (
                        from A in pTasks
                        where A.StatusID == (int) BusinessRule.TaskStatus.Pending
                        select A
                        ).Count();
                s4.Stop();
                Console.WriteLine(s4.ElapsedTicks);

                var s5 = Stopwatch.StartNew();
                var count5 = pTasks.Count(x => x.StatusID == (int) BusinessRule.TaskStatus.Pending);
                s5.Stop();
                Console.WriteLine(s5.ElapsedTicks);
                Console.WriteLine();
            }
            Console.ReadLine();
        }
    }

    public class Task
    {
        public static IEnumerable<Task> GetTasks()
        {
            for (int i = 0; i < 10000000; i++)
            {
                yield return new Task { StatusID = i % 3 };
            }
        }

        public int StatusID { get; set; }
    }

    public class BusinessRule
    {
        public enum TaskStatus
        {
            Pending,
            Other
        }
    }
}
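
The same advice (repeat the measurement and discard the first run) can be packaged into a small helper. This is only a sketch: Measure is a hypothetical helper, and unlike the lazy GetTasks() above, the data here is a materialized List<int>, which is closer to the List in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class WarmBenchmark
{
    // Runs `query` once as a warm-up (JIT compilation, cache effects), then
    // returns the elapsed ticks of each of the following `runs` executions.
    public static long[] Measure(Func<int> query, int runs)
    {
        query(); // warm-up run; its timing is discarded

        var ticks = new long[runs];
        for (int i = 0; i < runs; i++)
        {
            var sw = Stopwatch.StartNew();
            query();
            sw.Stop();
            ticks[i] = sw.ElapsedTicks;
        }
        return ticks;
    }

    public static void Main()
    {
        // Hypothetical data: one million status IDs cycling through 0, 1, 2.
        List<int> statusIds = Enumerable.Range(0, 1_000_000).Select(i => i % 3).ToList();

        long[] countOnly = Measure(() => statusIds.Count(id => id == 0), 5);
        long[] whereCount = Measure(() => statusIds.Where(id => id == 0).Count(), 5);
        long[] toListCount = Measure(() => statusIds.Where(id => id == 0).ToList().Count(), 5);

        Console.WriteLine("Count(predicate):       " + string.Join(", ", countOnly));
        Console.WriteLine("Where + Count:          " + string.Join(", ", whereCount));
        Console.WriteLine("Where + ToList + Count: " + string.Join(", ", toListCount));
    }
}
```

Run it in a Release build without the debugger attached. For anything beyond a quick comparison, a dedicated harness such as BenchmarkDotNet gives more reliable numbers.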
