
LINQ performance: should I first use `where` or `select`

I have a large List in memory, from a class that has about 20 properties.

I'd like to filter this list based on just one property; for a particular task I only need a list of that property's values. So my query is something like:

data.Select(x => x.field).Where(x => x == "desired value").ToList()

Which one gives me better performance: using Select first, as above, or using Where first?

data.Where(x => x.field == "desired value").Select(x => x.field).ToList()

Please let me know if this depends on the collection type I'm keeping the data in, or on the field's type. Please note that I need these objects for other tasks too, so I can't filter them before loading them into memory.

Which one gives me a better performance, using Select first, or using Where.

The Where-first approach is more performant, since it filters the collection first and then executes Select for the filtered values only.

Mathematically speaking, the Where-first approach takes N + N' operations, where N is the number of items in the collection and N' is the number of items that satisfy the Where condition.
So it takes N + 0 = N operations at minimum (if no items pass the Where condition) and N + N = 2 * N operations at maximum (if all items pass the condition).

At the same time, the Select-first approach always takes exactly 2 * N operations, since the Select projection is invoked for every object to acquire the property, and the Where predicate is then invoked for every projected value to filter it.
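The operation counts above can be checked on a tiny, deterministic input. This is a minimal sketch, not the benchmark from this answer: the class and method names (`DelegateCallDemo`, `CountWhereFirst`, `CountSelectFirst`) are made up for illustration, and it counts lambda invocations only, not elapsed time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DelegateCallDemo
{
    // Where -> Select: returns how many times the two lambdas ran in total.
    public static int CountWhereFirst(IEnumerable<int> source, int limit)
    {
        int calls = 0;
        source.Where(v => { calls++; return v < limit; })   // runs for every item
              .Select(v => { calls++; return v; })          // runs only for items that passed
              .ToList();                                    // forces the lazy query to execute
        return calls;
    }

    // Select -> Where: returns how many times the two lambdas ran in total.
    public static int CountSelectFirst(IEnumerable<int> source, int limit)
    {
        int calls = 0;
        source.Select(v => { calls++; return v; })          // runs for every item
              .Where(v => { calls++; return v < limit; })   // runs for every projected value
              .ToList();
        return calls;
    }

    public static void Main()
    {
        int[] values = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

        // N = 10 items, N' = 3 items are below the limit of 4.
        Console.WriteLine(CountWhereFirst(values, 4));   // prints 13 (N + N')
        Console.WriteLine(CountSelectFirst(values, 4));  // prints 20 (2 * N)
    }
}
```

Both queries still stream over the source in a single pass; what differs is how often each lambda is called along the way.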

Benchmark proof

I ran a benchmark to support my answer.

Results:

Condition value: 50
Where -> Select: 88 ms, 10500319 hits
Select -> Where: 137 ms, 20000000 hits

Condition value: 500
Where -> Select: 187 ms, 14999212 hits
Select -> Where: 238 ms, 20000000 hits

Condition value: 950
Where -> Select: 186 ms, 19500126 hits
Select -> Where: 402 ms, 20000000 hits

If you run the benchmark many times, you will see that the hit count of the Where -> Select approach changes from run to run (the data is random), while the Select -> Where approach always takes 2N operations.
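As a rough cross-check of the hit counts reported above, the N + N' formula can be evaluated for the three condition values. The only assumption here is that the fraction of items passing the filter is about p = conditionValue / 1000, which follows from the values being drawn with random.Next(1000); N is the 10000000 items used in the benchmark code.

```latex
\begin{align*}
\text{hits}_{\text{Where}\to\text{Select}} &= N + N' = (1 + p)\,N, \qquad p = N'/N \approx \frac{\text{conditionValue}}{1000} \\
\text{hits}_{\text{Select}\to\text{Where}} &= 2N = 2 \times 10^{7} \\[6pt]
\text{conditionValue} = 50:  &\quad (1 + 0.05)\cdot 10^{7} = 1.05 \times 10^{7} \quad (\text{measured: } 10500319) \\
\text{conditionValue} = 500: &\quad (1 + 0.50)\cdot 10^{7} = 1.50 \times 10^{7} \quad (\text{measured: } 14999212) \\
\text{conditionValue} = 950: &\quad (1 + 0.95)\cdot 10^{7} = 1.95 \times 10^{7} \quad (\text{measured: } 19500126)
\end{align*}
```

The small deviations from the predicted values are what one would expect from random data.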

IDEOne demonstration:

https://ideone.com/jwZJLt

Code:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Point
{
    public int X { get; set; }
    public int Y { get; set; }
}

class Program
{
    static void Main()
    {
        var random = new Random();
        List<Point> points = Enumerable.Range(0, 10000000).Select(x => new Point { X = random.Next(1000), Y = random.Next(1000) }).ToList();

        int conditionValue = 250;
        Console.WriteLine($"Condition value: {conditionValue}");

        Stopwatch sw = new Stopwatch();
        sw.Start();

        int hitCount1 = 0;
        var points1 = points.Where(x =>
        {
            hitCount1++;
            return x.Y < conditionValue; // filter on Y, the projected property, so both queries return the same values
        }).Select(x =>
        {
            hitCount1++;
            return x.Y;
        }).ToArray();

        sw.Stop();
        Console.WriteLine($"Where -> Select: {sw.ElapsedMilliseconds} ms, {hitCount1} hits");

        sw.Restart();

        int hitCount2 = 0;
        var points2 = points.Select(x =>
        {
            hitCount2++;
            return x.Y;
        }).Where(x =>
        {
            hitCount2++;
            return x < conditionValue;
        }).ToArray();

        sw.Stop();
        Console.WriteLine($"Select -> Where: {sw.ElapsedMilliseconds} ms, {hitCount2} hits");

        Console.ReadLine();
    }
}

Related questions

These questions may also be interesting to you. They are not about Select and Where specifically, but they are about how the order of LINQ operations affects performance:

Does the order of LINQ functions matter?
Order of LINQ extension methods does not affect performance?

The answer will depend on the state of your collection.

  • If most entities will pass the Where test, apply Select first;
  • If fewer entities will pass the Where test, apply Where first.

Update:

@YeldarKurmangaliyev wrote the answer with a concrete example and benchmarking. I ran similar code to verify his claim, and our results are exactly opposite. That is because I ran the same test as his, but with an object that is not as simple as the Point type he used for his tests.

The code looks very much like his, except that I changed the name of the class from Point to EnumerableClass.

Given below are the classes I used to build the EnumerableClass class:

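using System;
using System.Diagnostics;
using System.Drawing; // Color, used by Gatorade
using System.Linq;

public class EnumerableClass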
{
    public int X { get; set; }
    public int Y { get; set; }
    public String A { get; set; }
    public String B { get; set; }
    public String C { get; set; }
    public String D { get; set; }
    public String E { get; set; }
    public Frame F { get; set; }
    public Gatorade Gatorade { get; set; }
    public Home Home { get; set; }
}

public class Home
{
    private Home(int rooms, double bathrooms, Stove stove, InternetConnection internetConnection)
    {
        Rooms = rooms;
        Bathrooms = (decimal) bathrooms;
        StoveType = stove;
        Internet = internetConnection;
    }

    public int Rooms { get; set; }
    public decimal Bathrooms { get; set; }
    public Stove StoveType { get; set; }
    public InternetConnection Internet { get; set; }

    public static Home GetUnitOfHome()
    {
        return new Home(5, 2.5, Stove.Gas, InternetConnection.Att);
    }
}

public enum InternetConnection
{
    Comcast = 0,
    Verizon = 1,
    Att = 2,
    Google = 3
}

public enum Stove
{
    Gas = 0,
    Electric = 1,
    Induction = 2
}

public class Gatorade
{
    private Gatorade(int volume, Color liquidColor, int bottleSize)
    {
        Volume = volume;
        LiquidColor = liquidColor;
        BottleSize = bottleSize;
    }

    public int Volume { get; set; }
    public Color LiquidColor { get; set; }
    public int BottleSize { get; set; }

    public static Gatorade GetGatoradeBottle()
    {
        return new Gatorade(100, Color.Orange, 150);
    }
}

public class Frame
{
    public int X { get; set; }
    public int Y { get; set; }

    private Frame(int x, int y)
    {
        X = x;
        Y = y;
    }

    public static Frame GetFrame()
    {
        return new Frame(5, 10);
    }
}

The classes Frame, Gatorade and Home each have a static method that returns an instance of their type.

Below is the main program:

public static class Program
{
    const string Chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    private static readonly Random Random = new Random();

    private static string RandomString(int length)
    {
        return new string(Enumerable.Repeat(Chars, length)
            .Select(s => s[Random.Next(s.Length)]).ToArray());
    }

    private static void Main()
    {
        var random = new Random();
        var largeCollection =
            Enumerable.Range(0, 1000000)
                .Select(
                    x =>
                        new EnumerableClass
                        {
                            A = RandomString(500),
                            B = RandomString(1000),
                            C = RandomString(100),
                            D = RandomString(256),
                            E = RandomString(1024),
                            F = Frame.GetFrame(),
                            Gatorade = Gatorade.GetGatoradeBottle(),
                            Home = Home.GetUnitOfHome(),
                            X = random.Next(1000),
                            Y = random.Next(1000)
                        })
                .ToList();

        const int conditionValue = 250;
        Console.WriteLine(@"Condition value: {0}", conditionValue);

        var sw = new Stopwatch();
        sw.Start();
        var firstWhere = largeCollection
            .Where(x => x.Y < conditionValue)
            .Select(x => x.Y)
            .ToArray();
        sw.Stop();
        Console.WriteLine(@"Where -> Select: {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        var firstSelect = largeCollection
            .Select(x => x.Y)
            .Where(y => y < conditionValue)
            .ToArray();
        sw.Stop();
        Console.WriteLine(@"Select -> Where: {0} ms", sw.ElapsedMilliseconds);
        Console.ReadLine();

        Console.WriteLine();
        Console.WriteLine(@"First Where's first item: {0}", firstWhere.FirstOrDefault());
        Console.WriteLine(@"First Select's first item: {0}", firstSelect.FirstOrDefault());
        Console.WriteLine();
        Console.ReadLine();
    }
}

Results:

I ran the tests multiple times and found that .Select().Where() performed better than .Where().Select() when the collection size is 1000000.

Here is the first test result, where I forced every EnumerableClass object's Y value to be 5, so every item passed Where:

Condition value: 250
Where -> Select: 149 ms
Select -> Where: 115 ms

First Where's first item: 5
First Select's first item: 5

Here is the second test result, where I forced every EnumerableClass object's Y value to be 251, so no item passed Where:

Condition value: 250
Where -> Select: 110 ms
Select -> Where: 100 ms

First Where's first item: 0
First Select's first item: 0

Clearly, the result depends heavily on the state of the collection:

  • In @YeldarKurmangaliyev's tests .Where().Select() performed better; and,
  • In my tests .Select().Where() performed better.

The state of the collection, which I keep mentioning, includes:

  • the size of each item;
  • the total number of items in the collection; and,
  • the number of items likely to pass the Where clause.

Response to comments on the answer:

Further, @Enigmativity said that having to know the result of Where ahead of time, in order to decide whether to put Where first or Select first, is a Catch-22. Ideally and theoretically he is correct and, not surprisingly, the same situation is seen in another domain of Computer Science: scheduling.

The best scheduling algorithm is Shortest Job First, where we schedule first the job that will execute for the least time. But how would anyone know how much time a particular job will take to complete? Well, the answer is that:

Shortest job next is used in specialized environments where accurate estimates of running time are available.

Therefore, as I said right at the top (which was also the first, shorter version of my answer), the correct answer to this question will depend on the current state of the collection.

In general,

  • if your objects are within a reasonable size range; and,
  • you are Select-ing a very small chunk out of each object; and,
  • your collection size is not just in the thousands,

then the guideline mentioned right at the top of this answer will be useful for you.
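Since the two benchmarks in this thread disagree, and each times every query only once, it may be worth measuring on your own data with a warm-up run and several repetitions. The following is a minimal sketch under those assumptions, not the code either answer used: `OrderBenchmark`, `MedianMilliseconds` and `Item` are hypothetical names, and the data is a stand-in for your real collection.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public class Item
{
    public int Y { get; set; }
}

public static class OrderBenchmark
{
    // Hypothetical helper: runs the query once untimed (warm-up, so one-time
    // JIT cost is not charged to the measurement), then returns the middle
    // sample, in milliseconds, of `runs` timed executions.
    public static double MedianMilliseconds(Func<int[]> query, int runs)
    {
        if (runs < 1) throw new ArgumentOutOfRangeException(nameof(runs));

        query(); // warm-up, result discarded

        var samples = new double[runs];
        for (int i = 0; i < runs; i++)
        {
            var sw = Stopwatch.StartNew();
            query();
            sw.Stop();
            samples[i] = sw.Elapsed.TotalMilliseconds;
        }

        Array.Sort(samples);
        return samples[runs / 2];
    }

    public static void Main()
    {
        // Stand-in data; replace with the collection you actually query.
        var random = new Random(42);
        var data = Enumerable.Range(0, 1000000)
            .Select(_ => new Item { Y = random.Next(1000) })
            .ToList();
        const int conditionValue = 250;

        double whereFirst = MedianMilliseconds(
            () => data.Where(x => x.Y < conditionValue).Select(x => x.Y).ToArray(), 9);
        double selectFirst = MedianMilliseconds(
            () => data.Select(x => x.Y).Where(y => y < conditionValue).ToArray(), 9);

        Console.WriteLine($"Where -> Select: {whereFirst:F1} ms");
        Console.WriteLine($"Select -> Where: {selectFirst:F1} ms");
    }
}
```

Taking the middle sample rather than a single run reduces the influence of garbage collections and other one-off pauses; swapping the order in which the two queries are measured is another quick sanity check.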
