简体   繁体   English

在linq中为什么后续调用IEnumerable.Intersect这么快

[英]in linq why are subsequent calls of IEnumerable.Intersect so much faster

while looking at this question C# Similarities of two arrays it was noted that the initial linq call was significantly slower than subsequent calls. 在看这个问题C#两个数组的相似性时,注意到初始linq调用明显慢于后续调用。 What is being cached that is making such a difference? 什么是缓存,正在产生这样的差异? I am interested in when we can expect to achieve this type of behavior (perhaps here it is simply because the same lists are used over and over). 我感兴趣的是什么时候我们可以期望实现这种类型的行为(也许这只是因为相同的列表被反复使用)。

    static void Main(string[] args)
    {
        var a = new List<int>() { 7, 17, 21, 29, 30, 33, 40, 42, 51, 53, 60, 63, 66, 68, 70, 84, 85, 91, 101, 102, 104, 108, 109, 112, 115, 116, 118, 125, 132, 137, 139, 142, 155, 163, 164, 172, 174, 176, 179, 184, 185, 186, 187, 188, 189, 192, 197, 206, 209, 234, 240, 244, 249, 250, 252, 253, 254, 261, 263, 270, 275, 277, 290, 292, 293, 304, 308, 310, 314, 316, 319, 321, 322, 325, 326, 327, 331, 332, 333, 340, 367, 371, 374, 403, 411, 422, 427, 436, 440, 443, 444, 446, 448, 449, 450, 452, 455, 459, 467, 470, 487, 488, 489, 492, 494, 502, 503, 505, 513, 514, 522, 523, 528, 532, 534, 535, 545, 547, 548, 553, 555, 556, 565, 568, 570, 577, 581, 593, 595, 596, 598, 599, 606, 608, 613, 615, 630, 638, 648, 661, 663, 665, 669, 673, 679, 681, 685, 687, 690, 697, 702, 705, 708, 710, 716, 719, 724, 725, 727, 728, 732, 733, 739, 744, 760, 762, 775, 781, 787, 788, 790, 795, 797, 802, 806, 808, 811, 818, 821, 822, 829, 835, 845, 848, 851, 859, 864, 866, 868, 875, 881, 898, 899, 906, 909, 912, 913, 915, 916, 920, 926, 929, 930, 933, 937, 945, 946, 949, 954, 957, 960, 968, 975, 980, 985, 987, 989, 995 };
        var b = new List<int>() { 14, 20, 22, 23, 32, 36, 40, 48, 63, 65, 67, 71, 83, 87, 90, 100, 104, 109, 111, 127, 128, 137, 139, 141, 143, 148, 152, 153, 157, 158, 161, 163, 166, 187, 192, 198, 210, 211, 217, 220, 221, 232, 233, 236, 251, 252, 254, 256, 257, 272, 273, 277, 278, 283, 292, 304, 305, 307, 321, 333, 336, 341, 342, 344, 349, 355, 356, 359, 366, 373, 379, 386, 387, 392, 394, 396, 401, 409, 412, 433, 437, 441, 445, 447, 452, 465, 471, 476, 479, 483, 511, 514, 516, 521, 523, 531, 544, 548, 551, 554, 559, 562, 566, 567, 571, 572, 574, 576, 586, 592, 593, 597, 600, 602, 615, 627, 631, 636, 644, 650, 655, 657, 660, 667, 670, 680, 691, 697, 699, 703, 704, 706, 707, 716, 742, 748, 751, 754, 766, 770, 779, 785, 788, 790, 802, 803, 806, 811, 812, 815, 816, 821, 824, 828, 841, 848, 853, 863, 866, 870, 872, 875, 879, 880, 882, 883, 885, 886, 887, 888, 892, 894, 902, 905, 909, 912, 913, 914, 916, 920, 922, 925, 926, 928, 930, 935, 936, 938, 942, 945, 952, 954, 955, 957, 959, 960, 961, 963, 970, 974, 976, 979, 987 };
        var s = new System.Diagnostics.Stopwatch();
        const int cycles = 10;
        for (int i = 0; i < cycles; i++)
        {
            s.Start();
            var z= a.Intersect(b);
            s.Stop();
            Console.WriteLine("Test 1-{0}: {1} {2}", i, s.ElapsedTicks, z.Count());
            s.Reset();
            a[0]=i;//simple attempt to make sure entire result isn't cached
        }

        for (int i = 0; i < cycles; i++)
        {
            var z1 = new List<int>(a.Count);
            s.Start();
            int j = 0;
            int b1 = b[j];
            foreach (var a1 in a)
            {
                while (b1 <= a1)
                {
                    if (b1 == a1)
                        z1.Add(b[j]);
                    j++;
                    if (j >= b.Count)
                        break;
                    b1 = b[j];
                }
            }
            s.Stop();
            Console.WriteLine("Test 2-{0}: {1} {2}", i, s.ElapsedTicks, z1.Count);
            s.Reset();
            a[0]=i;//simple attempt to make sure entire result isn't cached
        }

        Console.Write("Press Enter to quit");
        Console.ReadLine();
    }
}

as requested by some - example output: 根据某些要求 - 示例输出:

Test 1-0: 2900 45
Test 1-1: 2 45
Test 1-2: 0 45
Test 1-3: 1 45

(the normal loop shows only a slight difference between consecutive runs) (正常循环显示连续运行之间只有轻微差异)

note after alterations to call a.Intersect(b).ToArray(); 改变之后注意调用a.Intersect(b).ToArray(); rather than just a.Intersect(b); 而不just a.Intersect(b); as suggested by @kerem the results become: 正如@kerem所建议的那样,结果变为:

Test 1-0: 13656 45
Test 1-1: 113 45
Test 1-2: 76 45
Test 1-3: 64 45
Test 1-4: 90 45 
...

I would expect the first run of any loop to be slower for three reasons: 我希望任何循环的第一次运行都会变慢,原因有三:

  1. Code has to be jitted the first time, but not subsequently. 代码必须在第一次进行,但不能随后进行。
  2. If the executable code run is small enough to fit in cache, then it won't have been evicted, and be faster for the CPU to load. 如果运行的可执行代码足够小以适应缓存,则它不会被驱逐,并且加载CPU的速度更快。
  3. If the data is small enough to fit in cache, then it won't have been evicted, and be faster for CPU to load. 如果数据足够小以适应缓存,则它不会被驱逐,并且加载CPU的速度更快。

LINQ heavily uses deferred execution. LINQ大量使用延迟执行。 Unless you enumerate a query, it does not get executed. 除非您枚举查询,否则它不会被执行。

Change 更改

 s.Start();
 z= a.Intersect(b);
 s.Stop();

to

 s.Start();
 z= a.Intersect(b).**ToArray**();
 s.Stop();

and please post the new performance results. 并请发布新的绩效结果。

a.Intersect(b) represents an expression, independent of the value of a and b. a.Intersect(b)表示一个表达式,与a和b的值无关。 Values of a and b are only used when the expression is evaluated by enumeration. 仅当通过枚举计算表达式时,才使用a和b的值。

JITting System.Enumerable. JITting System.Enumerable。

Put new List().Intersect(new List()); 把新的List()。相交(new List()); new System.Diagnostics.Stopwatch().Stop(); new System.Diagnostics.Stopwatch()。Stop(); as your first line of code and all interations will take the same amount of time. 作为您的第一行代码和所有交互将花费相同的时间。

Enumerable.Intersect does not do any caching. Enumerable.Intersect不执行任何缓存。 It is implemented using a HashSet . 它是使用HashSet实现的。 The first sequence is added to the HashSet . 第一个序列被添加到HashSet Then the second sequence is removed from the HashSet . 然后从HashSet删除第二个序列。 The remaining elements in the HashSet is yielded as an enumerable sequence of elements. HashSet的其余元素作为可枚举的元素序列产生。 You will have to actually enumerate the HashSet to pay the cost of creating the HashSet . 您必须实际枚举HashSet以支付创建HashSet的成本。 This implementation is suprisingly efficient even for small collections. 即使对于小型集合,这种实现也令人惊讶地高效。

If you see a difference in performance in subsequent calls it is not because Enumerable.Intersect does any caching but probably because you need to "warm up" your benchmark. 如果你在后续调用中看到性能上的差异,那不是因为Enumerable.Intersect执行任何缓存,但可能是因为你需要“预热”你的基准测试。

You are enumerating the result of Intersect() only when you call Count() ; 只有在调用Count()时,才会枚举Intersect()的结果; that's when the calculation of the intersection actually occurs. 当交叉点的计算实际发生时。 The part you're timing is the creation of the enumerable object that represents the future calculation of the intersection . 您正在计时的部分是创建可枚举对象, 对象表示交集的未来计算

In addition to the jitting penalty others have noted, the first call to Intersect() might be the first use of a type from System.Core.dll, so you might be looking at the time required to load the IL code into memory, as well. 除了其他人注意到的jitting惩罚之外,第一次调用Intersect()可能是第一次使用System.Core.dll中的类型,因此您可能正在查看将IL代码加载到内存中所需的时间,如好。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM