
Array.Count() much slower than List.Count()

When using the IEnumerable<T> extension method Count(), an array is at least two times slower than a list.

Function                      Count() (s)
List<int>                     2.299
int[]                         6.903

Where does the difference come from?

I understand that both call the Count property of ICollection<T>:

If the type of source implements ICollection<T>, that implementation is used to obtain the count of elements. Otherwise, this method determines the count.

For the list it returns List<T>.Count, and for the array, Array.Length. Moreover, Array.Length is supposed to be faster than List<T>.Count.
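As a quick sanity check of that premise, here is a minimal sketch (CountPathCheck and its methods are hypothetical names, not part of the original benchmark). Both int[] and List<int> pass an "is ICollection<int>" test, which is the condition under which Enumerable.Count() skips enumeration, and Count() agrees with Length and List<T>.Count respectively.

```csharp
using System.Collections.Generic;
using System.Linq;

// Illustrative helper (hypothetical name), not part of the original benchmark.
public static class CountPathCheck
{
    // True when Enumerable.Count() can take its ICollection<T> shortcut for this source
    // instead of enumerating the sequence.
    public static bool HasGenericCollectionShortcut<T>(IEnumerable<T> source)
        => source is ICollection<T>;

    // Count() through the extension method agrees with each type's own size member.
    // list.Count() binds to Enumerable.Count because the Count property is not invocable.
    public static bool CountsAgree(int[] array, List<int> list)
        => array.Count() == array.Length && list.Count() == list.Count;
}
```

Both HasGenericCollectionShortcut(new int[1]) and HasGenericCollectionShortcut(new List<int> { 1 }) return true, so the two cases really do take the same branch; the difference has to come from somewhere after that check.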

Benchmark:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program
{
    public const long Iterations = (long)1e8;

    static void Main()
    {
        var list = new List<int>(){1};
        var array = new int[1];
        array[0] = 1;

        var results = new Dictionary<string, TimeSpan>();
        results.Add("List<int>", Benchmark(list, Iterations));
        results.Add("int[]", Benchmark(array, Iterations));

        Console.WriteLine("Function".PadRight(30) + "Count()");
        foreach (var result in results)
        {
            Console.WriteLine("{0}{1}", result.Key.PadRight(30), Math.Round(result.Value.TotalSeconds, 3));
        }
        Console.ReadLine();
    }

    public static TimeSpan Benchmark(IEnumerable<int> source, long iterations)
    {
        var countWatch = new Stopwatch();
        countWatch.Start();
        for (long i = 0; i < iterations; i++) source.Count();
        countWatch.Stop();

        return countWatch.Elapsed;
    }
}

Edit:

leppie's and Knaģis's answers are excellent, but I want to add a remark.
As Jon Skeet said:

There are effectively two equivalent blocks, just testing for different collection interface types, and using whichever one it finds first (if any). I don't know whether the .NET implementation tests for ICollection or ICollection<T> first - I could test it by implementing both interfaces but returning different counts from each, of course, but that's probably overkill. It doesn't really matter for well-behaved collections other than the slight performance difference - we want to test the "most likely" interface first, which I believe is the generic one.
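The experiment Jon Skeet describes is small enough to try. In this sketch, DualCountCollection<T> and InterfaceOrderProbe are hypothetical types written only for the test: the collection reports a Count of 1 through ICollection<T> and 2 through the non-generic ICollection, so the value Enumerable.Count() returns shows which interface it checked first. The copy of Enumerable.Count() posted further down this page checks ICollection<T> first, so 1 is the expected result there.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical test type: reports a different Count through each interface.
public sealed class DualCountCollection<T> : ICollection<T>, ICollection
{
    // Count seen through ICollection<T>.
    int ICollection<T>.Count => 1;

    // Count seen through the non-generic ICollection.
    int ICollection.Count => 2;

    public bool IsReadOnly => true;
    public bool IsSynchronized => false;
    public object SyncRoot => this;

    // The remaining members are not used by Enumerable.Count() in this test.
    public void Add(T item) => throw new NotSupportedException();
    public void Clear() => throw new NotSupportedException();
    public bool Contains(T item) => false;
    public void CopyTo(T[] array, int arrayIndex) { }
    public void CopyTo(Array array, int index) { }
    public bool Remove(T item) => throw new NotSupportedException();
    public IEnumerator<T> GetEnumerator() { yield break; }
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class InterfaceOrderProbe
{
    // 1 means ICollection<T> was consulted first; 2 means ICollection was.
    public static int Probe() => new DualCountCollection<int>().Count();
}
```

Since neither Count is public on the class itself, the call in Probe() binds to the Enumerable.Count() extension method rather than to a member.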

The generic interface may well be the most likely one, but if you invert the two checks, i.e. try the non-generic cast before the generic one, Array.Count() becomes slightly faster than List.Count(). On the other hand, the non-generic version is slower for List.

Good to know if anyone wants to call Count() in a 1e8-iteration loop!

Function       ICollection<T> cast first (s)   ICollection cast first (s)
List           1.268                           1.738
Array          5.925                           1.683
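Inverting the two checks as described above might look like the following sketch. CountNonGenericFirst is a hypothetical name, not a framework method; the fallback loop mirrors the copy of Enumerable.Count() posted further down this page. The reordering only gives the same answer for well-behaved collections whose generic and non-generic Count agree.

```csharp
using System.Collections;
using System.Collections.Generic;

public static class InvertedCount
{
    // Hypothetical variant of Enumerable.Count(): tries the non-generic ICollection first.
    public static int CountNonGenericFirst<TSource>(IEnumerable<TSource> source)
    {
        // int[] and List<int> both implement the non-generic ICollection directly.
        if (source is ICollection nonGeneric)
            return nonGeneric.Count;

        if (source is ICollection<TSource> generic)
            return generic.Count;

        // Fallback: walk the sequence and count the elements.
        int count = 0;
        using (IEnumerator<TSource> e = source.GetEnumerator())
        {
            while (e.MoveNext())
                count++;
        }
        return count;
    }
}
```

Swapping this in for Count(source) in a benchmark loop is one way to reproduce the second column of the table, though the exact numbers will depend on the machine and runtime.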

The reason is that Enumerable.Count<T>() performs a cast to ICollection<T> to retrieve the count from both the list and the array.

Using this sample code:

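public static int Count<TSource>(IEnumerable<TSource> source)
{
    // Only the cast is measured here; the Count property itself is never read.
    ICollection<TSource> collection = source as ICollection<TSource>;
    if (collection != null)
    {
        return 1; // collection.Count;
    }

    return 0; // Not an ICollection<TSource>; the enumeration fallback is omitted.
}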

you can see that the cast takes much longer for the array. In fact, most of the time spent counting goes into this cast:

Function                      Count() (s)
List<int>                     1.575
int[]                         5.069

The key might be this statement from the documentation (emphasis mine):

In the .NET Framework version 2.0, the Array class implements the System.Collections.Generic.IList<T>, System.Collections.Generic.ICollection<T>, and System.Collections.Generic.IEnumerable<T> generic interfaces. The implementations are provided to arrays at run time, and therefore are not visible to the documentation build tools. As a result, the generic interfaces do not appear in the declaration syntax for the Array class, and there are no reference topics for interface members that are accessible only by casting an array to the generic interface type (explicit interface implementations).
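Those run-time implementations can be observed from ordinary code. In this sketch (ArrayInterfaceDemo is a made-up name for illustration), reflection on typeof(int[]) reports generic interfaces such as ICollection<int> even though the Array class itself declares none, and the ICollection<int> view of an array rejects Add because arrays are fixed-size.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative only: inspects the generic interfaces the runtime attaches to array types.
public static class ArrayInterfaceDemo
{
    // Generic interfaces reported for a type; for int[] this includes IList<int>,
    // ICollection<int> and IEnumerable<int>, while typeof(Array) reports none.
    public static Type[] GenericInterfacesOf(Type type)
        => type.GetInterfaces().Where(i => i.IsGenericType).ToArray();

    // True if adding through ICollection<int> throws NotSupportedException,
    // which is what happens for a fixed-size array.
    public static bool AddIsRejected(ICollection<int> collection)
    {
        try
        {
            collection.Add(42);
            return false;
        }
        catch (NotSupportedException)
        {
            return true;
        }
    }
}
```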

32-bit profiling analysis (all times in ms, only the interesting rows, JIT inlining disabled):

Name    Count   'Inc Time'  'Ex Time'   'Avg Inc Time'  'Avg Ex Time'
System.Linq.Enumerable::Count(<UNKNOWN>):int32 <System.Int32>   
        20000000    13338.38    7830.49 0.0007  0.0004
System.SZArrayHelper::get_Count():int32 <System.Int32>  
        10000000    4063.9      2651.44 0.0004  0.0003
System.Collections.Generic.List<System.Int32>::get_Count():int32    
        10000000    1443.99     1443.99 0.0001  0.0001
System.Runtime.CompilerServices.JitHelpers::UnsafeCast(Object):System.__Canon <System.__Canon>  
        10000004    1412.46     1412.46 0.0001  0.0001

System.SZArrayHelper::get_Count() seems to call System.Runtime.CompilerServices.JitHelpers::UnsafeCast for the array case.

For the list, List<int>.Count simply returns the size.

Inc time is the cost including child calls; Ex time is the cost of the method body only.
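Because Inc time minus Ex time is the time spent in child calls, the profile can be cross-checked with a little arithmetic. The sketch below (ProfileMath is an illustrative name) uses the figures copied from the table: SZArrayHelper::get_Count spends 4063.9 - 2651.44 = 1412.46 ms in children, exactly the Inc time of UnsafeCast, which fits the observation that get_Count calls UnsafeCast. Likewise, Enumerable::Count spends 13338.38 - 7830.49 = 5507.89 ms in children, which equals the two get_Count Inc times added together (4063.9 + 1443.99).

```csharp
// Inclusive/exclusive times (ms) copied from the 32-bit profile above.
// decimal is used so the subtraction is exact rather than subject to floating-point rounding.
public static class ProfileMath
{
    // Time spent in child calls = inclusive time - exclusive time.
    public static decimal ChildTime(decimal inclusive, decimal exclusive) => inclusive - exclusive;

    // SZArrayHelper::get_Count: its children account for 1412.46 ms.
    public static readonly decimal ArrayGetCountChildren = ChildTime(4063.9m, 2651.44m);

    // Enumerable::Count: its children account for 5507.89 ms.
    public static readonly decimal EnumerableCountChildren = ChildTime(13338.38m, 7830.49m);
}
```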

With inlining disabled, Array.Count() is twice as slow.

It could be due to the fact mentioned in the now-deleted answer: it would appear that the attributes applied (ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success) and SecuritySafeCritical) prevent the runtime from inlining the call, hence the big difference (38 times slower in my case, in 32-bit mode).
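To get a feel for what a missed inlining costs in a tight loop, the sketch below uses MethodImplOptions.NoInlining as a stand-in. This is not the attribute set found on SZArrayHelper, just a documented way to tell the JIT not to inline a method. SizeHolder and InliningCost are hypothetical names; the two getters return the same value, so only the call overhead differs.

```csharp
using System.Diagnostics;
using System.Runtime.CompilerServices;

// Hypothetical type for illustration; not part of the framework.
public sealed class SizeHolder
{
    private readonly int _size = 1;

    // Tiny getter: normally a candidate for JIT inlining.
    public int InlinableCount => _size;

    // Same getter, but the JIT is told never to inline it.
    public int NotInlinedCount
    {
        [MethodImpl(MethodImplOptions.NoInlining)]
        get => _size;
    }
}

public static class InliningCost
{
    // Sums one of the getters over many iterations; returns the sum and the elapsed ticks.
    public static (long Sum, long Ticks) Run(SizeHolder holder, long iterations, bool inlinable)
    {
        long sum = 0;
        var sw = Stopwatch.StartNew();
        if (inlinable)
        {
            for (long i = 0; i < iterations; i++)
                sum += holder.InlinableCount;
        }
        else
        {
            for (long i = 0; i < iterations; i++)
                sum += holder.NotInlinedCount;
        }
        sw.Stop();
        return (sum, sw.ElapsedTicks);
    }
}
```

Comparing Run(holder, (long)1e8, true).Ticks with Run(holder, (long)1e8, false).Ticks in a release build outside the debugger gives a rough idea of the per-call overhead; absolute figures will vary by machine and runtime.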

To profile this yourself:

Download https://github.com/leppie/IronScheme/raw/master/IronScheme/tools/IronScheme.Profiler.x86.dll and run the app (x86 build only) as:

regsvr32 IronScheme.Profiler.x86.dll
set COR_PROFILER={9E2B38F2-7355-4C61-A54F-434B7AC266C0}
set COR_ENABLE_PROFILING=1
ConsoleApp1.exe

When the app exits, a report.tab file is created, which can then be opened in Excel.

I'm posting this not as an answer, but to provide a more testable environment.

I have taken a copy of the actual implementation of Enumerable.Count<T>() and changed the original test program to use it, so people can single-step through it in the debugger.

If you run a release build of the code below, you will get timings similar to the OP's.

For both List<T> and int[], the first cast (assigned to is2) will be non-null, so is2.Count will be called.

So it would appear that the difference comes from the internal implementation of .Count.

using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;

namespace ConsoleApplication1
{
    class Program
    {
        public const long Iterations = (long)1e8;

        static void Main()
        {
            var list = new List<int>() { 1 };
            var array = new int[1];
            array[0] = 1;

            var results = new Dictionary<string, TimeSpan>();
            results.Add("int[]", Benchmark(array, Iterations));
            results.Add("List<int>", Benchmark(list, Iterations));

            Console.WriteLine("Function".PadRight(30) + "Count()");
            foreach (var result in results)
            {
                Console.WriteLine("{0}{1}", result.Key.PadRight(30), Math.Round(result.Value.TotalSeconds, 3));
            }
            Console.ReadLine();
        }

        public static TimeSpan Benchmark(IEnumerable<int> source, long iterations)
        {
            var countWatch = new Stopwatch();
            countWatch.Start();
            for (long i = 0; i < iterations; i++) Count(source);
            countWatch.Stop();

            return countWatch.Elapsed;
        }

        public static int Count<TSource>(IEnumerable<TSource> source)
        {
            ICollection<TSource> is2 = source as ICollection<TSource>;

            if (is2 != null)
                return is2.Count;  // This is executed for int[] AND List<int>.

            ICollection is3 = source as ICollection;

            if (is3 != null)
                return is3.Count;

            int num = 0;

            using (IEnumerator<TSource> enumerator = source.GetEnumerator())
            {
                while (enumerator.MoveNext())
                    num++;
            }

            return num;
        }
    }
}

Given this information, we can simplify the test to concentrate only on the timing difference between List.Count and Array.Count:

using System;
using System.Collections.Generic;
using System.Diagnostics;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main()
        {
            int dummy = 0;
            int count = 1000000000;

            var array = new int[1] as ICollection<int>;
            var list = new List<int> {0};

            var sw = Stopwatch.StartNew();

            for (int i = 0; i < count; ++i)
                dummy += array.Count;

            Console.WriteLine("Array elapsed = " + sw.Elapsed);

            dummy = 0;
            sw.Restart();

            for (int i = 0; i < count; ++i)
                dummy += list.Count;

            Console.WriteLine("List elapsed = " + sw.Elapsed);

            Console.ReadKey(true);
        }
    }
}

The above code gives the following results for a release build run outside the debugger:

Array elapsed = 00:00:02.9586515
List elapsed = 00:00:00.6098578

At this point, I thought to myself, "Surely we can optimise Count() to recognise T[] and return .Length directly." So I changed the implementation of Count() as follows:

public static int Count<TSource>(IEnumerable<TSource> source)
{
    var array = source as TSource[];

    if (array != null)        // Optimised for arrays.
        return array.Length;  // This is executed for int[] 

    ICollection<TSource> is2 = source as ICollection<TSource>;

    if (is2 != null)
        return is2.Count;  // This is executed for List<int>.

    ICollection is3 = source as ICollection;

    if (is3 != null)
        return is3.Count;

    int num = 0;

    using (IEnumerator<TSource> enumerator = source.GetEnumerator())
    {
        while (enumerator.MoveNext())
            num++;
    }

    return num;
}

Remarkably, even after making this change, the arrays were still slower on my system, despite the non-arrays having to make the extra cast!

My results (release build) were:

Function                      Count() (s)
List<int>                     1.753
int[]                         2.304

I'm at a total loss to explain this last result...

That is because int[] requires a cast, while List<int> does not. If you use the Length property instead, the result is quite different: approximately 10x faster than List<int>.Count().
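The contrast can be sketched by reading the size of the same array directly through Length and through the ICollection<int> interface that Enumerable.Count() uses. LengthVsInterface and its methods are hypothetical names, and no particular speed ratio is claimed here, since timings depend on the runtime and on JIT optimisations such as hoisting the Length read out of the loop.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical micro-benchmark: reads the size of the same array directly and via the interface.
public static class LengthVsInterface
{
    // Direct property read on the concrete array type; no interface dispatch or cast.
    public static long SumLength(int[] array, int iterations)
    {
        long sum = 0;
        for (int i = 0; i < iterations; i++)
            sum += array.Length;
        return sum;
    }

    // Same value, but read through ICollection<int>, as Enumerable.Count() does.
    public static long SumInterfaceCount(ICollection<int> collection, int iterations)
    {
        long sum = 0;
        for (int i = 0; i < iterations; i++)
            sum += collection.Count;
        return sum;
    }

    // Runs an action once and returns the elapsed wall-clock time.
    public static TimeSpan Time(Action action)
    {
        var sw = Stopwatch.StartNew();
        action();
        return sw.Elapsed;
    }
}
```

For example, compare LengthVsInterface.Time(() => LengthVsInterface.SumLength(array, 100000000)) with the same call using SumInterfaceCount, in a release build run outside the debugger.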

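Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.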

 
© 2020-2024 STACKOOM.COM