Overhead of Iterating T[] cast to IList<T>

I've noticed a performance hit when iterating over a primitive array (T[]) that has been cast to a generic collection interface (IList<T> or IEnumerable<T>).

For example:

    private static int Sum(int[] array)
    {
        int sum = 0;

        foreach (int i in array)
            sum += i;

        return sum;
    }

The above code executes significantly faster than the code below, where the parameter type is changed to IList<int> (or IEnumerable<int>):

    private static int Sum(IList<int> array)
    {
        int sum = 0;

        foreach (int i in array)
            sum += i;

        return sum;
    }

The performance hit still occurs when the object passed is a primitive array, and also when I change the loop to a for loop instead of a foreach loop.

I can get around the performance hit by coding it like this:

    private static int Sum(IList<int> array)
    {
        int sum = 0;

        if( array is int[] )
            foreach (int i in (int[])array)
                sum += i;
        else
            foreach (int i in array)
                sum += i;

        return sum;
    }

Is there a more elegant way of solving this issue? Thank you for your time.

Edit: My benchmark code:

    static void Main(string[] args)
    {
        int[] values = Enumerable.Range(0, 10000000).ToArray<int>();
        Stopwatch sw = new Stopwatch();

        sw.Start();
        Sum(values);
        //Sum((IList<int>)values);
        sw.Stop();

        Console.WriteLine("Elapsed: {0} ms", sw.ElapsedMilliseconds);
        Console.Read();
    }

Your best bet is to create an overload of Sum that takes int[] as its argument, if this method is performance-critical. The CLR's JIT can detect foreach-style iteration over an array, skip range checking, and address each element directly. Each iteration of the loop takes 3-5 instructions on x86, with only one memory lookup.
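A minimal sketch of the overload idea, assuming a hypothetical class name `SumDemo`. Overload resolution happens at compile time, so the int[] version is only picked when the argument's static type is int[]; an array that is already typed as IList<int> still goes through the interface path.

```csharp
using System;
using System.Collections.Generic;

public static class SumDemo
{
    // Fast path: chosen by overload resolution when the argument's
    // compile-time type is int[].
    public static int Sum(int[] array)
    {
        int sum = 0;
        foreach (int i in array)
            sum += i;
        return sum;
    }

    // General path: used for any other IList<int>, and for an array that
    // is only known as IList<int> at compile time.
    public static int Sum(IList<int> list)
    {
        int sum = 0;
        foreach (int i in list)
            sum += i;
        return sum;
    }

    public static void Main()
    {
        int[] values = { 1, 2, 3, 4 };
        Console.WriteLine(Sum(values));                 // binds to Sum(int[])
        Console.WriteLine(Sum((IList<int>)values));     // binds to Sum(IList<int>)
        Console.WriteLine(Sum(new List<int> { 5, 6 })); // binds to Sum(IList<int>)
    }
}
```

Both overloads return the same result for the same elements; only the generated loop differs.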

When using IList<int>, the JIT has no knowledge of the underlying collection's structure and ends up using IEnumerator<int>. Each iteration of the loop makes two interface invocations, one for Current and one for MoveNext (2-3 memory lookups and a call for each of those). This most likely ends up as ~20 quite expensive instructions, and there is not much you can do about it.

Edit: If you are curious about the actual machine code emitted by the JIT for a Release build without a debugger attached, here it is.
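To make the two interface calls per element visible, here is a hand-written approximation of what the C# compiler generates for `foreach (int i in list)` when `list` is typed as IList<int>. The class name `ForeachExpansion` is hypothetical, and the real compiler output differs in small details.

```csharp
using System;
using System.Collections.Generic;

public static class ForeachExpansion
{
    // Roughly equivalent to: foreach (int i in list) sum += i;
    public static int Sum(IList<int> list)
    {
        int sum = 0;

        // One interface call up front; it typically returns a
        // heap-allocated enumerator object.
        IEnumerator<int> e = list.GetEnumerator();
        try
        {
            while (e.MoveNext())   // interface call on every iteration
                sum += e.Current;  // second interface call on every iteration
        }
        finally
        {
            // foreach always disposes the enumerator at the end.
            if (e != null)
                e.Dispose();
        }

        return sum;
    }

    public static void Main()
    {
        IList<int> values = new[] { 1, 2, 3, 4 };
        Console.WriteLine(Sum(values));
    }
}
```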

Array version

            int s = 0;
00000000  push        ebp  
00000001  mov         ebp,esp 
00000003  push        edi  
00000004  push        esi  
00000005  xor         esi,esi 
            foreach (int i in arg)
00000007  xor         edx,edx 
00000009  mov         edi,dword ptr [ecx+4] 
0000000c  test        edi,edi 
0000000e  jle         0000001B 
00000010  mov         eax,dword ptr [ecx+edx*4+8] 
                s += i;
00000014  add         esi,eax 
00000016  inc         edx  
            foreach (int i in arg)
00000017  cmp         edi,edx 
00000019  jg          00000010 

IEnumerable version

            int s = 0;
00000000  push        ebp  
00000001  mov         ebp,esp 
00000003  push        edi  
00000004  push        esi  
00000005  push        ebx  
00000006  sub         esp,1Ch 
00000009  mov         esi,ecx 
0000000b  lea         edi,[ebp-28h] 
0000000e  mov         ecx,6 
00000013  xor         eax,eax 
00000015  rep stos    dword ptr es:[edi] 
00000017  mov         ecx,esi 
00000019  xor         eax,eax 
0000001b  mov         dword ptr [ebp-18h],eax 
0000001e  xor         edx,edx 
00000020  mov         dword ptr [ebp-24h],edx 
            foreach (int i in arg)
00000023  call        dword ptr ds:[009E0010h] 
00000029  mov         dword ptr [ebp-28h],eax 
0000002c  mov         ecx,dword ptr [ebp-28h] 
0000002f  call        dword ptr ds:[009E0090h] 
00000035  test        eax,eax 
00000037  je          00000052 
00000039  mov         ecx,dword ptr [ebp-28h] 
0000003c  call        dword ptr ds:[009E0110h] 
                s += i;
00000042  add         dword ptr [ebp-24h],eax 
            foreach (int i in arg)
00000045  mov         ecx,dword ptr [ebp-28h] 
00000048  call        dword ptr ds:[009E0090h] 
0000004e  test        eax,eax 
00000050  jne         00000039 
00000052  mov         dword ptr [ebp-1Ch],0 
00000059  mov         dword ptr [ebp-18h],0FCh 
00000060  push        0F403BCh 
00000065  jmp         00000067 
00000067  cmp         dword ptr [ebp-28h],0 
0000006b  je          00000076 
0000006d  mov         ecx,dword ptr [ebp-28h] 
00000070  call        dword ptr ds:[009E0190h] 

Welcome to optimization. Things aren't always obvious here!

Basically, as you've found, when the compiler can prove that certain safety constraints hold, it can emit enormously more efficient code under full optimization. Here (as MagnatLU shows) we see that knowing you've got an array allows all sorts of assumptions to be made about the size being fixed, and allows memory to be accessed directly (which also works well with the CPU's memory prefetching, for bonus speed). When the compiler doesn't have that proof, it plays it safe and generates the general code. (This is the right thing to do.)

As a general comment, your workaround code is pretty simple as optimization code goes (making the code super-readable and maintainable isn't always the first consideration when optimizing). I don't really see how you could better it without making your class's API more complex (not a win!). Moreover, just adding a comment inside the body to say why you've done this would solve the maintenance issue; this is in fact one of the best uses for (non-doc) comments in code in the first place. Given that the use case is large arrays (i.e., that it's reasonable to optimize at all) I'd say you have a great solution right there.

As an alternative to @MagnatLU's answer above, you can use for instead of foreach and cache the list's Count. There is still overhead compared to int[], but not quite as much: the IList<int> overload's duration decreased by ~50% using your test code on my machine.
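One way the commented workaround might look, as a sketch only. It assumes C# 7 or later for the `is` pattern (on older compilers, `as` followed by a null check does the same job), and the class name `PatternSum` is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class PatternSum
{
    public static int Sum(IList<int> list)
    {
        int sum = 0;

        // Performance: when the list is really an int[], iterate it as an
        // array so the JIT can emit a direct indexed loop instead of going
        // through IEnumerator<int>. Both branches compute the same result.
        if (list is int[] array)
        {
            foreach (int i in array)
                sum += i;
        }
        else
        {
            foreach (int i in list)
                sum += i;
        }

        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(Sum(new[] { 1, 2, 3 }));         // array branch
        Console.WriteLine(Sum(new List<int> { 1, 2, 3 })); // interface branch
    }
}
```

The public signature stays a single IList<int> method, so callers are unaffected by the special case.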

    private static int Sum(IList<int> array)
    {
        int sum = 0;

        int count = array.Count;
        for (int i = 0; i < count; i++)
            sum += array[i];

        return sum;
    }
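To compare the variants discussed in this thread on your own machine, a rough harness along these lines may help. It is a sketch, not a rigorous benchmark: the class name `SumBenchmark` and the helper `BestOf` are hypothetical, the numbers depend on runtime version and hardware, and it should be run as a Release build without a debugger attached.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class SumBenchmark
{
    public static int SumArray(int[] array)
    {
        int sum = 0;
        foreach (int i in array)
            sum += i;
        return sum;
    }

    public static int SumForeach(IList<int> list)
    {
        int sum = 0;
        foreach (int i in list)
            sum += i;
        return sum;
    }

    public static int SumFor(IList<int> list)
    {
        int sum = 0;
        int count = list.Count; // read Count once instead of per iteration
        for (int i = 0; i < count; i++)
            sum += list[i];
        return sum;
    }

    // Runs body once as a warm-up (so JIT compilation is not timed),
    // then returns the fastest of `runs` timed executions, in ticks.
    public static long BestOf(int runs, Func<int> body)
    {
        body();
        long best = long.MaxValue;
        var sw = new Stopwatch();
        for (int r = 0; r < runs; r++)
        {
            sw.Restart();
            body();
            sw.Stop();
            best = Math.Min(best, sw.ElapsedTicks);
        }
        return best;
    }

    public static void Main()
    {
        // Same data as the question. The int sum overflows and wraps in the
        // default unchecked context, which does not matter for timing.
        int[] values = Enumerable.Range(0, 10000000).ToArray();

        Console.WriteLine("int[] foreach:      {0} ticks", BestOf(5, () => SumArray(values)));
        Console.WriteLine("IList<int> foreach: {0} ticks", BestOf(5, () => SumForeach(values)));
        Console.WriteLine("IList<int> for:     {0} ticks", BestOf(5, () => SumFor(values)));
    }
}
```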
