Overhead of Iterating T[] cast to IList<T>
I've noticed a performance hit when iterating over an array (T[]) that has been cast to a generic interface collection (IList<T> or IEnumerable<T>).
For example:
private static int Sum(int[] array)
{
    int sum = 0;
    foreach (int i in array)
        sum += i;
    return sum;
}
The above code executes significantly faster than the code below, where the parameter type is changed to IList<int> (or IEnumerable<int>):
private static int Sum(IList<int> array)
{
    int sum = 0;
    foreach (int i in array)
        sum += i;
    return sum;
}
The performance hit still occurs when the object passed is a plain int[] array, and also when I change the loop from a foreach loop to a for loop.
I can get around the performance hit by coding it like this:
private static int Sum(IList<int> array)
{
    int sum = 0;
    if (array is int[])
        foreach (int i in (int[])array)
            sum += i;
    else
        foreach (int i in array)
            sum += i;
    return sum;
}
Is there a more elegant way of solving this issue? Thank you for your time.
Edit: My benchmark code:
static void Main(string[] args)
{
    int[] values = Enumerable.Range(0, 10000000).ToArray();
    Stopwatch sw = new Stopwatch();
    sw.Start();
    Sum(values);
    //Sum((IList<int>)values);
    sw.Stop();
    Console.WriteLine("Elapsed: {0} ms", sw.ElapsedMilliseconds);
    Console.Read();
}
Your best bet is to create an overload of Sum with int[] as the argument if this method is performance-critical. The CLR's JIT can detect foreach-style iteration over an array, skip range checking, and address each element directly. Each iteration of the loop takes 3-5 instructions on x86, with only one memory lookup.
When using IList, the JIT has no knowledge of the underlying collection's structure and ends up using IEnumerator<int>. Each iteration of the loop uses two interface invocations, one for Current and one for MoveNext (2-3 memory lookups and a call for each of those). This most likely ends up with ~20 quite expensive instructions, and there is not much you can do about it.
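As a rough sketch of the two paths described above (the class name SumOverloads is made up for illustration, and the explicit enumerator loop only approximates what the compiler generates for foreach over an interface), the overload pair could look like this:

```csharp
using System.Collections.Generic;

public static class SumOverloads
{
    // Array overload: the static type is int[], so the JIT can emit
    // the tight indexed loop with direct element access.
    public static int Sum(int[] array)
    {
        int sum = 0;
        foreach (int i in array)
            sum += i;
        return sum;
    }

    // Interface overload, spelled out roughly the way foreach is lowered:
    // one GetEnumerator call up front, then a MoveNext call and a
    // Current call through the interface on every iteration.
    public static int Sum(IList<int> list)
    {
        int sum = 0;
        using (IEnumerator<int> e = list.GetEnumerator())
        {
            while (e.MoveNext())
                sum += e.Current;
        }
        return sum;
    }
}
```

With both overloads present, a call such as Sum(values) where values is declared as int[] binds to the array overload at compile time, while Sum((IList<int>)values) still takes the slower interface path.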
Edit: If you are curious about the actual machine code emitted by the JIT from a Release build without a debugger attached, here it is. The first listing is the int[] version; the second is the IList<int> version.
int s = 0;
00000000 push ebp
00000001 mov ebp,esp
00000003 push edi
00000004 push esi
00000005 xor esi,esi
foreach (int i in arg)
00000007 xor edx,edx
00000009 mov edi,dword ptr [ecx+4]
0000000c test edi,edi
0000000e jle 0000001B
00000010 mov eax,dword ptr [ecx+edx*4+8]
s += i;
00000014 add esi,eax
00000016 inc edx
foreach (int i in arg)
00000017 cmp edi,edx
00000019 jg 00000010
int s = 0;
00000000 push ebp
00000001 mov ebp,esp
00000003 push edi
00000004 push esi
00000005 push ebx
00000006 sub esp,1Ch
00000009 mov esi,ecx
0000000b lea edi,[ebp-28h]
0000000e mov ecx,6
00000013 xor eax,eax
00000015 rep stos dword ptr es:[edi]
00000017 mov ecx,esi
00000019 xor eax,eax
0000001b mov dword ptr [ebp-18h],eax
0000001e xor edx,edx
00000020 mov dword ptr [ebp-24h],edx
foreach (int i in arg)
00000023 call dword ptr ds:[009E0010h]
00000029 mov dword ptr [ebp-28h],eax
0000002c mov ecx,dword ptr [ebp-28h]
0000002f call dword ptr ds:[009E0090h]
00000035 test eax,eax
00000037 je 00000052
00000039 mov ecx,dword ptr [ebp-28h]
0000003c call dword ptr ds:[009E0110h]
s += i;
00000042 add dword ptr [ebp-24h],eax
foreach (int i in arg)
00000045 mov ecx,dword ptr [ebp-28h]
00000048 call dword ptr ds:[009E0090h]
0000004e test eax,eax
00000050 jne 00000039
00000052 mov dword ptr [ebp-1Ch],0
00000059 mov dword ptr [ebp-18h],0FCh
00000060 push 0F403BCh
00000065 jmp 00000067
00000067 cmp dword ptr [ebp-28h],0
0000006b je 00000076
0000006d mov ecx,dword ptr [ebp-28h]
00000070 call dword ptr ds:[009E0190h]
Welcome to optimization. Things aren't always obvious here!
Basically, as you've found, when the compiler can prove that certain safety constraints hold, it can emit enormously more efficient code when doing full optimization. Here (as MagnatLU shows) we see that knowing you've got an array allows all sorts of assumptions to be made about the size being fixed, and it allows memory to be accessed directly (which also integrates as efficiently as possible with the CPU's memory prefetching, for bonus speed). When the compiler doesn't have the proof that it can generate super-fast code, it plays it safe. (This is the right thing to do.)
As a general comment, your workaround code is pretty simple as optimization code goes (an area where making the code super-readable and maintainable isn't always the first consideration). I don't really see how you could better it without making your class's API more complex (not a win!). Moreover, just adding a comment inside the body to say why you've done this would solve the maintenance issue; this is in fact one of the best uses for (non-doc) comments in code in the first place. Given that the use case is large arrays (i.e., that it's reasonable to optimize at all in the first place), I'd say you have a great solution right there.
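For what it's worth, here is one way the commented workaround suggested above might look. This is only a sketch: the class name FastPathSum and the wording of the comment are illustrative, and it uses an as cast so the type check and the cast happen in one step.

```csharp
using System.Collections.Generic;

public static class FastPathSum
{
    public static int Sum(IList<int> values)
    {
        int sum = 0;

        // PERFORMANCE: iterating a variable typed as int[] lets the JIT
        // access elements directly, whereas foreach over IList<int> goes
        // through IEnumerator<int>. Keep this special case even though
        // both branches compute the same result.
        int[] array = values as int[];
        if (array != null)
        {
            foreach (int i in array)
                sum += i;
        }
        else
        {
            foreach (int i in values)
                sum += i;
        }
        return sum;
    }
}
```

The public signature stays the same as in the question, so callers are unaffected by the fast path.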
As an alternative to @MagnatLU's answer above, you can use for instead of foreach and cache the list's Count. There is still overhead when compared to int[], but not quite as much: the duration of the IList<int> overload decreased by ~50% using your test code on my machine.
private static int Sum(IList<int> array)
{
    int sum = 0;
    int count = array.Count;
    for (int i = 0; i < count; i++)
        sum += array[i];
    return sum;
}
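If you want to compare the variants on your own machine, a small timing helper along these lines may help. It is a sketch, not the harness used for the numbers above: SumTiming, TimeMs, and SumIndexed are hypothetical names, and the untimed first call is an assumption that keeps JIT compilation out of the measurement.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class SumTiming
{
    // Runs the action once untimed (so first-call JIT compilation is
    // excluded), then times a second run and returns milliseconds.
    public static long TimeMs(Action action)
    {
        action();
        Stopwatch sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    // Same indexed loop as above, repeated here so the sketch compiles alone.
    public static int SumIndexed(IList<int> list)
    {
        int sum = 0;
        int count = list.Count;
        for (int i = 0; i < count; i++)
            sum += list[i];
        return sum;
    }
}
```

For example, SumTiming.TimeMs(() => SumTiming.SumIndexed(values)) times the indexed loop over the values array from the question. Results will vary with the runtime version and hardware, so treat any single number as indicative only.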