List<T>.AddRange implementation suboptimal

Profiling my C# application indicated that significant time is spent in List<T>.AddRange. Using Reflector to look at the code in this method showed that it calls List<T>.InsertRange, which is implemented as follows:

public void InsertRange(int index, IEnumerable<T> collection)
{
    if (collection == null)
    {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.collection);
    }
    if (index > this._size)
    {
        ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.index, ExceptionResource.ArgumentOutOfRange_Index);
    }
    ICollection<T> is2 = collection as ICollection<T>;
    if (is2 != null)
    {
        int count = is2.Count;
        if (count > 0)
        {
            this.EnsureCapacity(this._size + count);
            if (index < this._size)
            {
                Array.Copy(this._items, index, this._items, index + count, this._size - index);
            }
            if (this == is2)
            {
                Array.Copy(this._items, 0, this._items, index, index);
                Array.Copy(this._items, (int) (index + count), this._items, (int) (index * 2), (int) (this._size - index));
            }
            else
            {
                T[] array = new T[count];          // (*)
                is2.CopyTo(array, 0);              // (*)
                array.CopyTo(this._items, index);  // (*)
            }
            this._size += count;
        }
    }
    else
    {
        using (IEnumerator<T> enumerator = collection.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                this.Insert(index++, enumerator.Current);
            }
        }
    }
    this._version++;
}

private T[] _items;

One can argue that the simplicity of the interface (only having one overload of InsertRange) justifies the performance overhead of runtime type checking and casting. But what could be the reason behind the 3 lines I have marked with (*)? I think they could be rewritten to the faster alternative:

is2.CopyTo(this._items, index);

Do you see any reason for not using this simpler and apparently faster alternative?

Edit:

Thanks for the answers. So the consensus is that this is a protective measure against an input collection that implements CopyTo in a defective or malicious manner. To me it seems like overkill to constantly pay the price of 1) runtime type checking, 2) dynamic allocation of the temporary array, and 3) a doubled copy operation, when all of this could have been avoided by defining two or more overloads of InsertRange: one taking IEnumerable<T> as now, a second taking List<T>, and a third taking T[]. The latter two could have been implemented to run around twice as fast as the current code.

Edit 2:

I did implement a class FastList, identical to List, except that it also provides an overload of AddRange which takes a T[] argument. This overload needs neither the dynamic type check nor the double copying of elements. I profiled FastList.AddRange against List.AddRange by adding 4-byte arrays 1000 times to a list which was initially empty. My implementation beats the standard List.AddRange by a factor of 9 (nine!). List.AddRange takes about 5% of runtime in one of the important usage scenarios of our application, so replacing List with a class providing a faster AddRange could improve application runtime by 4%.
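The FastList source is not shown in the question, so what follows is only a minimal sketch of what such a T[] overload might look like. The class name SketchList<T>, its members, and the capacity-doubling growth policy are all assumptions made for illustration; this is neither the OP's code nor the BCL implementation.

```csharp
using System;

// Hypothetical minimal list; not the OP's FastList and not the BCL List<T>.
public class SketchList<T>
{
    private T[] _items = new T[4];
    private int _size;

    public int Count => _size;

    public T this[int index]
    {
        get
        {
            if ((uint)index >= (uint)_size)
                throw new ArgumentOutOfRangeException(nameof(index));
            return _items[index];
        }
    }

    // Overload taking T[]: no runtime type check and no temporary array,
    // just one Array.Copy straight into the backing store.
    public void AddRange(T[] items)
    {
        if (items == null) throw new ArgumentNullException(nameof(items));
        if (items.Length == 0) return;

        EnsureCapacity(_size + items.Length);
        Array.Copy(items, 0, _items, _size, items.Length);
        _size += items.Length;
    }

    // Assumed growth policy: double the capacity, or grow to the requested minimum.
    private void EnsureCapacity(int min)
    {
        if (_items.Length < min)
            Array.Resize(ref _items, Math.Max(_items.Length * 2, min));
    }
}
```

Because the argument is known to be a plain array, the list never has to hand its backing store to foreign code, which is what makes skipping the temporary array safe in this overload.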

They are preventing the implementation of ICollection<T> from accessing indices of the destination list outside the bounds of insertion. With the implementation above, a faulty (or "manipulative") implementation of CopyTo that writes outside those bounds results in an exception such as IndexOutOfRangeException.

Keep in mind that T[].CopyTo is quite literally implemented internally as memcpy, so the performance overhead of adding that line is minute. When the cost of adding safety to a tremendous number of calls is that low, you might as well do so.
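To make the protection concrete, here is a small sketch. MisbehavingCollection and CopyStrategies are hypothetical names invented for this illustration: the collection's CopyTo also writes to the slot just before the requested index. Handing it the backing array directly lets it clobber an existing element, while going through a temporary array sized to Count makes the stray write fail.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical collection whose CopyTo misbehaves: besides copying its
// single element, it also writes to the slot before the requested index.
public sealed class MisbehavingCollection : ICollection<int>
{
    public int Count => 1;
    public bool IsReadOnly => true;

    public void CopyTo(int[] array, int arrayIndex)
    {
        array[arrayIndex] = 42;     // the legitimate write
        array[arrayIndex - 1] = -1; // the out-of-range write
    }

    public IEnumerator<int> GetEnumerator() { yield return 42; }
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
    public void Add(int item) => throw new NotSupportedException();
    public void Clear() => throw new NotSupportedException();
    public bool Contains(int item) => item == 42;
    public bool Remove(int item) => throw new NotSupportedException();
}

public static class CopyStrategies
{
    // Hands the backing array to the collection directly.
    public static void DirectCopy(int[] backing, int index, ICollection<int> source)
    {
        source.CopyTo(backing, index);
    }

    // Goes through a temporary array, as the decompiled code does.
    public static void BufferedCopy(int[] backing, int index, ICollection<int> source)
    {
        int[] temp = new int[source.Count];
        source.CopyTo(temp, 0);       // a stray write here hits temp, not backing
        temp.CopyTo(backing, index);
    }
}
```

With a backing array of { 10, 20, 0, 0 } and index 2, DirectCopy overwrites the 20, whereas BufferedCopy throws IndexOutOfRangeException inside the collection's own CopyTo and leaves the backing array untouched.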

Edit: The part I find strange is that the call to ICollection<T>.CopyTo (copying to the temporary array) does not occur immediately following the call to EnsureCapacity. If it were moved to that location, then following any synchronous exception the list would remain unchanged. As-is, that condition only holds if the insertion happens at the end of the list. The reasoning here is:

  • All necessary allocation happens before altering the list elements.
  • The calls to Array.Copy cannot fail because
    • The memory is already allocated
    • The bounds are already checked
    • The element types of the source and destination arrays match
    • There is no "copy constructor" used like in C++; it's just a memcpy
  • The only items that can throw an exception are the external call to ICollection.CopyTo and the allocations required for resizing the list and allocating the temporary array. If all three of these occur before moving elements for the insertion, the transaction to change the list cannot throw a synchronous exception.
  • Final note: This addresses strictly exceptional behavior; the above rationale does not add thread safety.
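The ordering described in the list above can be sketched roughly as follows. OrderedInsertList<T> and ThrowingCollection<T> are hypothetical types written only for this illustration, with an assumed capacity-doubling policy; this is not the BCL source. Everything that can throw (the resize, the temporary allocation, the external CopyTo) runs before any element is moved.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical minimal list used only to illustrate the ordering.
public class OrderedInsertList<T>
{
    private T[] _items = new T[4];
    private int _size;

    public int Count => _size;

    public T this[int index] =>
        (uint)index < (uint)_size
            ? _items[index]
            : throw new ArgumentOutOfRangeException(nameof(index));

    public void InsertRange(int index, ICollection<T> collection)
    {
        if (collection == null) throw new ArgumentNullException(nameof(collection));
        if ((uint)index > (uint)_size) throw new ArgumentOutOfRangeException(nameof(index));
        int count = collection.Count;
        if (count == 0) return;

        // Steps that can throw come first: both allocations and the external call.
        EnsureCapacity(_size + count);
        T[] temp = new T[count];
        collection.CopyTo(temp, 0);

        // From here on only Array.Copy runs, on ranges that are already validated.
        Array.Copy(_items, index, _items, index + count, _size - index);
        Array.Copy(temp, 0, _items, index, count);
        _size += count;
    }

    // Assumed growth policy: double the capacity, or grow to the requested minimum.
    private void EnsureCapacity(int min)
    {
        if (_items.Length < min)
            Array.Resize(ref _items, Math.Max(_items.Length * 2, min));
    }
}

// Hypothetical collection whose CopyTo always fails.
public sealed class ThrowingCollection<T> : ICollection<T>
{
    public int Count => 2;
    public bool IsReadOnly => true;
    public void CopyTo(T[] array, int arrayIndex) =>
        throw new InvalidOperationException("CopyTo failed");
    public IEnumerator<T> GetEnumerator() => throw new NotSupportedException();
    IEnumerator IEnumerable.GetEnumerator() => throw new NotSupportedException();
    public void Add(T item) => throw new NotSupportedException();
    public void Clear() => throw new NotSupportedException();
    public bool Contains(T item) => false;
    public bool Remove(T item) => throw new NotSupportedException();
}
```

If CopyTo throws in this sketch, the capacity may already have grown, but the visible elements and Count are exactly what they were before the call, including for insertions in the middle of the list.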

Edit 2 (response to the OP's edit): Have you profiled this? You are making some bold claims that Microsoft should have chosen a more complicated API, so you should make sure you're correct in asserting that the current method is slow. I've never had a problem with the performance of InsertRange, and I'm quite sure that any performance problem someone does face with it will be better resolved by an algorithm redesign than by reimplementing the dynamic list. Just so you don't take me as being harsh in a negative way, keep the following in mind:

  • I can't stand people on my dev team who like to reinvent the square wheel.
  • I definitely want people on my team who care about potential performance issues and ask questions about the side effects their code may have. This point wins out when present, but as long as people are asking questions I will drive them to turn their questions into solid answers. If you can show me that an application gains a significant advantage through what initially appears to be a bad idea, then that's just the way things go sometimes.

It's a good question; I'm struggling to come up with a reason. There's no hint in the Reference Source. One possibility is that they try to avoid a problem when the class that implements the ICollection<>.CopyTo() method objects to copying to a start index other than 0. Or it could be a security measure, preventing the collection from messing with array elements it should not have access to.

Another one is that this is a countermeasure for when the collection is used in a thread-unsafe manner. If an item gets added to the collection by another thread, it will be the collection class' CopyTo() method that fails, not the Microsoft code. The right person will get the service call.

These are not great explanations.

There is a problem with your solution if you think about it for a minute: if you change the code in that way, you are essentially giving the collection being added access to an internal data structure.

This is not a good idea. For example, if the author of the List data structure figures out a better underlying structure than an array for storing the data, there is no way to change the implementation of List, since every collection expects an array to be passed to its CopyTo function.

In essence you would be cementing the implementation of the List class, even though object-oriented programming tells us that the internal implementation of a data structure should be something that can be changed without breaking other code.
