
How is the internal array of a List<T> increased when using AddRange()?

I am looping through a potentially huge (millions of items) dataset stored on disk and pulling out selected items, which I add to a List<T>. When I add an item to the list, I put a lock around it, as there are other threads accessing the list.

I am trying to decide between two possible implementations:

1) Lock the list every time I need to add an item.

2) Use a temporary list that I add items to as I find them, and then use List<T>.AddRange() to add the items in that list in a chunk (e.g. when I have found 1000 matches). This means requesting a lock on the list less often, but if AddRange() only increases the capacity enough to exactly accommodate the new items, the list will end up being resized many more times.
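For reference, option 2 could be sketched roughly as below. ChunkedCollector, its gate object and the chunk size are hypothetical names chosen for illustration, and the sketch assumes that every reader of the shared list takes the same lock.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the framework): stages matches in a
// private buffer and publishes them to the shared list in chunks.
public sealed class ChunkedCollector<T>
{
    private readonly List<T> _shared;   // list that other threads also access
    private readonly object _gate;      // lock object guarding _shared
    private readonly List<T> _buffer;   // staging list owned by one thread
    private readonly int _chunkSize;

    public ChunkedCollector(List<T> shared, object gate, int chunkSize)
    {
        _shared = shared;
        _gate = gate;
        _chunkSize = chunkSize;
        _buffer = new List<T>(chunkSize);
    }

    // Called for every match; takes the lock only once per full chunk.
    public void Add(T item)
    {
        _buffer.Add(item);
        if (_buffer.Count >= _chunkSize)
        {
            Flush();
        }
    }

    // Publishes whatever is staged; call once more after the scan ends.
    public void Flush()
    {
        if (_buffer.Count == 0)
        {
            return;
        }
        lock (_gate)
        {
            _shared.AddRange(_buffer);
        }
        _buffer.Clear();
    }
}
```

The buffer is a List<T>, so it implements ICollection<T> and its Count is known when AddRange() is called.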

My question is this: As I understand it, adding items one at a time will cause the internal capacity of a List<T> to double every time the capacity is reached, but I don't know how List<T>.AddRange() behaves. I would assume that it only adds enough capacity to accommodate the new items, but I can't find any way to confirm this. The description of how the capacity is increased on MSDN is almost identical for Add() and AddRange(), except that for AddRange() it says the capacity is increased if the new Count would be greater than the capacity, rather than if the Count is already equal to the capacity.
To me this reads as if using AddRange() to add enough items to exceed the current capacity would cause the capacity to be increased in the same way as exceeding it with Add() would.

So, will adding items using List<T>.AddRange() in a chunk large enough to exceed the current capacity cause the capacity to increase only enough to accommodate the new items, or will it cause the capacity to double? Or does it do something else that I've not even considered?

Hopefully this is clear enough without any code samples, as it is a general question about how List<T> is implemented, but if not I will add any that make my question clearer. As mentioned, I've read the MSDN documentation and couldn't find a clear answer. I searched for similar questions on here as well and couldn't find any, but if there's one I've missed please point me to it!

As long as the collection passed as the AddRange parameter implements ICollection<T>, the array size is increased just once:

ICollection<T> collection2 = collection as ICollection<T>;
if (collection2 != null)
{
    int count = collection2.Count;
    if (count > 0)
    {
        this.EnsureCapacity(this._size + count);

    // (...)

Otherwise, the collection is enumerated and Insert is called for each element:

}
else
{
    using (IEnumerator<T> enumerator = collection.GetEnumerator())
    {
        while (enumerator.MoveNext())
        {
            this.Insert(index++, enumerator.Current);
        }
    }
}
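To see which of the two branches a given argument would take, the same type test can be run from calling code. This is only an illustrative sketch; AddRangePathDemo and HasKnownCount are made-up names, not framework members.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AddRangePathDemo
{
    // Mirrors the "collection as ICollection<T>" test in the decompiled code:
    // true means Count is available up front, so the array can grow once.
    public static bool HasKnownCount<T>(IEnumerable<T> source)
    {
        return source is ICollection<T>;
    }

    public static void Run()
    {
        int[] array = { 1, 2, 3 };
        var list = new List<int> { 1, 2, 3 };
        IEnumerable<int> lazy = list.Where(x => x > 1);   // deferred LINQ query

        Console.WriteLine(HasKnownCount(array));          // arrays implement ICollection<T>
        Console.WriteLine(HasKnownCount(list));           // so does List<T>
        Console.WriteLine(HasKnownCount(lazy));           // a Where() query does not
        Console.WriteLine(HasKnownCount(lazy.ToList()));  // materializing it does
    }
}
```

So if the items come from a lazy query, materializing them first (for example with ToList()) puts AddRange on the single-resize path, at the cost of one extra copy.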

Edit

Look at the EnsureCapacity method:

private void EnsureCapacity(int min)
{
    if (this._items.Length < min)
    {
        int num = (this._items.Length == 0) ? 4 : (this._items.Length * 2);
        if (num > 2146435071)
        {
            num = 2146435071;
        }
        if (num < min)
        {
            num = min;
        }
        this.Capacity = num;
    }
}

It increases the array size to Max(old_size * 2, min), and because it is called with min = old_size + count, the final array size after the AddRange call will be Max(old_size * 2, old_size + count). The result will vary depending on the current size of the List<T> and the size of the collection added with AddRange.
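The arithmetic can be checked with a small stand-alone model of that method. CapacityModel and NextCapacity are hypothetical names, and the model uses long for the doubling step to sidestep integer overflow, which differs from the decompiled code.

```csharp
using System;

public static class CapacityModel
{
    // Upper bound taken from the decompiled EnsureCapacity shown above.
    private const int MaxArrayLength = 2146435071;

    // Returns the capacity after a call that needs room for 'min' items,
    // where min = current item count + number of items being added.
    public static int NextCapacity(int currentCapacity, int min)
    {
        if (currentCapacity >= min)
        {
            return currentCapacity;          // enough room, nothing changes
        }
        long candidate = currentCapacity == 0 ? 4 : (long)currentCapacity * 2;
        if (candidate > MaxArrayLength)
        {
            candidate = MaxArrayLength;      // cap the doubling
        }
        if (candidate < min)
        {
            candidate = min;                 // doubling was not enough
        }
        return (int)candidate;
    }
}
```

Under this model a full list of capacity 1024 that receives a 1000-item chunk goes to NextCapacity(1024, 2024) = 2048, so it doubles, while a 5000-item chunk gives NextCapacity(1024, 6024) = 6024, an exact fit.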

The capacity is increased in the same way as with Add. This is not explicitly mentioned in the documentation, but a look at the source code shows that both Add and AddRange internally use EnsureCapacity.

AddRange will increase the size only to the necessary amount. So in the AddRange function you could find something like the following code:

  if (capacity < count + items.Count) { capacity = count + items.Count; }

Edit: It turns out the items might be added one by one.

But if you're working with really large data sets and read performance is important, it's probably better to use a binary tree. That allows faster searching, adding and removing, as well as partial locking that leaves the rest of the tree usable. The biggest problem with trees is deciding when to rebalance. I used this tree in my chess game, where it is rebalanced after every move (because that's when removals are needed, and removal is not thread-safe in this implementation):

namespace Chess
{
    /// <summary>
    /// Implemented as a binary search tree.
    /// Is thread-safe when adding, not when removing.
    /// </summary>
    public class BinaryTree
    {
        public MiniMax.Node info;
        public BinaryTree left, right;

        /// <summary>
        /// Collisions are handled by returning the existing node. Thread-safe
        /// Does not recalculate height, do that after all positions are added.
        /// </summary>
        /// <param name="chessNode">Connector in a tree structure</param>
        /// <returns>Node the position was already stored in, null if new node.</returns>
        public MiniMax.Node AddConnection(MiniMax.Node chessNode)
        {
            if (this.info == null)
            {
                lock (this)
                {
                    // Must check again, in case it was changed before lock.
                    if (this.info == null)
                    {
                        this.left = new BinaryTree();
                        this.right = new BinaryTree();
                        this.info = chessNode;
                        return null;
                    }
                }
            }

            int difference = this.info.position.CompareTo(chessNode.position);

            if (difference < 0) return this.left.AddConnection(chessNode);
            else if (difference > 0) return this.right.AddConnection(chessNode);
            else
            {
                this.info.IncreaseReferenceCount();
                return this.info;
            }
        }

        /// <summary>
        /// Construct a new Binary search tree from an array.
        /// </summary>
        /// <param name="elements">Array of elements, inorder.</param>
        /// <param name="min">First element of this branch.</param>
        /// <param name="max">Last element of this branch.</param>
        public void CreateFromArray(MiniMax.Node[] elements, int min, int max)
        {
            if (max >= min)
            {
                int mid = (min + max) >> 1;
                this.info = elements[mid];

                this.left = new BinaryTree();
                this.right = new BinaryTree();

                // The left and right each get one half of the array, except the mid.
                this.left.CreateFromArray(elements, min, mid - 1);
                this.right.CreateFromArray(elements, mid + 1, max);
            }
        }

        public void CollectUnmarked(MiniMax.Node[] restructure, ref int index)
        {
            if (this.info != null)
            {
                this.left.CollectUnmarked(restructure, ref index);

                // Nodes marked for removal will not be added to the array.
                if (!this.info.Marked)
                    restructure[index++] = this.info;

                this.right.CollectUnmarked(restructure, ref index);
            }
        }

        public int Unmark()
        {
            if (this.info != null)
            {
                this.info.Marked = false;
                return this.left.Unmark() + this.right.Unmark() + 1;
            }
            else return 0;
        }
    }
}
