简体   繁体   中英

How does IEnumerable<T>.ToArray() work?

Is it a two-pass algorithm? ie, it iterates the enumerable once to count the number of elements so that it can allocate the array, and then pass again to insert them?

Does it loop once, and keep resizing the array?

Or does it use an intermediate structure like a List (which probably internally resizes an array)?

It uses an intermediate structure. The actual type involved is a Buffer, which is an internal struct in the framework. In practice, this type has an array, that is copied each time it is full to allocate more space. This array starts with length of 4 (in .NET 4, it's an implementation detail that might change), so you might end up allocating and copying a lot when doing ToArray.

There is an optimization in place, though. If the source implementes ICollection<T> , it uses Count from that to allocate the correct size of array from the start.

First it checks to see if the source is an ICollection<T> , in which case it can call the source's ToArray() method.

Otherwise, it enumerates the source exactly once. As it enumerates it stores items into a buffer array. Whenever it hits the end of the buffer array it creates a new buffer of twice the size and copies in the old elements. Once the enumeration is finished it returns the buffer (if it's the exact right size) or copies the items from the buffer into an array of the exact right size.

Here's pseudo-source code for the operation:

public static T[] ToArray<T>(this IEnumerable<T> source)
{
    T[] items = null;
    int count = 0;

    foreach (T item in source)
    {
        if (items == null)
        {
            items = new T[4];
        }
        else if (items.Length == count)
        {
            T[] destinationArray = new T[count * 2];
            Array.Copy(items, 0, destinationArray, 0, count);
            items = destinationArray;
        }
        items[count] = item;
        count++;
    }

    if (items.Length == count)
    {
        return items;
    }
    T[] destinationArray = new TElement[count];
    Array.Copy(items, 0, destinationArray, 0, count);
    return destinationArray;
}

Like this (via .NET Reflector):

public static TSource[] ToArray<TSource>(this IEnumerable<TSource> source)
{
    if (source == null)
    {
        throw Error.ArgumentNull("source");
    }
    Buffer<TSource> buffer = new Buffer<TSource>(source);
    return buffer.ToArray();
}

[StructLayout(LayoutKind.Sequential)]
internal struct Buffer<TElement>
{
    internal TElement[] items;
    internal int count;
    internal Buffer(IEnumerable<TElement> source)
    {
        TElement[] array = null;
        int length = 0;
        ICollection<TElement> is2 = source as ICollection<TElement>;
        if (is2 != null)
        {
            length = is2.Count;
            if (length > 0)
            {
                array = new TElement[length];
                is2.CopyTo(array, 0);
            }
        }
        else
        {
            foreach (TElement local in source)
            {
                if (array == null)
                {
                    array = new TElement[4];
                }
                else if (array.Length == length)
                {
                    TElement[] destinationArray = new TElement[length * 2];
                    Array.Copy(array, 0, destinationArray, 0, length);
                    array = destinationArray;
                }
                array[length] = local;
                length++;
            }
        }
        this.items = array;
        this.count = length;
    }

    internal TElement[] ToArray()
    {
        if (this.count == 0)
        {
            return new TElement[0];
        }
        if (this.items.Length == this.count)
        {
            return this.items;
        }
        TElement[] destinationArray = new TElement[this.count];
        Array.Copy(this.items, 0, destinationArray, 0, this.count);
        return destinationArray;
    }
}

First, the items are loaded into an internal class Buffer<T> which allows the count to be generated

Next, Buffer<T>.ToArray is called, which does an Array.Copy of the Buffer<T> 's array into a returned array.

.NET Reflector shows this code if you want to see for yourself.

http://www.red-gate.com/products/reflector/

In general, attempting to iterate an enumerable twice can lead to a disaster as there is no guarantee that the enumerable can be iterated a second time. Therefore, performing a Count and then allocate then copy is out.

In Reflector, it shows that it uses a type called Buffer that effectively streams the sequence into an array resizing (doubling on each reallocation so that the number of reallocations is O(log n) ) as needed and then returning an appropriately sized array when it reaches the end

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM