
Performance of newing a List<T> with very large initial capacity and filling it versus simply filling a LinkedList<T>

Which is faster and why:

IEnumerable<T> clxnOfTs = GetSeriouslyHugeCollection();
var list = new List<T>(clxnOfTs.Count());   // Count() is the LINQ extension (using System.Linq); IEnumerable<T> has no Count property
foreach (T t in clxnOfTs) list.Add(t);

or

IEnumerable<T> clxnOfTs = GetSeriouslyHugeCollection();
var linkedList = new LinkedList<T>();
foreach (T t in clxnOfTs) linkedList.AddLast(t);   // LinkedList<T> exposes AddLast/AddFirst; ICollection<T>.Add is only reachable through the interface

Assume this will run on a newish multi-core server with loads of memory.

So really, the question is whether pre-allocating the array that underpins the List all at once and then filling it is faster than simply allocating each LinkedListNode as each T is added to the LinkedList.

My intuition says that allocating one very large chunk of contiguous memory all at once is more expensive than allocating many little chunks anywhere on the heap, because a contiguous chunk of that size is unlikely to already be available.

Thanks!
Jeff

Allocating many small bits of memory is much more expensive than allocating one large chunk; acquiring the global heap lock isn't that cheap. A List<> runs rings around a LinkedList<>; CPU cache locality is king.
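
To see the cache-locality effect rather than take it on faith, a rough sketch like the following (the element count and the use of Stopwatch are arbitrary choices of mine, not a rigorous benchmark) sums the same values from a List<double> and a LinkedList<double>:

using System;
using System.Collections.Generic;
using System.Diagnostics;

class CacheLocalityDemo
{
    static void Main()
    {
        const int n = 10_000_000;
        var list = new List<double>(n);
        var linked = new LinkedList<double>();
        for (int i = 0; i < n; i++)
        {
            list.Add(i);
            linked.AddLast(i);
        }

        // Sequential traversal over the contiguous backing array.
        var sw = Stopwatch.StartNew();
        double sum1 = 0;
        foreach (double d in list) sum1 += d;
        sw.Stop();
        Console.WriteLine($"List<double>:       {sw.ElapsedMilliseconds} ms (sum={sum1})");

        // Versus chasing node pointers scattered across the heap.
        sw.Restart();
        double sum2 = 0;
        foreach (double d in linked) sum2 += d;
        sw.Stop();
        Console.WriteLine($"LinkedList<double>: {sw.ElapsedMilliseconds} ms (sum={sum2})");
    }
}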

So, as with any performance-related question, if you really care about the answer you should create a realistic test harness, write the code both ways, and profile it.

But to address your question in a more general sense, I would offer some advice about the scenarios where different kinds of list structures make sense.

List<T> makes sense (from a performance standpoint) when you generally add/remove items at the end of the list, and only rarely add or remove items in the middle. It also works best when you have some expectation about the list's capacity ahead of time. Since List<T> internally allocates memory contiguously, it behaves better from a cache-locality standpoint. Since List<T> uses an array as its backing structure, it is also very efficient for random (indexed) access.

LinkedList<T> works better for problems where you need to insert or remove items from the middle or front of the list often. Since it doesn't have to re-allocate or shift the contents of the list to do this, it will perform much better. Since LinkedList<T> uses a linked node structure, it does not provide efficient random (indexed) access to the data; as a result, it will perform poorly if you attempt to use LINQ operators like ElementAt(). Linked lists generally perform worse from a cache-locality standpoint since they are usually implemented to allocate nodes on demand. Some implementations use pre-cached and recycled nodes allocated in pools to minimize this problem; however, I don't believe the .NET implementation does so.
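
As a concrete illustration of that trade-off, here is a small sketch (the size and names are mine) that repeatedly inserts at the front: List<T>.Insert(0, ...) has to shift every existing element, while LinkedList<T>.AddFirst just links in a new node.

using System;
using System.Collections.Generic;
using System.Diagnostics;

class FrontInsertDemo
{
    static void Main()
    {
        const int n = 100_000;

        // List<T>: each Insert(0, ...) shifts all existing elements right -> O(n^2) overall.
        var sw = Stopwatch.StartNew();
        var list = new List<int>();
        for (int i = 0; i < n; i++) list.Insert(0, i);
        Console.WriteLine($"List<int>.Insert(0, ...):      {sw.ElapsedMilliseconds} ms");

        // LinkedList<T>: AddFirst only allocates and links a node -> O(n) overall.
        sw.Restart();
        var linked = new LinkedList<int>();
        for (int i = 0; i < n; i++) linked.AddFirst(i);
        Console.WriteLine($"LinkedList<int>.AddFirst(...): {sw.ElapsedMilliseconds} ms");
    }
}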

So, I "exhaustively" tested allocating and filling up various containers with various values, with and without specifying capacity, and the answers given here are basically correct. (Kudos to them for venturing hypotheses!)

Constructing a List with an initial capacity is actually relatively fast; around a millisecond for as many as 100 million objects containing doubles. Constructing a List without an initial capacity, or constructing a LinkedList, was basically instantaneous, as you would expect.

However, filling the containers revealed very significant performance differences:

Most importantly, filling the LinkedList was dog slow. It didn't finish adding 100 million objects (each simply a double wrapped in an object) within a minute.

Filling a List constructed WITHOUT an initial capacity with 100 million objects took my test machine an average of 3732 ms. Fast.

Filling a List constructed WITH a specified initial capacity with 100 million objects took my test machine an average of 2295 ms. Very fast.

I concur with those who say the speed is due to the efficiency of manipulating contiguous memory in a cache line.
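
A stripped-down sketch of that kind of fill test might look like this (the Wrapped class, the element count, and the Stopwatch timing are illustrative choices, not the exact harness behind the numbers above):

using System;
using System.Collections.Generic;
using System.Diagnostics;

class FillBenchmark
{
    // A double wrapped in an object, as described above (the name is illustrative).
    class Wrapped { public double Value; public Wrapped(double v) { Value = v; } }

    static void Main()
    {
        const int n = 10_000_000; // scale toward 100 million if memory allows

        long noCap = Time(() =>
        {
            var l = new List<Wrapped>();
            for (int i = 0; i < n; i++) l.Add(new Wrapped(i));
        });

        long withCap = Time(() =>
        {
            var l = new List<Wrapped>(n);
            for (int i = 0; i < n; i++) l.Add(new Wrapped(i));
        });

        long linked = Time(() =>
        {
            var l = new LinkedList<Wrapped>();
            for (int i = 0; i < n; i++) l.AddLast(new Wrapped(i));
        });

        Console.WriteLine($"List<T> without capacity: {noCap} ms");
        Console.WriteLine($"List<T> with capacity:    {withCap} ms");
        Console.WriteLine($"LinkedList<T>:            {linked} ms");
    }

    static long Time(Action fill)
    {
        // Clean up before each run so earlier allocations don't skew the timing.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        var sw = Stopwatch.StartNew();
        fill();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}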

You have to measure for your particular case. There is no way around it. I would expect one large allocation to be faster if it is small enough :).

The CLR memory allocator is designed to handle a huge number of small allocations well, but allocating one large block is likely faster (in the CLR, the cost of an allocation mostly does not depend on its size).

Now if the size goes above ~85 KB, the object will be allocated on the Large Object Heap (LOH), which is slower than allocation from the regular heap and has its own consequences (for example, such chunks are not collected in Gen 0).
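
A quick way to observe the LOH behaviour is to check which generation a freshly allocated array reports; objects on the LOH show up as generation 2 immediately. A minimal sketch, with the byte[] sizes chosen to straddle the 85,000-byte threshold:

using System;

class LohDemo
{
    static void Main()
    {
        // Just under the 85,000-byte threshold: allocated on the regular (small object) heap.
        var small = new byte[80_000];

        // Over the threshold: allocated on the Large Object Heap, which the GC treats as generation 2.
        var large = new byte[100_000];

        Console.WriteLine($"byte[80_000]  -> generation {GC.GetGeneration(small)}"); // typically 0
        Console.WriteLine($"byte[100_000] -> generation {GC.GetGeneration(large)}"); // 2 (LOH)
    }
}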
