
Top 100 values in a Dictionary<string, int> - Why is LINQ so much faster than a foreach loop?

I am writing a simple app to parse a huge text file (60 GB) and store every word along with the number of times it appears in the file. For testing's sake I cut the file down to 2 GB.

I have the words and their counts in a Dictionary, but I'm finding it hard to believe the results I'm seeing.

Total words in the dictionary: 1128495

Code I'm using:

sw.Start();

StringBuilder sb = new StringBuilder();
sb.AppendFormat("<html><head></head><body>");
lock (Container.values)
{
    int i = int.Parse(ctx.Request.QueryString["type"]);
    switch (i)
    {
        case 1: //LinQ
            var values = Container.values.OrderByDescending(a => a.Value.Count).Take(100);
            foreach (var value in values)
            {
                sb.AppendFormat("{0} - {1}<br />", value.Key, value.Value.Count);
            }
            break;
        case 2: //Foreach
            foreach (var y in Container.values)
            {

            }
            break;
        case 3: //For
            for (int x = 0; x < Container.values.Count; x++)
            {

            }
            break;
    }                
}
sw.Stop();
sb.AppendFormat("<br /><br /> {0}", sw.ElapsedMilliseconds);
sb.AppendFormat("</body>");

I ran each variant twice. Times are in milliseconds:

LINQ: #1 598, #2 609

Foreach: #1 1000, #2 1020

Why is LINQ faster than a foreach? I assume LINQ has to loop through the Dictionary itself, so how does it manage that plus sorting everything in so little time?

Edit: After compiling in Release mode the results are as follows: LINQ: 796 (slower?), foreach: 945

The app is a simple console app; the code executes inside an HttpListener.

Edit 2: I have managed to figure out what the issue was. When I initialized the dictionary I set its capacity to 89000000 (otherwise it would throw an OutOfMemory exception when processing the 60 GB file). For some reason this drastically slows down the foreach loop. If I set the capacity to 1128495, the foreach loop executes in 56 milliseconds.

Why is this happening? If I put a counter in the loop it only runs 1128495 times even with a capacity of 89000000.
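One way to check this observation in isolation is a small timing harness like the sketch below. It is only an assumption-laden stand-in: CapacityProbe and TimeEmptyForeach are hypothetical names, the dictionary here is a plain Dictionary<string, int> (the real Container.values holds objects with a Count property, so its type may differ), and the capacities in Main are scaled down to keep memory use modest. It makes no claim about what the numbers will be on any given runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

static class CapacityProbe
{
    // Builds a dictionary with the requested initial capacity, fills it with
    // itemCount entries, and times an otherwise empty foreach over it.
    // 'visited' reports how many entries the loop actually saw.
    public static long TimeEmptyForeach(int capacity, int itemCount, out int visited)
    {
        var words = new Dictionary<string, int>(capacity);
        for (int i = 0; i < itemCount; i++)
        {
            words["w" + i] = i;
        }

        visited = 0;
        Stopwatch sw = Stopwatch.StartNew();
        foreach (KeyValuePair<string, int> pair in words)
        {
            visited++;
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        const int itemCount = 100000;
        int visited;

        // Capacity equal to the number of items.
        long tight = TimeEmptyForeach(itemCount, itemCount, out visited);
        Console.WriteLine("capacity {0}: {1} ms, {2} entries visited", itemCount, tight, visited);

        // Capacity far larger than the number of items.
        const int bigCapacity = 2000000;
        long loose = TimeEmptyForeach(bigCapacity, itemCount, out visited);
        Console.WriteLine("capacity {0}: {1} ms, {2} entries visited", bigCapacity, loose, visited);
    }
}
```

Running it in Release mode, outside the debugger, and with the same collection type as the real application gives the most meaningful comparison.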

A foreach loop is implemented by the compiler by calling GetEnumerator() and then calling MoveNext() and Current repeatedly on the enumerator. LINQ's OrderByDescending normally works the same way: it basically does a foreach to extract all the elements and then sorts them.
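As a rough illustration of that expansion, the sketch below counts the entries of a dictionary twice, once with foreach and once with the enumerator calls written out by hand. The class and method names are made up for this example, and the hand-written version is an approximation of what the compiler generates rather than its exact output.

```csharp
using System;
using System.Collections.Generic;

static class ForeachDesugared
{
    // Counts entries with an ordinary foreach loop.
    public static int CountWithForeach(Dictionary<string, int> words)
    {
        int n = 0;
        foreach (KeyValuePair<string, int> pair in words)
        {
            n++;
        }
        return n;
    }

    // Approximately what the loop above turns into:
    // GetEnumerator(), then MoveNext()/Current until MoveNext() returns false.
    public static int CountWithEnumerator(Dictionary<string, int> words)
    {
        int n = 0;
        Dictionary<string, int>.Enumerator e = words.GetEnumerator();
        try
        {
            while (e.MoveNext())
            {
                KeyValuePair<string, int> pair = e.Current;
                n++;
            }
        }
        finally
        {
            e.Dispose();
        }
        return n;
    }

    static void Main()
    {
        var words = new Dictionary<string, int> { { "the", 3 }, { "a", 2 }, { "linq", 1 } };
        Console.WriteLine(CountWithForeach(words));    // prints 3
        Console.WriteLine(CountWithEnumerator(words)); // prints 3
    }
}
```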

A quick look in ILSpy shows that OrderByDescending puts the container's elements in an internal type called Buffer<T>, which has an optimization: if the container implements ICollection<T>, it uses ICollection<T>.CopyTo instead of a foreach loop. Usually OrderByDescending would still not be faster than a foreach loop, because after extracting the elements it has to sort them.
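The idea behind that optimization can be sketched as follows. This is a simplified stand-in written for illustration, not the framework's actual Buffer<T> code; BufferSketch and ToBuffer are hypothetical names, and the sort in Main uses Array.Sort rather than whatever sorter LINQ uses internally.

```csharp
using System;
using System.Collections.Generic;

static class BufferSketch
{
    // Copies a sequence into an array, preferring a bulk CopyTo when available.
    public static T[] ToBuffer<T>(IEnumerable<T> source)
    {
        ICollection<T> collection = source as ICollection<T>;
        if (collection != null)
        {
            // Fast path: the collection knows its size and copies itself in one call.
            T[] items = new T[collection.Count];
            collection.CopyTo(items, 0);
            return items;
        }

        // Fallback: pull the elements one at a time through the enumerator.
        List<T> list = new List<T>();
        foreach (T item in source)
        {
            list.Add(item);
        }
        return list.ToArray();
    }

    static void Main()
    {
        var words = new Dictionary<string, int> { { "the", 3 }, { "a", 2 }, { "linq", 1 } };

        // Dictionary<TKey, TValue> implements ICollection<KeyValuePair<TKey, TValue>>,
        // so this call takes the CopyTo path.
        KeyValuePair<string, int>[] buffer = ToBuffer(words);

        // Sort by value, highest first, then print each entry.
        Array.Sort(buffer, (x, y) => y.Value.CompareTo(x.Value));
        foreach (KeyValuePair<string, int> pair in buffer)
        {
            Console.WriteLine("{0} - {1}", pair.Key, pair.Value);
        }
    }
}
```

Whether the CopyTo path is actually faster than enumeration depends on the collection's own implementation of both.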

Are you leaving out the code in your foreach loop, code that might explain why it's slower? If you really are using an empty foreach loop, perhaps the explanation is that the IEnumerator<T> type (or GetEnumerator method) of Container.values is slow compared to its CopyTo method.

Your LINQ version only takes the first 100 elements!

Remove the .Take(100) in order to compare!
