
Top 100 values in a Dictionary<string, int> - Why is LINQ so much faster than a foreach loop

I am writing a simple app to parse a huge text file (60 GB) and store every word along with the number of times it appears in the file. For testing, I cut the file down to 2 GB.

I have the words and their counts in a Dictionary, but I'm finding it hard to believe the results I'm seeing.

Total words in the dictionary: 1128495

Code I'm using:

sw.Start();

StringBuilder sb = new StringBuilder();
sb.Append("<html><head></head><body>");
lock (Container.values)
{
    // ?type=1 (LINQ), 2 (foreach) or 3 (for) selects the variant being timed
    int i = int.Parse(ctx.Request.QueryString["type"]);
    switch (i)
    {
        case 1: // LINQ: order every entry by count, then keep the top 100
            var values = Container.values.OrderByDescending(a => a.Value.Count).Take(100);
            foreach (var value in values)
            {
                sb.AppendFormat("{0} - {1}<br />", value.Key, value.Value.Count);
            }
            break;
        case 2: // foreach: enumerate every entry, loop body left empty
            foreach (var y in Container.values)
            {

            }
            break;
        case 3: // for: count from 0 to Count without reading any entry
            for (int x = 0; x < Container.values.Count; x++)
            {

            }
            break;
    }
}
sw.Stop();
sb.AppendFormat("<br /><br /> {0}", sw.ElapsedMilliseconds);
sb.Append("</body>");

I ran it twice; the times below are in milliseconds:

LINQ: #1: 598, #2: 609

Foreach: #1: 1000, #2: 1020

Why is LINQ faster than a foreach? I assume LINQ has to loop through the Dictionary itself, so how does it manage to do that and also sort everything so quickly?

Edit: After compiling in Release mode, the results are as follows: LINQ: 796 (slower?), foreach: 945

The app is a simple console app; the code runs inside an HttpListener.

Edit 2: I have managed to figure out what the issue was. When I initialized the dictionary, I set its capacity to 89000000 (otherwise, when processing the 60 GB file, it would throw an OutOfMemoryException). For some reason this drastically slows down the foreach loop. If I set the capacity to 1128495, the foreach loop executes in 56 milliseconds.

Why is this happening? If I put a counter in the loop, it only runs 1128495 times, even with a capacity of 89000000.
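A minimal way to check the counter observation in isolation, assuming a plain Dictionary<string, int> and arbitrary sizes far smaller than the ones in the question (the class and method names are hypothetical). It only checks how many times the loop body runs; it does not explain the timing difference.

```csharp
using System.Collections.Generic;

public static class CapacityCheck
{
    // Hypothetical helper: fills a dictionary created with a pre-allocated
    // capacity, then counts how many times an empty-bodied foreach runs.
    public static int CountIterations(int capacity, int entries)
    {
        var counts = new Dictionary<string, int>(capacity);
        for (int i = 0; i < entries; i++)
        {
            counts["word" + i] = i;
        }

        int iterations = 0;
        foreach (KeyValuePair<string, int> pair in counts)
        {
            iterations++; // one increment per stored entry
        }
        return iterations;
    }
}
```

Both CapacityCheck.CountIterations(1_000_000, 1_000) and CapacityCheck.CountIterations(1_000, 1_000) return 1000: the loop runs once per stored entry, not once per slot of requested capacity.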

The compiler implements a foreach loop by calling GetEnumerator() once and then calling MoveNext() and Current repeatedly on the enumerator. LINQ's OrderByDescending normally works exactly the same way: it essentially runs a foreach to extract all the elements and then sorts them.
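A simplified sketch of that expansion for a Dictionary<string, int>, assuming summing the values as a stand-in loop body (class and method names are hypothetical). The exact code the compiler generates varies by version; this only shows the GetEnumerator/MoveNext/Current/Dispose shape.

```csharp
using System.Collections.Generic;

public static class ForeachLowering
{
    // Ordinary foreach over the dictionary.
    public static int SumWithForeach(Dictionary<string, int> counts)
    {
        int total = 0;
        foreach (KeyValuePair<string, int> pair in counts)
        {
            total += pair.Value;
        }
        return total;
    }

    // Roughly what the loop above becomes: GetEnumerator() once, then
    // MoveNext()/Current until MoveNext() returns false, then Dispose().
    public static int SumWithEnumerator(Dictionary<string, int> counts)
    {
        int total = 0;
        Dictionary<string, int>.Enumerator e = counts.GetEnumerator();
        try
        {
            while (e.MoveNext())
            {
                KeyValuePair<string, int> pair = e.Current;
                total += pair.Value;
            }
        }
        finally
        {
            e.Dispose(); // Dictionary's enumerator is a struct, so no boxing here
        }
        return total;
    }
}
```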

A quick look in ILSpy shows that OrderByDescending puts the container into an internal type called Buffer<T>, which has an optimization: if the container implements ICollection<T>, it uses ICollection<T>.CopyTo instead of a foreach loop. Usually OrderByDescending would still not be faster than a foreach loop, because after extracting the elements it has to sort them.
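A sketch of the "extract, then sort" behaviour described above. BufferSketch is a hypothetical helper modelled on that description, not the actual System.Linq source: it takes the ICollection<T>.CopyTo fast path when available and otherwise falls back to element-by-element enumeration, then sorts the whole buffer by value.

```csharp
using System;
using System.Collections.Generic;

public static class BufferSketch
{
    public static T[] ToArray<T>(IEnumerable<T> source)
    {
        // Fast path: an ICollection<T> knows its Count and can copy in bulk.
        if (source is ICollection<T> collection)
        {
            var result = new T[collection.Count];
            collection.CopyTo(result, 0);
            return result;
        }

        // Slow path: enumerate one element at a time, doubling the array as needed.
        var items = new T[4];
        int count = 0;
        foreach (T item in source)
        {
            if (count == items.Length)
            {
                Array.Resize(ref items, items.Length * 2);
            }
            items[count++] = item;
        }
        Array.Resize(ref items, count);
        return items;
    }

    // Extract every entry, then sort the whole buffer by value, descending.
    public static KeyValuePair<string, int>[] SortByValueDescending(Dictionary<string, int> counts)
    {
        KeyValuePair<string, int>[] buffer = ToArray(counts);
        Array.Sort(buffer, (x, y) => y.Value.CompareTo(x.Value));
        return buffer;
    }
}
```

Dictionary<TKey, TValue> implements ICollection<KeyValuePair<TKey, TValue>>, so a dictionary always takes the CopyTo branch here; the sort afterwards still touches every entry, which is why extraction alone does not make the LINQ version cheap.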

Are you leaving out code in your foreach loop that might explain why it's slower? If you really are using an empty foreach loop, perhaps the explanation is that the IEnumerator<T> type (or GetEnumerator method) of Container.values is slow compared to its CopyTo method.
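One way to test that hypothesis directly is to time an empty foreach against a bulk CopyTo on the same dictionary, for example once built with the large initial capacity and once with a capacity close to the real entry count. A rough sketch, assuming a Dictionary<string, int> stands in for Container.values (the class and method names are hypothetical); single Stopwatch readings are noisy, so repeat each measurement several times.

```csharp
using System.Collections.Generic;
using System.Diagnostics;

public static class EnumerateVsCopy
{
    // Times an empty-bodied foreach over the dictionary's enumerator.
    public static long TimeForeachMs(Dictionary<string, int> counts)
    {
        var sw = Stopwatch.StartNew();
        foreach (KeyValuePair<string, int> pair in counts)
        {
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    // Times a bulk copy through ICollection<T>.CopyTo into a pre-sized array.
    public static long TimeCopyToMs(Dictionary<string, int> counts, out KeyValuePair<string, int>[] copy)
    {
        var sw = Stopwatch.StartNew();
        copy = new KeyValuePair<string, int>[counts.Count];
        ((ICollection<KeyValuePair<string, int>>)counts).CopyTo(copy, 0);
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```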

Your LINQ version only takes the first 100 elements!

Remove the .Take(100) so that you are comparing like with like!
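A sketch of timing the query with and without Take, assuming a Dictionary<string, int> in place of Container.values (names are hypothetical). LINQ queries are lazy, so each one is materialized with ToList() to make sure the ordering actually runs inside the timed call. Whether Take(100) lets OrderByDescending skip part of the sort may depend on the runtime version; measuring both variants is the simplest way to find out.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class FairComparison
{
    // The question's query, materialized so it runs when called.
    public static List<KeyValuePair<string, int>> Top(Dictionary<string, int> counts, int n) =>
        counts.OrderByDescending(p => p.Value).Take(n).ToList();

    // The same ordering without Take: every entry is sorted and returned.
    public static List<KeyValuePair<string, int>> SortAll(Dictionary<string, int> counts) =>
        counts.OrderByDescending(p => p.Value).ToList();

    // Runs an action once and returns the elapsed wall-clock milliseconds.
    public static long TimeMs(Action action)
    {
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

Usage: compare FairComparison.TimeMs(() => FairComparison.Top(counts, 100)) with FairComparison.TimeMs(() => FairComparison.SortAll(counts)), where counts is the dictionary under test.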

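Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.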

 
© 2020-2024 STACKOOM.COM