
String sorting performance degradation in VS2010 vs. VS2008

The following C# code seems to run slower when built with VS2010 than with VS2008. On a Core i5 Win7 x64 PC with 8 GB RAM, the VS2008-built version sorts the strings in about 7.5 seconds, whereas the VS2010-built version needs about 9 seconds. Why is that?

Is there anything wrong with my code?

Did the sorting algorithm change in VS2010?

Is there anything different in the underlying CLR that makes the performance worse?

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Globalization;
using System.Linq;

namespace StringSortCSharp
{
    /// <summary>
    /// Console app to test string sorting performance in C#.
    /// </summary>
    class Program
    {
        /// <summary>
        /// Displays the first lines from a vector of strings.
        /// </summary>
        /// <param name="wishedN">Number of lines to display.</param>
        /// <param name="lines">Source lines to display.</param>
        private static void DisplayFirst(int wishedN, List<string> lines)
        {
            int n = Math.Min(wishedN, lines.Count);
            for (int i = 0; i < n; i++)
            {
                Console.WriteLine("  " + lines[i]);
            }
            Console.WriteLine();
        }

        /// <summary>
        /// Used for random permutation.
        /// </summary>
        private static Random random = new Random();

        /// <summary>
        /// Computes a random permutation of the input sequence.
        /// 
        /// From:
        ///     http://stackoverflow.com/questions/375351/most-efficient-way-to-randomly-sort-shuffle-a-list-of-integers-in-c-sharp
        /// 
        /// </summary>
        /// <typeparam name="T">Type stored in the sequences.</typeparam>
        /// <param name="sequence">Input sequence.</param>
        /// <returns>Random permutation of the input sequence.</returns>
        private static IEnumerable<T> RandomPermutation<T>(IEnumerable<T> sequence)
        {
            T[] retArray = sequence.ToArray();


            for (int i = 0; i < retArray.Length - 1; i += 1)
            {
                int swapIndex = random.Next(i, retArray.Length); // i inclusive: unbiased Fisher-Yates shuffle
                T temp = retArray[i];
                retArray[i] = retArray[swapIndex];
                retArray[swapIndex] = temp;
            }
            return retArray;
        }


        /// <summary>
        /// Builds a list of strings used in the performance benchmark.
        /// </summary>
        /// <returns>Test list of strings.</returns>
        private static List<string> BuildTestLines()
        {
            // Start with "Lorem ipsum", and repeat it several times, adding some suffix strings.

            var lorem = new string[]
             {
                 "Lorem ipsum dolor sit amet, consectetuer adipiscing elit.",
                 "Maecenas porttitor congue massa. Fusce posuere, magna sed",
                 "pulvinar ultricies, purus lectus malesuada libero,",
                 "sit amet commodo magna eros quis urna.",
                 "Nunc viverra imperdiet enim. Fusce est. Vivamus a tellus.",
                 "Pellentesque habitant morbi tristique senectus et netus et",
                 "malesuada fames ac turpis egestas. Proin pharetra nonummy pede.",
                 "Mauris et orci."
             };

            int repeatCount = 200 * 1000;

            Console.Write("Building test strings");
            var testLines = new List<string>();

            Console.Write(" (total string count = {0})", repeatCount * lorem.Length);
            Console.Write("...");
            for (int i = 0; i < repeatCount; i++)
            {
                for (int j = 0; j < lorem.Length; j++)
                {
                    // Add more stuff to Lorem strings 
                    testLines.Add(lorem[j] + " (#" + i + ")");
                }
            }
            Console.WriteLine("done.");

            DisplayFirst(5, testLines);
            Console.WriteLine();

            // Shuffle the previously built strings.

            Console.Write("Shuffling strings...");
            var randomLines = new List<string>(RandomPermutation(testLines));
            Console.WriteLine("done.");
            DisplayFirst(5, randomLines);
            Console.WriteLine();

            return randomLines;
        }


        /// <summary>
        /// Sort the input lines.
        /// </summary>
        /// <param name="lines">Input lines to sort.</param>
        private static void Test(List<string> lines)
        {
            // Stopwatch to measure time performance
            var timer = new Stopwatch();

            Console.Write("Sorting " + lines.Count + " lines...");

            // Sort benchmark

            timer.Start();
            lines.Sort();
            timer.Stop();
            Console.WriteLine("done.");

            // Display results

            DisplayFirst(5, lines);

            Console.WriteLine();
            Console.WriteLine((timer.ElapsedMilliseconds / 1000.0).ToString(CultureInfo.InvariantCulture) + " seconds elapsed.");
        }

        static void Main(string[] args)
        {
            Console.WriteLine("*** Testing String Sorting in C# ***");
            Console.WriteLine();

            // Build test lines used for the sort benchmark
            List<string> testLines = BuildTestLines();

            // Run the sort test
            Test(testLines);
        }
    }
}

Here is a brief outline of the sorting algorithms used in the various .NET versions. It's helpful to remember that List<T>.Sort() internally uses Array.Sort<T>().

  • In .NET 2.0-4.0, a quicksort algorithm is used to sort an Array. There have been minor changes to the code, but for the most part it remains the same.
  • In .NET 4.5, the array sorting algorithm changed from quicksort to an introspective sort (introsort). This is a larger change than the earlier ones, and at least in my tests it shows noticeable performance improvements.
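For readers unfamiliar with the term, here is the general shape of an introspective sort: quicksort with a median-of-three pivot, insertion sort for small partitions, and a switch to heapsort once the recursion depth exceeds about 2·log2(n), which caps the worst case at O(n log n). This is a textbook version written for illustration, not the .NET 4.5 source; the class name IntroSortSketch, the 16-element cutoff and the exact depth limit are assumptions.

```csharp
using System;
using System.Collections.Generic;

namespace SortSketches
{
    // Hypothetical helper: a textbook introsort for illustration,
    // NOT the .NET Framework 4.5 implementation.
    public static class IntroSortSketch
    {
        private const int InsertionThreshold = 16; // assumed cutoff for "small" partitions

        public static void IntroSort<T>(T[] a, IComparer<T> cmp)
        {
            if (a.Length < 2) return;
            // Depth limit 2 * floor(log2(n)), computed without floating point.
            int depthLimit = 0;
            for (int n = a.Length; n > 1; n >>= 1) depthLimit += 2;
            Sort(a, 0, a.Length - 1, depthLimit, cmp);
        }

        private static void Sort<T>(T[] a, int lo, int hi, int depth, IComparer<T> cmp)
        {
            while (hi - lo + 1 > InsertionThreshold)
            {
                if (depth == 0)
                {
                    // Too many bad pivots: fall back to heapsort to keep O(n log n).
                    HeapSort(a, lo, hi, cmp);
                    return;
                }
                depth--;
                int p = Partition(a, lo, hi, cmp);
                Sort(a, p + 1, hi, depth, cmp); // recurse on the right part
                hi = p - 1;                     // loop on the left part
            }
            InsertionSort(a, lo, hi, cmp); // no-op when hi < lo
        }

        // Lomuto partition around a median-of-three pivot; returns the pivot's final index.
        private static int Partition<T>(T[] a, int lo, int hi, IComparer<T> cmp)
        {
            int mid = lo + (hi - lo) / 2;
            if (cmp.Compare(a[mid], a[lo]) < 0) Swap(a, lo, mid);
            if (cmp.Compare(a[hi], a[lo]) < 0) Swap(a, lo, hi);
            if (cmp.Compare(a[hi], a[mid]) < 0) Swap(a, mid, hi);
            // Now a[lo] <= a[mid] <= a[hi]; move the median to hi and use it as pivot.
            Swap(a, mid, hi);
            T pivot = a[hi];
            int i = lo;
            for (int j = lo; j < hi; j++)
            {
                if (cmp.Compare(a[j], pivot) < 0) Swap(a, i++, j);
            }
            Swap(a, i, hi);
            return i;
        }

        // In-place heapsort of a[lo..hi] using a max-heap rooted at a[lo].
        private static void HeapSort<T>(T[] a, int lo, int hi, IComparer<T> cmp)
        {
            int n = hi - lo + 1;
            for (int i = n / 2 - 1; i >= 0; i--) SiftDown(a, lo, i, n, cmp);
            for (int end = n - 1; end > 0; end--)
            {
                Swap(a, lo, lo + end);        // move the current maximum to its final slot
                SiftDown(a, lo, 0, end, cmp); // restore the heap on the shrunk range
            }
        }

        private static void SiftDown<T>(T[] a, int lo, int i, int n, IComparer<T> cmp)
        {
            while (true)
            {
                int child = 2 * i + 1;
                if (child >= n) return;
                if (child + 1 < n && cmp.Compare(a[lo + child], a[lo + child + 1]) < 0) child++;
                if (cmp.Compare(a[lo + i], a[lo + child]) >= 0) return;
                Swap(a, lo + i, lo + child);
                i = child;
            }
        }

        private static void InsertionSort<T>(T[] a, int lo, int hi, IComparer<T> cmp)
        {
            for (int i = lo + 1; i <= hi; i++)
            {
                T x = a[i];
                int j = i - 1;
                while (j >= lo && cmp.Compare(a[j], x) > 0) { a[j + 1] = a[j]; j--; }
                a[j + 1] = x;
            }
        }

        private static void Swap<T>(T[] a, int i, int j) { T t = a[i]; a[i] = a[j]; a[j] = t; }
    }
}
```

Call it as IntroSortSketch.IntroSort(array, Comparer<int>.Default). The depth limit is the key difference from a plain quicksort, which can degrade to O(n²) on unlucky pivot choices.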

Did the sorting algorithm change in VS2010?

Yes, but the changes were minor and don't affect performance. Consider a sort of 20 million shuffled integers¹:

List<int>.Sort() (20 million)

.NET 3.5       .NET 4.0       .NET 4.5
---------      ---------      ---------
 2.564s         2.565s         2.337s

There's no performance difference between v3.5 and v4.0 (2.564 s vs. 2.565 s), and v4.5 is noticeably faster (2.337 s). Since the integer sort shows no slowdown from v3.5 to v4.0, the sorting algorithm itself is not what makes the difference.

Before we jump into your next question, let me share my results of running your actual code on my machine:

List<string>.Sort() (1.6 million)

.NET 3.5       .NET 4.0       .NET 4.5
---------      ---------      ---------
 7.953s         11.267s        10.092s

I get results similar to yours. These results are a good lead-in to your next question:

Is there anything different in the underlying CLR that makes the performance worse?

Without a doubt. So, what is the difference? The difference is in the string comparison implementation. At each step, the sorting algorithm has to compare two strings, and the v2.0 runtime (used by .NET 2.0-3.5) and the v4.0 runtime (used by .NET 4.0 and 4.5) happen to do that comparison differently. (See the extra notes below.)
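One way to see why the comparison routine matters so much is to count how often the sort calls it. A comparison sort performs on the order of n·log2(n) comparisons, which for the question's 1.6 million strings is roughly 33 million calls, so even a small per-call difference between runtimes can add up to seconds. The CountingComparer below is a hypothetical helper, not part of the question's code; it wraps StringComparer.CurrentCulture, which orders strings the same way as the parameterless lines.Sort().

```csharp
using System;
using System.Collections.Generic;

namespace ComparisonCounting
{
    // Hypothetical wrapper that counts how often the sort calls Compare.
    public sealed class CountingComparer : IComparer<string>
    {
        private readonly IComparer<string> inner;

        public long Calls { get; private set; }

        public CountingComparer(IComparer<string> inner) { this.inner = inner; }

        public int Compare(string x, string y)
        {
            Calls++;
            return inner.Compare(x, y);
        }
    }

    public static class Program
    {
        public static void Main()
        {
            var rng = new Random(1);
            var lines = new List<string>();
            for (int i = 0; i < 100000; i++)
                lines.Add("Lorem ipsum (#" + rng.Next() + ")");

            // Culture-sensitive, like the parameterless lines.Sort() in the question.
            var counter = new CountingComparer(StringComparer.CurrentCulture);
            lines.Sort(counter);

            // The count should be on the order of n * log2(n).
            Console.WriteLine("{0} comparisons for {1} strings", counter.Calls, lines.Count);
        }
    }
}
```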

The easiest way to show this is to force an ordinal comparison (comparing the strings' character codes) instead of a culture-sensitive one: replace lines.Sort(); with lines.Sort(StringComparer.Ordinal);. Here is what I measured:

List<string>.Sort(StringComparer.Ordinal) (1.6 million)

.NET 3.5       .NET 4.0       .NET 4.5
---------      ---------      ---------
 4.088s         3.76s          3.454s
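If you want to reproduce this comparison, a minimal harness might look like the following. It is a sketch, not the exact code behind the numbers above: MakeShuffledLines is a hypothetical, cut-down stand-in for the question's BuildTestLines(), and each run sorts a fresh copy so both comparers see the same input order.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

namespace OrdinalBenchmark
{
    public static class SortTiming
    {
        // Sorts a copy of the input with the given comparer (null = default,
        // culture-sensitive comparison) and returns the elapsed time.
        public static TimeSpan TimeSort(List<string> source, IComparer<string> comparer,
                                        out List<string> sorted)
        {
            sorted = new List<string>(source); // copy so every run sees the same order
            Stopwatch timer = Stopwatch.StartNew();
            if (comparer == null)
                sorted.Sort();         // same call as the question: current-culture comparison
            else
                sorted.Sort(comparer); // e.g. StringComparer.Ordinal
            timer.Stop();
            return timer.Elapsed;
        }

        // Hypothetical data generator, loosely modeled on the question's BuildTestLines().
        public static List<string> MakeShuffledLines(int repeatCount, int seed)
        {
            string[] lorem = { "Lorem ipsum dolor sit amet,", "Maecenas porttitor congue massa.", "Mauris et orci." };
            var lines = new List<string>();
            for (int i = 0; i < repeatCount; i++)
                foreach (string s in lorem)
                    lines.Add(s + " (#" + i + ")");

            var rng = new Random(seed);
            for (int i = lines.Count - 1; i > 0; i--) // Fisher-Yates shuffle
            {
                int j = rng.Next(i + 1);
                string t = lines[i]; lines[i] = lines[j]; lines[j] = t;
            }
            return lines;
        }

        public static void Main()
        {
            List<string> lines = MakeShuffledLines(200 * 1000, 12345);
            List<string> sorted;
            TimeSpan culture = TimeSort(lines, null, out sorted);
            TimeSpan ordinal = TimeSort(lines, StringComparer.Ordinal, out sorted);
            Console.WriteLine("culture-sensitive: {0:F3}s, ordinal: {1:F3}s",
                culture.TotalSeconds, ordinal.TotalSeconds);
        }
    }
}
```

Building the same program once against .NET 3.5 and once against .NET 4.0 lets you compare the runtimes directly; absolute timings will depend on the machine.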

Now, that looks better! It's more or less what I expected: a steady increase in speed with each framework release, and every version is considerably faster than with the culture-sensitive comparison. MSDN suggests that whenever you do a non-linguistic string comparison, you should use an ordinal comparison.

However, that only solves the problem if your comparison or sorting isn't culture-sensitive. If you need culture-sensitive sorting, it seems you won't be able to get rid of the slower execution time unless you want to revert to the .NET 3.5 framework.
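Before switching to StringComparer.Ordinal, check that the resulting order is acceptable, because the two kinds of comparison can disagree. The sketch below sorts the same words both ways; the word list and the explicit "en-US" culture are illustrative assumptions. Ordinal comparison works on UTF-16 code units, so it yields Apple, Banana, apple, banana (every upper-case ASCII letter precedes every lower-case one), whereas the culture-sensitive comparer keeps apple and Apple together ahead of the two bananas.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

namespace OrdinalVsCulture
{
    public static class Demo
    {
        // Returns the words sorted with the given comparer, leaving the input untouched.
        public static List<string> SortedCopy(IEnumerable<string> words, IComparer<string> comparer)
        {
            var copy = new List<string>(words);
            copy.Sort(comparer);
            return copy;
        }

        public static void Main()
        {
            string[] words = { "banana", "Apple", "apple", "Banana" };

            // Ordinal: compares UTF-16 code units, so 'A'..'Z' (U+0041..U+005A)
            // sort before 'a'..'z' (U+0061..U+007A).
            List<string> ordinal = SortedCopy(words, StringComparer.Ordinal);

            // Culture-sensitive with an explicit culture rather than the thread's
            // current culture; "en-US" is an assumption for this example.
            IComparer<string> enUs = StringComparer.Create(new CultureInfo("en-US"), false);
            List<string> linguistic = SortedCopy(words, enUs);

            Console.WriteLine("Ordinal:    " + string.Join(", ", ordinal.ToArray()));
            Console.WriteLine("Linguistic: " + string.Join(", ", linguistic.ToArray()));
        }
    }
}
```

Passing an explicit culture this way also makes a culture-sensitive sort independent of the machine's regional settings, although it does not make it any faster.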


Extra notes

When you don't pass a comparer to List<T>.Sort() or Array.Sort(), they use the default comparer. The default comparer for .NET strings uses the comparison rules of the current thread's culture. From there, the comparison calls some internal functions in the .NET runtime's native libraries.
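A quick way to confirm that the parameterless Sort() depends on the thread's culture is to compare it with an explicit StringComparer.CurrentCulture while temporarily setting Thread.CurrentThread.CurrentCulture. The helper name DefaultMatchesCurrentCulture is hypothetical; the sketch only shows that both calls produce the same order, not how the runtime implements the comparison internally.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Threading;

namespace DefaultComparerDemo
{
    public static class Demo
    {
        // True when the parameterless Sort() and an explicit current-culture
        // comparer produce the same order under the given culture.
        public static bool DefaultMatchesCurrentCulture(IEnumerable<string> words, CultureInfo culture)
        {
            CultureInfo saved = Thread.CurrentThread.CurrentCulture;
            try
            {
                Thread.CurrentThread.CurrentCulture = culture;

                var byDefault = new List<string>(words);
                byDefault.Sort(); // default comparer: String.CompareTo, i.e. current culture

                var explicitCulture = new List<string>(words);
                explicitCulture.Sort(StringComparer.CurrentCulture);

                return byDefault.SequenceEqual(explicitCulture);
            }
            finally
            {
                Thread.CurrentThread.CurrentCulture = saved; // restore the caller's culture
            }
        }

        public static void Main()
        {
            string[] words = { "zebra", "Apple", "apple", "Zoo", "banana" };
            CultureInfo[] cultures = { CultureInfo.CurrentCulture, CultureInfo.InvariantCulture };
            foreach (CultureInfo culture in cultures)
                Console.WriteLine("{0}: {1}", culture.DisplayName, DefaultMatchesCurrentCulture(words, culture));
        }
    }
}
```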

In v2.0-3.5, the comparison goes through COMNlsInfo::Compare and COMNlsInfo::CompareFast. Here's roughly what the call stack looks like:

String.CompareTo(string)
+--System.Globalization.CompareInfo.Compare(string,string,CompareOptions)
   +--mscorwks.dll!COMNlsInfo::Compare
      +--mscorwks.dll!COMNlsInfo::CompareFast

Similar source for these functions can be found in the shared source implementation of the CLI (SSCLI), in sscli\clr\src\classlibnative\nls\comnlsinfo.cpp on lines 1034 and 893, respectively.

In v4.0, however, that call tree changed fairly significantly:

String.CompareTo(string)
+--System.Globalization.CompareInfo.Compare(string,string,CompareOptions)
   +--clr.dll!COMNlsInfo::InternalCompareString
      +--clr.dll!SortVersioning::SortDllCompareString
         +--nlssorting.dll!_SortCompareString
            +--nlssorting.dll!_AsciiCompareString

I wish I could tell you why one is slower than the other, but I have no idea, and there is no SSCLI for .NET 4.0 to compare against. The major changes to string handling in .NET 4.0 weren't without problems: there have been performance issues related to strings in .NET 4.0, although they don't really apply here.


¹ All tests were run in a virtual machine: Windows Server 2008 R2 x64 with 4 GB RAM and a virtual quad-core processor. The host machine is Win7 x64 with 24 GB RAM and a Xeon W3540 (2.93 GHz) quad-core CPU (8 logical processors). Results are an average of 5 runs with the best and worst times removed.
