C# QuickSort too slow

I'm learning about different sorting algorithms, and I found that, past a certain input size, my QuickSort implementation isn't quick at all.

Here is my code:

    using System;

    class QuickSort
    {
        // Swap was not shown in the original post; this is the usual
        // three-assignment exchange, assumed here so the code compiles.
        private static void Swap(ref int a, ref int b)
        {
            int tmp = a;
            a = b;
            b = tmp;
        }

        // Partitions arr[start..end] around the key arr[end] so that the left
        // part is <= key and the right part is > key; returns the key's final index.
        private int Partition(int[] arr, int start, int end)
        {
            int key = arr[end];
            int i = start - 1;
            for (int j = start; j < end; j++)
            {
                if (arr[j] <= key) Swap(ref arr[++i], ref arr[j]);
            }
            Swap(ref arr[++i], ref arr[end]);
            return i;
        }

        // Recursively sorts arr[start..end] (both bounds inclusive).
        public void QuickSorting(int[] arr, int start, int end)
        {
            if (start < end)
            {
                int key = Partition(arr, start, end);
                QuickSorting(arr, start, key - 1);
                QuickSorting(arr, key + 1, end);
            }
        }
    }

    class Test
    {
        static void Main(string[] args)
        {
            QuickSort quick = new QuickSort();
            Random rnd = new Random(DateTime.Now.Millisecond);

            int[] array = new int[1000000];

            for (int i = 0; i < array.Length; i++)
            {
                // Next(1, 1000) returns values from 1 to 999 (the upper bound is exclusive).
                array[i] = rnd.Next(1, 1000);
            }

            quick.QuickSorting(array, 0, array.Length - 1);
        }
    }

It takes about 15 seconds to run this code on an array of a million elements, while MergeSort or HeapSort, for example, do the same in less than a second.

Could you please explain why this happens?

How quick your sort is, and which algorithm you should use, depends a lot on your input data: is it random, nearly sorted, reversed, etc.?

There's a very nice page that illustrates how the different sorting algorithms work:

Have you considered inlining the Swap method? It shouldn't be hard to do so, but it may be that the JIT is finding it hard to inline.
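
A minimal sketch of what inlining the swap by hand could look like, assuming Swap is the usual three-assignment exchange (the question does not show its body). The class name InlinedPartition is a placeholder for illustration only.

```csharp
static class InlinedPartition
{
    // Same partition scheme as in the question (last element as key),
    // with the two Swap calls expanded in place.
    public static int Partition(int[] arr, int start, int end)
    {
        int key = arr[end];
        int i = start - 1;
        for (int j = start; j < end; j++)
        {
            if (arr[j] <= key)
            {
                // Inlined Swap(ref arr[++i], ref arr[j])
                i++;
                int tmp = arr[i];
                arr[i] = arr[j];
                arr[j] = tmp;
            }
        }
        // Inlined Swap(ref arr[++i], ref arr[end]): move the key into place.
        i++;
        int last = arr[i];
        arr[i] = arr[end];
        arr[end] = last;
        return i;
    }
}
```

Inlining affects only the cost of each swap, not how many comparisons and swaps are performed.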

When I implemented quicksort for Edulinq I didn't see this problem at all - you may want to try my code (the simplest, recursive form probably) to see how that performs for you. If it does well, try to work out where the differences are.

While different algorithms will behave differently with the same data, I wouldn't expect to see this much difference on randomly-generated data.

You have 1,000,000 random elements with about 1,000 distinct values. So, we can expect most values to appear about 1,000 times in your array. These runs of duplicates push part of the sort into quadratic O(n^2) running time.

Partitioning the array into 1,000 pieces, where every piece contains a single repeated value, happens at a stack depth of about log2(1000), about 10. (That is assuming each call to Partition neatly splits its range into two halves.) Each level touches all 1,000,000 elements, so that's about 10,000,000 operations.

Then come the last 1,000 partitions, each containing 1,000 identical values. Sorting them takes 1,000 times (1,000 + 999 + 998 + ... + 1) comparisons. (In every round, quicksort reduces the problem by only one element, removing just the key/pivot.) That gives about 500,000,000 operations. The ideal way to quicksort 1,000 such partitions would take 1,000 times 1,000*10 operations = 10,000,000. Because of the identical values, you hit a quadratic case here, quicksort's worst-case performance. So, about halfway down the quicksort, it switches to worst-case behavior.

If every value occurs only a few times, it doesn't matter whether you sort those few tiny partitions in O(N^2) or O(N logN). But here we had many huge partitions to be sorted in O(N^2).
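
The quadratic cost on equal values can be checked by counting comparisons. The sketch below reuses the partition scheme from the question and adds a counter; the class name ComparisonCounter, its Count method, and the Swap body are assumptions made for illustration.

```csharp
static class ComparisonCounter
{
    static long comparisons;

    // Assumed Swap body: the usual three-assignment exchange.
    static void Swap(ref int a, ref int b) { int t = a; a = b; b = t; }

    static int Partition(int[] arr, int start, int end)
    {
        int key = arr[end];
        int i = start - 1;
        for (int j = start; j < end; j++)
        {
            comparisons++; // one key comparison per loop iteration
            if (arr[j] <= key) Swap(ref arr[++i], ref arr[j]);
        }
        Swap(ref arr[++i], ref arr[end]);
        return i;
    }

    static void Sort(int[] arr, int start, int end)
    {
        if (start < end)
        {
            int key = Partition(arr, start, end);
            Sort(arr, start, key - 1);
            Sort(arr, key + 1, end);
        }
    }

    // Sorts a copy of the input and returns the number of key comparisons.
    public static long Count(int[] input)
    {
        int[] copy = (int[])input.Clone();
        comparisons = 0;
        Sort(copy, 0, copy.Length - 1);
        return comparisons;
    }
}
```

For 1,000 equal values, `ComparisonCounter.Count(new int[1000])` returns 499,500, that is 999 + 998 + ... + 1. Multiplied by 1,000 such partitions, this is in line with the estimate of roughly 500,000,000 operations.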


To improve your code: partition into 3 pieces: smaller than the pivot, equal to the pivot, and bigger than the pivot. Then quicksort only the first and last pieces. You will need to do an extra compare; test for equality first. I think that, for this input, it would be a lot faster.
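
A minimal sketch of such a three-way partition, assuming the last element is kept as the pivot as in the question. The class name ThreeWayQuickSort and its helper names are made up for this example, and this is one possible way to do it, not the only one.

```csharp
static class ThreeWayQuickSort
{
    public static void Sort(int[] arr)
    {
        Sort(arr, 0, arr.Length - 1);
    }

    // Sorts arr[start..end] (both bounds inclusive) using a three-way partition.
    static void Sort(int[] arr, int start, int end)
    {
        if (start >= end) return;

        int pivot = arr[end];
        int lt = start; // arr[start..lt-1] is smaller than the pivot
        int gt = end;   // arr[gt+1..end] is bigger than the pivot
        int i = start;  // arr[lt..i-1] equals the pivot; arr[i..gt] is not yet examined

        while (i <= gt)
        {
            if (arr[i] == pivot)
            {
                i++; // equality is tested first, as suggested above
            }
            else if (arr[i] < pivot)
            {
                Swap(arr, lt++, i++);
            }
            else
            {
                Swap(arr, i, gt--); // the swapped-in element is examined next
            }
        }

        // Everything in arr[lt..gt] equals the pivot and is already in place.
        Sort(arr, start, lt - 1);
        Sort(arr, gt + 1, end);
    }

    static void Swap(int[] arr, int a, int b)
    {
        int tmp = arr[a];
        arr[a] = arr[b];
        arr[b] = tmp;
    }
}
```

With this scheme, a range that holds only one distinct value is finished in a single pass, because both recursive calls receive empty ranges. Calling `ThreeWayQuickSort.Sort(array)` would replace `quick.QuickSorting(array, 0, array.Length - 1)` in the test program.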
