
Quicksort with insertion sort finish - where am I going wrong?

I am working on a project for a class. We are to write a quicksort that switches to an insertion sort once a partition is smaller than a specified size. That's no problem; where I am having difficulty is figuring out why I am not getting the performance I expect.

One of the requirements is that it must sort an array of 5,000,000 ints in under 1,300 ms (this is on standard machines, so CPU speed is not an issue). First of all, I can't get it to work on 5,000,000 because of a stack overflow error (too many recursive calls...). Even if I increase the heap size, it is still a lot slower than that.

Below is the code. Any hints, anyone?

Thanks in advance.

public class MyQuickSort {

    public static void sort(int [] toSort, int moveToInsertion)
    {
        sort(toSort, 0, toSort.length - 1, moveToInsertion);
    }

    private static void sort(int[] toSort, int first, int last, int moveToInsertion)
    {
        if (first < last)
        {
            if ((last - first) < moveToInsertion)
            {
                insertionSort(toSort, first, last);
            }
            else
            {
                int split = quickHelper(toSort, first, last);
                sort(toSort, first, split - 1, moveToInsertion);
                sort(toSort, split + 1, last, moveToInsertion);
            }
        }
    }

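    private static int quickHelper(int[] toSort, int first, int last)
    {
        // Order first/middle/last so the median of the three sits in the middle,
        // then move that median to the front to serve as the pivot.
        sortPivot(toSort, first, last);
        swap(toSort, first, first + (last - first)/2);
        int left = first;
        int right = last;
        int pivotVal = toSort[first];
        do
        {
            // Advance left past every key <= pivot (keys equal to the pivot are skipped too).
            while ((left < last) && (toSort[left] <= pivotVal))
            {
                left++;
            }

            // Move right back past every key > pivot.
            while (toSort[right] > pivotVal)
            {
                right--;
            }

            if (left < right)
            {
                swap(toSort, left, right);
            }

        } while (left < right);

        // Put the pivot into its final position.
        swap(toSort, first, right);

        return right;
    }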

    private static void sortPivot(int[] toSort, int first, int last)
    {
        int middle = first + (last - first)/2;

        if (toSort[middle] < toSort[first]) swap(toSort, first, middle);

        if (toSort[last] < toSort[middle]) swap(toSort, middle, last);

        if (toSort[middle] < toSort[first]) swap(toSort, first, middle);

    }

    private static void insertionSort(int[] toSort, int first, int last)
    {
        for (int nextVal = first + 1; nextVal <= last; nextVal++)
        {
            int toInsert = toSort[nextVal];
            int j = nextVal - 1;
            // Stop at first, not 0, so the scan never leaves the sub-array being sorted.
            while (j >= first && toInsert < toSort[j])
            {
                toSort[j + 1] = toSort[j];
                j--;
            }
            toSort[j + 1] = toInsert;
        }
    }

    private static void swap(int[] toSort, int i, int j)
    {
        int temp = toSort[i];
        toSort[i] = toSort[j];
        toSort[j] = temp;
    }

}

I haven't tested this with your algorithm, and I don't know what kind of data set you're running with, but consider choosing a better pivot than the leftmost element. From the Wikipedia article on Quicksort:

Choice of pivot

In very early versions of quicksort, the leftmost element of the partition would often be chosen as the pivot element. Unfortunately, this causes worst-case behavior on already sorted arrays, which is a rather common use-case. The problem was easily solved by choosing either a random index for the pivot, choosing the middle index of the partition or (especially for longer partitions) choosing the median of the first, middle and last element of the partition for the pivot.
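The three strategies named in the quote could be sketched in Java roughly as follows. This is only an illustration: the class name `PivotChoice` and its method names are made up for this example and do not appear in the question's code (whose `sortPivot` already does a median-of-three by swapping elements in place).

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper class; all names here are illustrative assumptions.
final class PivotChoice {

    // Middle index, computed without risking int overflow on large ranges.
    static int middle(int first, int last) {
        return first + (last - first) / 2;
    }

    // Uniformly random index in [first, last].
    static int random(int first, int last) {
        return ThreadLocalRandom.current().nextInt(first, last + 1);
    }

    // Index of the median of a[first], a[middle], a[last]; the array is not modified.
    static int medianOfThree(int[] a, int first, int last) {
        int mid = middle(first, last);
        int x = a[first], y = a[mid], z = a[last];
        if ((x <= y && y <= z) || (z <= y && y <= x)) return mid;
        if ((y <= x && x <= z) || (z <= x && x <= y)) return first;
        return last;
    }
}
```

Whichever index is chosen, the partition step would then swap that element to wherever it expects the pivot to be (the front, in the question's code).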

Figured it out.

Actually, it was not my sort's fault at all. I was generating numbers in the range 0-100 (for testing, to make sure the output was sorted). This resulted in tons of duplicates, which meant way too many partitions. Changing the range to min_int through max_int made it go a lot quicker.
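If input with many duplicates has to be handled as well, one common option is a three-way ("Dutch national flag") partition, which groups all keys equal to the pivot in the middle and never recurses into them. The sketch below is not the code from the question; `ThreeWayQuickSort` is a name invented for this example, and the insertion sort cutoff is left out to keep it short.

```java
// Illustrative sketch only: quicksort with a three-way partition so that
// runs of equal keys are placed once and never partitioned again.
final class ThreeWayQuickSort {

    static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    private static void sort(int[] a, int first, int last) {
        if (first >= last) return;

        // Middle element as the pivot value; other pivot choices would work too.
        int pivot = a[first + (last - first) / 2];

        int lt = first;  // a[first..lt-1] <  pivot
        int i = first;   // a[lt..i-1]     == pivot
        int gt = last;   // a[gt+1..last]  >  pivot

        while (i <= gt) {
            if (a[i] < pivot) {
                swap(a, lt++, i++);
            } else if (a[i] > pivot) {
                swap(a, i, gt--);  // i stays put: the swapped-in key is still unexamined
            } else {
                i++;
            }
        }

        // Everything in a[lt..gt] equals the pivot and is already in its final place.
        sort(a, first, lt - 1);
        sort(a, gt + 1, last);
    }

    private static void swap(int[] a, int i, int j) {
        int temp = a[i];
        a[i] = a[j];
        a[j] = temp;
    }
}
```

Each recursive call works on a range with at least one fewer distinct key than its parent, so with keys limited to 0-100 the recursion can go at most 101 levels deep regardless of array length.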

Thanks for your help though :D

When the input array is large, it's natural for a deeply recursive function to run into stack overflow, which is what happens here with the code above. I would recommend writing an iterative quicksort that uses your own stack. It should be fast because no stack frames are allocated or deallocated at run time, and you won't run into stack overflow either. Performance also depends on the point at which you switch to insertion sort. I can't name a particular input size at which insertion sort starts to perform badly compared to quicksort, so I would suggest trying different sizes; I'm sure you will notice a difference.

You might also want to use binary search in the insertion sort to improve performance. I don't know how much it helps on smaller inputs, but it's a nice trick to try.
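A minimal sketch of that idea, assuming the same inclusive `first`/`last` convention as the question's `insertionSort` (the class name `BinaryInsertionSort` is invented here). Binary search cuts the number of comparisons per insertion, but the elements still have to be shifted, so the number of moves is unchanged.

```java
// Illustrative sketch: insertion sort that finds each insertion point by binary search.
final class BinaryInsertionSort {

    // Sorts a[first..last], both bounds inclusive.
    static void sort(int[] a, int first, int last) {
        for (int next = first + 1; next <= last; next++) {
            int toInsert = a[next];

            // Search the sorted prefix a[first..next-1] for the first position
            // holding a value greater than toInsert (equal keys keep their order).
            int lo = first;
            int hi = next;
            while (lo < hi) {
                int mid = lo + (hi - lo) / 2;
                if (a[mid] <= toInsert) {
                    lo = mid + 1;
                } else {
                    hi = mid;
                }
            }

            // Shift a[lo..next-1] one slot to the right, then drop the value in.
            System.arraycopy(a, lo, a, lo + 1, next - lo);
            a[lo] = toInsert;
        }
    }
}
```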

I don't want to share code because that wouldn't help you learn how to convert a recursive quicksort into an iterative one. If you have problems with the conversion, let me know.
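For readers who want something to check their own conversion against, here is one possible shape of an iterative quicksort with an explicit stack. It is a sketch under assumptions, not the answerer's code: the class name `IterativeQuickSort` is invented, the partition is a Hoare-style scheme with a middle pivot rather than the question's `quickHelper`, and the insertion sort cutoff is omitted.

```java
import java.util.Arrays;

// Illustrative sketch: quicksort driven by an explicit stack of (lo, hi) pairs.
final class IterativeQuickSort {

    static void sort(int[] a) {
        if (a.length < 2) return;

        int[] stack = new int[128];  // pending ranges, stored as consecutive lo/hi pairs
        int top = 0;
        stack[top++] = 0;
        stack[top++] = a.length - 1;

        while (top > 0) {
            int hi = stack[--top];
            int lo = stack[--top];

            // Hoare-style partition around the middle value; both scans stop on
            // keys equal to the pivot, which keeps duplicate-heavy input balanced.
            int pivot = a[lo + (hi - lo) / 2];
            int i = lo;
            int j = hi;
            while (i <= j) {
                while (a[i] < pivot) i++;
                while (a[j] > pivot) j--;
                if (i <= j) {
                    int temp = a[i];
                    a[i] = a[j];
                    a[j] = temp;
                    i++;
                    j--;
                }
            }
            // Now a[lo..j] <= pivot and a[i..hi] >= pivot, with j < i.

            // Grow the stack if the next two pushes would not fit.
            if (top + 4 > stack.length) {
                stack = Arrays.copyOf(stack, stack.length * 2);
            }

            // Push the larger side first so the smaller side is popped next;
            // ranges with fewer than two elements are not pushed at all.
            if (j - lo > hi - i) {
                if (lo < j) { stack[top++] = lo; stack[top++] = j; }
                if (i < hi) { stack[top++] = i;  stack[top++] = hi; }
            } else {
                if (i < hi) { stack[top++] = i;  stack[top++] = hi; }
                if (lo < j) { stack[top++] = lo; stack[top++] = j; }
            }
        }
    }
}
```

Handling the smaller side first keeps the number of pending ranges roughly logarithmic in the array length, which is why a small fixed-size starting stack is normally enough; the growth check is only a safety net.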
