
What algorithms are used in C++11 std::sort in different STL implementations?

The C++11 standard guarantees that std::sort has O(n log n) complexity in the worst case. This is different from the average-case guarantee in C++98/03, where std::sort could be implemented with Quicksort (maybe combined with insertion sort for small n), which has O(n^2) in the worst case (for some specific input, such as sorted input).

Were there any changes in std::sort implementations in different STL libraries? How is C++11's std::sort implemented in different STLs?

The question is how the STL can say that the worst case of std::sort is O(N log(N)), even though it is in essence a QuickSort. The STL's sort is IntroSort. IntroSort is in essence a QuickSort; the difference it introduces changes the worst-case complexity.


QuickSort worst case is O(N^2)

Whatever partitioning you choose, there exists a sequence on which QuickSort will run in O(N^2). The partitioning you choose only decreases the probability that the worst case occurs. (Random pivot selection, median-of-three, etc.)
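To make the quadratic behaviour concrete, here is a minimal sketch that counts comparisons for a QuickSort that always takes the first element as its pivot. The names `naive_quicksort` and `comparisons_on_sorted_input` are hypothetical helpers written for this illustration, not code from any STL; the partition scheme is a plain Lomuto-style pass chosen for brevity.

```cpp
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical helper: sorts a[lo, hi) with the first element as pivot
// and returns the number of element comparisons performed.
std::size_t naive_quicksort(std::vector<int>& a, std::size_t lo, std::size_t hi) {
    if (hi - lo < 2) return 0;
    const int pivot = a[lo];
    std::size_t store = lo + 1;  // a[lo+1, store) holds elements < pivot
    std::size_t comparisons = 0;
    for (std::size_t i = lo + 1; i < hi; ++i) {
        ++comparisons;
        if (a[i] < pivot) {
            std::swap(a[i], a[store]);
            ++store;
        }
    }
    std::swap(a[lo], a[store - 1]);  // move the pivot to its final position
    comparisons += naive_quicksort(a, lo, store - 1);
    comparisons += naive_quicksort(a, store, hi);
    return comparisons;
}

// Hypothetical helper: runs the sort on 0, 1, ..., n-1 (already sorted).
std::size_t comparisons_on_sorted_input(std::size_t n) {
    std::vector<int> a(n);
    std::iota(a.begin(), a.end(), 0);
    return naive_quicksort(a, 0, a.size());
}
```

On already sorted input every partition step peels off only the pivot, so the count is (n - 1) + (n - 2) + ... + 1 = n(n - 1)/2. The recursion depth is also n here, so this sketch is only meant for small inputs.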

EDIT: Thanks to @maxim1000's correction. Quicksort with the Median of Medians pivot selection algorithm has O(N log(N)) worst case complexity, but due to the overhead it introduces it isn't used in practice. It shows how a good selection algorithm can, theoretically, change the worst-case complexity through pivot selection.


What does IntroSort do?

IntroSort limits the branching (recursion depth) of QuickSort. This is the most important point: that limit is 2 * (log N). When the limit is reached, IntroSort can use any sorting algorithm that has a worst case complexity of O(N log(N)).

Branching stops when we have O(log N) subproblems. We can solve every subproblem in O(n log n). (Lower case n stands for the subproblem sizes.)

The sum of the (n log n) terms is now our worst case complexity.
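The idea can be sketched as follows. This is a simplified illustration and not the code of libstdc++ or libc++: the names `introsort_loop` and `introsort_sketch` are made up here, the pivot is a median-of-three, the partition uses two `std::partition` passes for clarity, and the insertion-sort finishing pass that real implementations add is left out.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

using Iter = std::vector<int>::iterator;

// Hypothetical helper: sorts [first, last); depth_budget is the number of
// partitioning levels still allowed before falling back to heap sort.
void introsort_loop(Iter first, Iter last, int depth_budget) {
    while (last - first > 1) {
        if (depth_budget == 0) {
            // Depth limit reached: finish this range with an O(n log n) heap sort.
            std::make_heap(first, last);
            std::sort_heap(first, last);
            return;
        }
        --depth_budget;
        // Median of first, middle and last element as the pivot value.
        const int x = *first;
        const int y = *(first + (last - first) / 2);
        const int z = *(last - 1);
        const int pivot = std::max(std::min(x, y), std::min(std::max(x, y), z));
        // Three-way split: [first, mid1) < pivot, [mid1, mid2) == pivot, [mid2, last) > pivot.
        Iter mid1 = std::partition(first, last, [pivot](int v) { return v < pivot; });
        Iter mid2 = std::partition(mid1, last, [pivot](int v) { return !(pivot < v); });
        introsort_loop(mid2, last, depth_budget);  // recurse on the right part
        last = mid1;                               // loop on the left part
    }
}

// Hypothetical entry point using the 2 * log2(N) depth limit mentioned above.
void introsort_sketch(std::vector<int>& a) {
    if (a.size() < 2) return;
    const int depth_limit = 2 * static_cast<int>(std::log2(static_cast<double>(a.size())));
    introsort_loop(a.begin(), a.end(), depth_limit);
}
```

Because the pivot is always an element of the range, the middle block is never empty and both remaining parts strictly shrink, so the loop terminates. The heap sort fallback uses `std::make_heap` and `std::sort_heap`; any other O(n log n) worst-case sort would serve the same purpose.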

For the worst case of QuickSort, assume we have an already sorted array and we always select the first element of this array as the pivot. In every iteration we get rid of only the first element. If we went on this way until the end, it would obviously be O(N^2). With IntroSort we stop QuickSort when we reach a depth of log(N), then we use HeapSort for the remaining unsorted array.

16 -> 1  /**N**/
   \
    > 15 -> 1 /**N - 1**/
         \
          > 14 -> 1 /**N - 2**/
               \
                > 13 -> 1 /**N - log(N)**/  
                     \
                      > 12 /**(HeapSort Now) (N - log(N)) log (N - log(N))**/

Sum them up:

Until branching stops, N + (N - 1) + ... + (N - log(N)) operations are done. Instead of using Gauss's formula to sum them up, we can bound each of the log(N) + 1 terms by N: N + (N - 1) + ... + (N - log(N)) <= N (log(N) + 1) = N log(N) + N.

The HeapSort part is (N - log(N)) log(N - log(N)) < N log(N).

Overall complexity < 2 N log(N) + N.

Since constants and lower-order terms can be omitted, the worst case complexity of IntroSort is O(N log(N)).
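For a quick numeric sanity check of these sums, the sketch below evaluates the two phases for a given N. It assumes log means the base-2 logarithm rounded down and that each partition level costs exactly the size of the remaining range, as in the diagram; `WorstCaseCost` and `worst_case_cost` are hypothetical names used only here.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical result type for the cost model used in the example above.
struct WorstCaseCost {
    double partition_phase;  // N + (N - 1) + ... + (N - log N)
    double heap_phase;       // (N - log N) * log2(N - log N)
    double n_log_n;          // N * log2(N), for comparison
};

// Evaluates the model for n >= 2, with log taken as floor(log2(n)).
WorstCaseCost worst_case_cost(std::uint64_t n) {
    const int depth = static_cast<int>(std::floor(std::log2(static_cast<double>(n))));
    double partition = 0.0;
    for (int k = 0; k <= depth; ++k) {
        partition += static_cast<double>(n) - k;  // one pass over the remaining range
    }
    const double remaining = static_cast<double>(n) - depth;
    const double heap = remaining * std::log2(remaining);
    return {partition, heap, static_cast<double>(n) * std::log2(static_cast<double>(n))};
}
```

Comparing `partition_phase + heap_phase` against `n_log_n` for a few values of N shows the total staying within a small constant multiple of N log(N), which is all the O(N log(N)) claim requires.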


Added Info: GCC STL implementation source code is here. The sort function is at line 5461.

Correction: the Microsoft .NET sort implementation has been IntroSort since 2012. Related information is here.

Browsing the online sources for libstdc++ and libc++, one can see that both libraries use the full gamut of the well-known sorting algorithms from an intro-sort main loop:

For std::sort, there is a helper routine for insertion_sort (an O(N^2) algorithm, but with a good scaling constant that makes it competitive for small sequences), plus some special casing for sub-sequences of 0, 1, 2, and 3 elements.

For std::partial_sort, both libraries use a version of heap_sort (O(N log N) in general), because that method has the nice invariant that it keeps a sorted subsequence (it typically has a larger scaling constant, which makes it more expensive for full sorting).

For std::nth_element, there is a helper routine for selection_sort (again an O(N^2) algorithm with a good scaling constant that makes it competitive for small sequences). For regular sorting insertion_sort usually dominates selection_sort, but for nth_element the invariant of having the smallest elements perfectly matches the behavior of selection_sort.
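For readers less familiar with the two other entry points, here is a small usage sketch of std::partial_sort and std::nth_element from the caller's side. It says nothing about the internals of either library; the wrapper names `smallest_k_sorted` and `kth_smallest` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical wrapper: returns the k smallest values in ascending order.
std::vector<int> smallest_k_sorted(std::vector<int> values, std::size_t k) {
    k = std::min(k, values.size());
    const auto middle = values.begin() + static_cast<std::ptrdiff_t>(k);
    // After the call, [begin, middle) holds the k smallest elements, sorted.
    std::partial_sort(values.begin(), middle, values.end());
    values.resize(k);
    return values;
}

// Hypothetical wrapper: returns the element that would sit at index k
// if the whole sequence were sorted. Precondition: k < values.size().
int kth_smallest(std::vector<int> values, std::size_t k) {
    const auto nth = values.begin() + static_cast<std::ptrdiff_t>(k);
    // After the call, *nth is in its sorted position; elements before it
    // are not greater and elements after it are not smaller.
    std::nth_element(values.begin(), nth, values.end());
    return values[k];
}
```

Both wrappers take their argument by value so the caller's data is left untouched; in real code one would usually call the algorithms in place.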
