Sorting optimization

Question

I'm currently taking an algorithms class, so I decided it would be good practice to implement a few sorting algorithms and compare them. I implemented merge sort and quick sort and compared their run times with that of std::sort. My computer isn't the fastest, but for 1000000 elements, averaged over 200 runs, I get:

std::sort -> 0.620342 seconds
quickSort -> 2.2692 seconds
mergeSort -> 2.19048 seconds

I would like to ask for comments on how to improve and optimize my implementation.

    #include <algorithm>
    #include <cstdlib>
    #include <functional>
    #include <vector>

    void quickSort(std::vector<int>& nums, int s, int e, std::function<bool(int,int)> comparator = defaultComparator){
        if(s >= e)
            return;

        int pivot;
        int a = s + (rand() % (e-s));
        int b = s + (rand() % (e-s));
        int c = s + (rand() % (e-s));

        //find median of the 3 random pivots
        int min = std::min(std::min(nums[a],nums[b]),nums[c]);
        int max = std::max(std::max(nums[a],nums[b]),nums[c]);
        if(nums[a] < max && nums[a] > min)
            pivot = a;
        else if(nums[b] < max && nums[b] > min)
            pivot = b;
        else
            pivot = c;

        //move the pivot to the front of the range
        int temp = nums[s];
        nums[s] = nums[pivot];
        nums[pivot] = temp;

        //partition
        int i = s + 1, j = s + 1;
        for(; j < e; j++){
            if(comparator(nums[j] , nums[s])){
                temp = nums[i];
                nums[i++] = nums[j];
                nums[j] = temp;
            }
        }
        temp = nums[i-1];
        nums[i-1] = nums[s];
        nums[s] = temp;

        //sort left and right of partition
        quickSort(nums,s,i-1,comparator);
        quickSort(nums,i,e,comparator);
    }

Here s is the index of the first element and e is the index one past the last element. defaultComparator is just the following lambda function:

    auto defaultComparator = [](int a, int b){ return a <= b; };

    std::vector<int> mergeSort(std::vector<int>& nums, int s, int e, std::function<bool(int,int)> comparator = defaultComparator){
        std::vector<int> sorted(e-s);
        if(s == e)
            return sorted;
        int mid = (s+e)/2;
        if(s == mid){
            sorted[0] = nums[s];
            return sorted;
        }
        //sort both halves, forwarding the comparator
        std::vector<int> left = mergeSort(nums, s, mid, comparator);
        std::vector<int> right = mergeSort(nums, mid, e, comparator);

        //merge the two sorted halves
        unsigned int i = 0, j = 0;
        unsigned int c = 0;
        while(i < left.size() || j < right.size()){
            if(i == left.size()){
                sorted[c++] = right[j++];
            }
            else if(j == right.size()){
                sorted[c++] = left[i++];
            }
            else{
                if(comparator(left[i],right[j]))
                    sorted[c++] = left[i++];
                else
                    sorted[c++] = right[j++];
            }
        }
        return sorted;
    }

Thank you all.

Answer 1

The first thing I see is that you're passing a std::function<>, which involves a type-erased, indirect call (much like a virtual call), one of the most expensive calling strategies. Try a plain template parameter T instead (which might be a function); the result will be direct function calls.
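As a rough illustration of the template suggestion, here is a minimal sketch of an in-place quicksort whose comparator type is a template parameter. The name quickSortT is hypothetical, the pivot is simply the middle element rather than the randomized median of three from the question, and the comparator is assumed to be a strict ordering such as std::less.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical name, not from the original post.
// Sorts the half-open range nums[s, e) in place. Compare is a template
// parameter, so a lambda or functor can be called directly and inlined.
template <typename Compare = std::less<int>>
void quickSortT(std::vector<int>& nums, int s, int e, Compare comp = Compare{}) {
    if (e - s < 2)
        return;
    // Assumption: middle element as pivot (no rand()); move it to the front.
    std::swap(nums[s], nums[s + (e - s) / 2]);
    int i = s + 1;
    for (int j = s + 1; j < e; ++j) {
        if (comp(nums[j], nums[s])) {
            std::swap(nums[i], nums[j]);
            ++i;
        }
    }
    // Put the pivot between the two partitions.
    std::swap(nums[s], nums[i - 1]);
    quickSortT(nums, s, i - 1, comp);
    quickSortT(nums, i, e, comp);
}
```

It would be called as quickSortT(v, 0, static_cast<int>(v.size())) for ascending order, or with std::greater<int>{} as the last argument for descending order. Whether this actually closes the gap to std::sort should be confirmed by measuring.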

Second, when optimizing, never build the result in a local container (vector<int> sorted;) if an in-place variant exists. Sort in place. Clients should be aware that you are sorting their vector; if they wish, they can make a copy in advance. You take a non-const reference for a reason. [1]

Third, there's a cost associated with rand(), and it's far from negligible. Unless you're sure you need the randomized variant of quicksort (and its benefit of having no particularly bad input sequence), just use the first element as the pivot. Or the middle one.

Use std::swap() to swap two elements. Chances are it gets translated to xchg (on x86/x64) or an equivalent, which is hard to beat. Whether the optimizer recognizes your intent to swap at these places without you being explicit can be verified from the assembly output.

The way you find the median of three elements is full of conditional moves and branches. The median value is simply nums[a] + nums[b] + nums[c] - max - min; obtaining nums[...], min, and max at the same time could be optimized further.
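The arithmetic above can be sketched as a small helper. The name medianOfThree is hypothetical; note that it returns the median value rather than an index, and that the sum is taken in 64 bits as a precaution against overflow, which the one-line formula does not address.

```cpp
#include <algorithm>

// Hypothetical helper: returns the median *value* of three ints.
int medianOfThree(int a, int b, int c) {
    const int lo = std::min({a, b, c});
    const int hi = std::max({a, b, c});
    // Sum in 64 bits so that large values cannot overflow.
    const long long sum = static_cast<long long>(a) + b + c;
    // Removing the smallest and largest leaves the middle one.
    return static_cast<int>(sum - lo - hi);
}
```

A partition scheme that works with a pivot value can use the result directly; the code in the question swaps by index, so it would need adapting.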

Avoid i++ when aiming for speed. While most optimizers will usually generate good code, there's a small chance that it's suboptimal. Be explicit when optimizing (++i after the swap), but only when optimizing.

But the most important one: valgrind/callgrind/kcachegrind. Profile, profile, profile. Only optimize what's really slow.

[1] There's an exception to this rule: const containers that you build from non-const ones. These are usually in-house types shared across multiple threads, so it's better to keep them const and copy when modification is needed. In this case you'll allocate a new container (const or not) in your function, but you'll probably keep the const one for the convenience of the API's users.

Answer 2

For quick sort, use a Hoare-like partition scheme.

http://en.wikipedia.org/wiki/Quicksort#Hoare_partition_scheme
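For reference, a minimal sketch of a quicksort built on a Hoare-style partition might look like the following. The name quickSortHoare is hypothetical, the pivot is assumed to be the middle element's value, and the bounds are inclusive here, unlike the half-open [s, e) convention in the question.

```cpp
#include <utility>
#include <vector>

// Hypothetical name; sorts the inclusive range nums[lo..hi] in place.
void quickSortHoare(std::vector<int>& nums, int lo, int hi) {
    if (lo >= hi)
        return;
    // Assumption: the middle element's value serves as the pivot.
    const int pivot = nums[lo + (hi - lo) / 2];
    int i = lo - 1;
    int j = hi + 1;
    while (true) {
        // Scan inward from both ends until each side finds a misplaced element.
        do { ++i; } while (nums[i] < pivot);
        do { --j; } while (nums[j] > pivot);
        if (i >= j)
            break;
        std::swap(nums[i], nums[j]);
    }
    // Everything in [lo..j] is <= pivot, everything in [j+1..hi] is >= pivot.
    quickSortHoare(nums, lo, j);
    quickSortHoare(nums, j + 1, hi);
}
```

A call would be quickSortHoare(v, 0, static_cast<int>(v.size()) - 1). Compared with the single-direction scan in the question, this scheme typically performs fewer swaps and splits runs of equal elements more evenly.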

Median of 3 only needs three if/swap statements (effectively a bubble sort). No need for a min or max check.

    if(nums[a] > nums[b])
        std::swap(nums[a], nums[b]);
    if(nums[b] > nums[c])
        std::swap(nums[b], nums[c]);
    if(nums[a] > nums[b])
        std::swap(nums[a], nums[b]);
    // use nums[b] as pivot value

For merge sort, use an entry function that creates a working vector once, then passes that vector by reference to the actual merge sort function. For top-down merge sort, the indices determine the start, middle, and end of each sub-vector.

If using top-down merge sort, the code can avoid copying data by alternating the direction of the merge depending on the level of recursion. This can be done with two mutually recursive functions: the first leaves its result in the original vector, the second leaves its result in the working vector. The first one calls the second one twice, then merges from the working vector back to the original vector, and vice versa for the second one. For the second one, if size == 1, it needs to copy one element from the original vector to the working vector. An alternative to two functions is to pass a boolean that indicates the direction of the merge.
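The two mutually recursive functions described above could be sketched roughly as follows. All names (mergeRuns, sortIntoOriginal, sortIntoWork, mergeSortTopDown) are hypothetical, ranges are half-open [s, e), and ascending order with <= is assumed in place of the question's comparator.

```cpp
#include <cstddef>
#include <vector>

// All names below are hypothetical. Ranges are half-open: [s, e).
void sortIntoWork(std::vector<int>& a, std::vector<int>& w,
                  std::size_t s, std::size_t e);

// Merges the sorted runs src[s, mid) and src[mid, e) into dst[s, e).
void mergeRuns(const std::vector<int>& src, std::vector<int>& dst,
               std::size_t s, std::size_t mid, std::size_t e) {
    std::size_t i = s, j = mid;
    for (std::size_t k = s; k < e; ++k) {
        // Take from the left run while it has elements and its head is
        // not greater than the right run's head (keeps the sort stable).
        if (i < mid && (j >= e || src[i] <= src[j]))
            dst[k] = src[i++];
        else
            dst[k] = src[j++];
    }
}

// Sorts a[s, e); the result ends up in the original vector a.
void sortIntoOriginal(std::vector<int>& a, std::vector<int>& w,
                      std::size_t s, std::size_t e) {
    if (e - s < 2)
        return;  // zero or one element: already in place
    const std::size_t mid = s + (e - s) / 2;
    sortIntoWork(a, w, s, mid);
    sortIntoWork(a, w, mid, e);
    mergeRuns(w, a, s, mid, e);
}

// Sorts a[s, e); the result ends up in the working vector w.
// Only ever called with at least one element.
void sortIntoWork(std::vector<int>& a, std::vector<int>& w,
                  std::size_t s, std::size_t e) {
    if (e - s == 1) {
        w[s] = a[s];  // the single copy needed at the bottom of the recursion
        return;
    }
    const std::size_t mid = s + (e - s) / 2;
    sortIntoOriginal(a, w, s, mid);
    sortIntoOriginal(a, w, mid, e);
    mergeRuns(a, w, s, mid, e);
}

// Entry function: allocates the working vector once.
void mergeSortTopDown(std::vector<int>& a) {
    std::vector<int> work(a.size());
    sortIntoOriginal(a, work, 0, a.size());
}
```

Calling mergeSortTopDown(v) sorts v in place with a single allocation, instead of one vector per recursive call as in the question.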

If using bottom-up merge sort (which will be a bit faster), each pass swaps the roles of the two vectors. The number of passes needed is determined up front; if it is odd, the first pass swaps adjacent elements in place, so that the data ends up in the original vector after all merge passes are done.
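A bottom-up variant along these lines might look like the sketch below. The name mergeSortBottomUp is hypothetical, and ascending order with <= is assumed. When the pass count is odd, the width-1 pass is done as in-place swaps of adjacent pairs, which leaves an even number of merge passes and therefore the final result in the original vector.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical name; sorts a in place using one working vector.
void mergeSortBottomUp(std::vector<int>& a) {
    const std::size_t n = a.size();
    if (n < 2)
        return;

    // Determine the number of passes up front (run widths 1, 2, 4, ...).
    std::size_t passes = 0;
    for (std::size_t w = 1; w < n; w *= 2)
        ++passes;

    std::size_t width = 1;
    if (passes % 2 == 1) {
        // Odd pass count: do the width-1 pass in place by swapping pairs.
        for (std::size_t i = 0; i + 1 < n; i += 2)
            if (a[i + 1] < a[i])
                std::swap(a[i], a[i + 1]);
        width = 2;
    }

    std::vector<int> work(n);
    std::vector<int>* src = &a;
    std::vector<int>* dst = &work;
    for (; width < n; width *= 2) {
        for (std::size_t s = 0; s < n; s += 2 * width) {
            const std::size_t mid = std::min(s + width, n);
            const std::size_t e = std::min(s + 2 * width, n);
            // Merge (*src)[s, mid) and (*src)[mid, e) into (*dst)[s, e).
            std::size_t i = s, j = mid;
            for (std::size_t k = s; k < e; ++k) {
                if (i < mid && (j >= e || (*src)[i] <= (*src)[j]))
                    (*dst)[k] = (*src)[i++];
                else
                    (*dst)[k] = (*src)[j++];
            }
        }
        std::swap(src, dst);  // each pass swaps the roles of the vectors
    }
    // An even number of merge passes ran, so the sorted data is in a.
}
```

Because there is no recursion, the only overhead beyond the merges themselves is the single working vector.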

Answer 1
2 2017-01-10 23:38:34

Answer 2
0 2017-01-11 04:00:27