Why is std::shuffle as slow as (or even slower than) std::sort?

Question

Consider this simple code, which measures execution time and the number of swaps performed:

#include <iostream>

#include <vector>
#include <random>
#include <chrono>
#include <algorithm>

struct A {
    A(int i = 0) : i(i) {}
    int i;
    static int nSwaps;

    friend void swap(A& l, A& r)
    {
        ++nSwaps;
        std::swap(l.i, r.i);
    }

    bool operator<(const A& r) const
    {
        return i < r.i;
    }
};

int A::nSwaps = 0;

using std::chrono::high_resolution_clock;
using std::chrono::duration_cast;
using std::chrono::milliseconds;


int main()
{
    std::vector<A> v(10000000);

    std::minstd_rand gen(std::random_device{}());
    std::generate(v.begin(), v.end(), [&gen]() {return gen();});

    auto s = high_resolution_clock::now();
    std::sort(v.begin(), v.end());
    std::cout << duration_cast<milliseconds>(high_resolution_clock::now() - s).count() 
        << "ms with " << A::nSwaps << " swaps\n";

    A::nSwaps = 0;
    s = high_resolution_clock::now();
    std::shuffle(v.begin(), v.end(), gen);
    std::cout << duration_cast<milliseconds>(high_resolution_clock::now() - s).count() 
        << "ms with " << A::nSwaps << " swaps\n";
}

The output of the program depends on the compiler and the machine, but the results are similar in nature. On my laptop with VS2015, I get 1044ms with ~100 million swaps for sort and 824ms with 10 million swaps for shuffle.

libstdc++ and libc++ do about half as many swaps for sort (~50M), with the following results. Rextester gives me similar results: gcc sort 854ms, shuffle 565ms; clang sort 874ms, shuffle 648ms. The results shown by ideone and coliru are even more drastic: ideone sort 1181ms, shuffle 1292ms; coliru sort 1157ms, shuffle 1461ms.

So what's the culprit here? Why is sort, with 5 to 10 times more swaps, almost as fast as or even faster than a simple shuffle? I'm not even taking into account the comparisons and the more complex logic in std::sort, including choosing between insertion sort, heapsort and quicksort, etc. I doubt it's the random engine: I've even chosen the simplest one, std::minstd_rand, which basically does an integer multiplication and a modulo. Is it cache misses that make shuffle relatively slow?

PS: the behaviour is the same for a simple std::vector<int>.

Answer 1

std::random_shuffle usually works as follows:

// random(k) returns a uniformly distributed integer in [0, k-1]
for (int i = 1; i < n; i++)
  swap(arr[i], arr[random(i + 1)]);

So we can see two sources of inefficiency here:

1. Random number generators are often quite slow.
2. Each swap uses a totally random element from the vector. When the data size is large, the whole vector does not fit into the CPU cache, so each such access has to wait until the data is read from RAM.

Speaking of point 2, sorting algorithms like quicksort are much more cache-friendly: most of their memory accesses hit the cache.

Answer 2

First, std::sort is not required to use an unqualified swap. It's not a customization point, and you cannot rely on your own user-defined swap being found through ADL. But even if it were, sort can also use std::rotate, which can swap but can also memmove. This would not be counted by your swap counter.

Second, the Standard Library only specifies asymptotic complexity, which is O(N) for std::shuffle and O(N log N) for std::sort. So you should measure for different values of N (e.g. powers of 2 from 65K to 65M elements) and look at the scaling behavior. For small N, the constant of proportionality of sort could be much smaller than the one for shuffle, since shuffle has to call a potentially expensive random generator.

Update: it does indeed appear that constant factors and/or cache effects are the culprit (as pointed out by @stgatilov). See this DEMO, where I run std::sort on the data after std::shuffle has been called. The runtime of sort is about half that of shuffle, with 5x more swaps.

Answer 1
Score 4, accepted, 2015-09-15 13:36:00

Answer 2
Score 1, 2015-09-15 13:29:49
