Why is std::shuffle as slow as (or even slower than) std::sort?
Consider the following simple code, which measures execution time and the number of swaps performed:
#include <iostream>
#include <vector>
#include <random>
#include <chrono>
#include <algorithm>

struct A {
    A(int i = 0) : i(i) {}
    int i;
    static int nSwaps;

    friend void swap(A& l, A& r)
    {
        ++nSwaps;
        std::swap(l.i, r.i);
    }

    bool operator<(const A& r) const
    {
        return i < r.i;
    }
};

int A::nSwaps = 0;

using std::chrono::high_resolution_clock;
using std::chrono::duration_cast;
using std::chrono::milliseconds;

int main()
{
    std::vector<A> v(10000000);

    std::minstd_rand gen(std::random_device{}());
    std::generate(v.begin(), v.end(), [&gen]() {return gen();});

    auto s = high_resolution_clock::now();
    std::sort(v.begin(), v.end());
    std::cout << duration_cast<milliseconds>(high_resolution_clock::now() - s).count()
        << "ms with " << A::nSwaps << " swaps\n";

    A::nSwaps = 0;
    s = high_resolution_clock::now();
    std::shuffle(v.begin(), v.end(), gen);
    std::cout << duration_cast<milliseconds>(high_resolution_clock::now() - s).count()
        << "ms with " << A::nSwaps << " swaps\n";
}
The output of the program depends on the compiler and the machine, but the results are similar in nature. On my laptop with VS2015, I get 1044ms with ~100 million swaps for sort and 824ms with 10 million swaps for shuffle.
libstdc++ and libc++ do about half as many swaps for sort (~50M), and the results are as follows. Rextester gives me similar results: gcc sort 854ms, shuffle 565ms; clang sort 874ms, shuffle 648ms. The results shown by ideone and coliru are even more drastic: ideone sort 1181ms, shuffle 1292ms; coliru sort 1157ms, shuffle 1461ms.
So what's the culprit here? Why is sort, with 5 to 10 times more swaps, almost as fast as or even faster than a simple shuffle? I'm not even taking into account the comparisons and the more complex logic in std::sort, including choosing between insertion sort, heapsort and quicksort, etc. I doubt it's the random engine: I've even chosen the simplest one, std::minstd_rand, which basically does an integer multiplication and a modulo. Is it the cache misses that make shuffle relatively slow?
PS: the behaviour is the same for a simple std::vector<int>.
std::random_shuffle usually works as follows:
// random(k) generates a uniform random integer from 0 to k-1 inclusive
for (int i = 1; i < n; i++)
    swap(arr[i], arr[random(i + 1)]);
So we can see two sources of inefficiency here:
1. Random number generators are often rather slow.
2. Each step of the shuffle accesses an element at a truly random position in the array. When the data does not fit into the cache, memory access becomes the bottleneck.
Speaking of point 2, sorting algorithms like quicksort are much more cache-friendly: most of their memory accesses hit the cache.
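One way to probe the cache explanation is to keep the loop above and change only which element each `arr[i]` is swapped with. The sketch below is an illustration under stated assumptions, not how any particular standard library implements shuffling: `swap_pass`, `random_pass`, `neighbour_pass`, `make_data` and `time_ms` are hypothetical helper names, and `neighbour_pass` deliberately is not a real shuffle. It draws the same random numbers but always swaps with the adjacent element, so memory is touched sequentially.

```cpp
#include <chrono>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: runs the shuffle loop, but lets the caller decide which
// partner index in [0, i] element i is swapped with. Returns the swap count.
template <class PickIndex>
std::size_t swap_pass(std::vector<int>& v, PickIndex pick)
{
    std::size_t swaps = 0;
    for (std::size_t i = 1; i < v.size(); ++i) {
        std::swap(v[i], v[pick(i)]);
        ++swaps;
    }
    return swaps;
}

// Fisher-Yates style pass: the partner is uniform in [0, i], so each swap may
// touch a far-away location in the array.
std::size_t random_pass(std::vector<int>& v, std::minstd_rand& gen)
{
    return swap_pass(v, [&gen](std::size_t i) {
        return std::uniform_int_distribution<std::size_t>(0, i)(gen);
    });
}

// Control pass (NOT a shuffle): draws a random number as well, so the generator
// cost is roughly the same, but always swaps with the neighbouring element.
std::size_t neighbour_pass(std::vector<int>& v, std::minstd_rand& gen)
{
    return swap_pass(v, [&gen](std::size_t i) {
        (void)std::uniform_int_distribution<std::size_t>(0, i)(gen);
        return i - 1;
    });
}

// Test data: the integers 0, 1, ..., n-1.
std::vector<int> make_data(std::size_t n)
{
    std::vector<int> v(n);
    std::iota(v.begin(), v.end(), 0);
    return v;
}

// Wall-clock time of a callable, in milliseconds.
template <class F>
long long time_ms(F&& f)
{
    auto start = std::chrono::steady_clock::now();
    f();
    return std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - start).count();
}
```

Both passes do n-1 swaps and n-1 random draws, so comparing `time_ms([&] { random_pass(v, gen); })` with `time_ms([&] { neighbour_pass(v, gen); })` on a vector from `make_data(10000000)` isolates the access pattern. If the random version is much slower, that points at cache misses rather than the generator.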
First, std::sort is not required to use an unqualified swap. It's not a customization point, and you cannot rely on your own user-defined swap being found through ADL. But even if it were, sort can also use std::rotate, which can do swap but also memmove. This would not be counted by your implementation.
Second, the Standard Library only specifies asymptotic complexity, which is O(N) for std::shuffle and O(N log N) for std::sort. So you should measure for different values of N (e.g. powers of 2 from 65K to 65M elements) and look at the scaling behavior. For small N, the constant of proportionality of sort could be much smaller than the one for shuffle, since shuffle has to call a potentially expensive random generator.
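A scaling measurement along those lines could look like the sketch below. The names `Timing`, `measure` and `measure_scaling` are hypothetical, the seed is arbitrary, and a plain `std::vector<int>` is assumed as the element type. Time is reported per element, so a flat column suggests O(N) behavior and a slowly growing one suggests O(N log N) or cache effects.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

struct Timing {
    std::size_t n;
    double sort_ns_per_elem;
    double shuffle_ns_per_elem;
};

// Times std::sort on pseudo-random data of size n, then std::shuffle on the
// same vector, and normalises both by n.
Timing measure(std::size_t n, std::minstd_rand& gen)
{
    using steady = std::chrono::steady_clock;

    std::vector<int> v(n);
    std::generate(v.begin(), v.end(), [&gen] { return static_cast<int>(gen()); });

    const auto t0 = steady::now();
    std::sort(v.begin(), v.end());
    const auto t1 = steady::now();
    std::shuffle(v.begin(), v.end(), gen);
    const auto t2 = steady::now();

    auto ns = [](steady::duration d) {
        return std::chrono::duration<double, std::nano>(d).count();
    };
    return {n, ns(t1 - t0) / n, ns(t2 - t1) / n};
}

// Runs measure() for n = first_n, 2*first_n, 4*first_n, ... up to last_n.
std::vector<Timing> measure_scaling(std::size_t first_n, std::size_t last_n)
{
    std::vector<Timing> out;
    if (first_n == 0) return out; // doubling 0 would never terminate

    std::minstd_rand gen(12345); // arbitrary fixed seed for repeatability
    for (std::size_t n = first_n; n <= last_n; n *= 2)
        out.push_back(measure(n, gen));
    return out;
}
```

For the range suggested above, `measure_scaling(1u << 16, 1u << 26)` covers roughly 65K to 65M elements. Printing the three fields of each `Timing` gives one row per N.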
Update: it indeed appears that constant factors and/or cache effects are the culprit (as pointed out by @stgatilov). See this DEMO, where I run std::sort on the data after std::shuffle has been called. The runtime for sort is about half that of shuffle, with 5x more swaps.