
C++ std::sort implementation

Original post: 2019-01-25 19:46:40 · Tags: c++, sorting, c++11, quicksort

I am wondering about the implementation of std::sort in C++11. I have an MPI-managed parallel code, where each rank reads data from a file into a vector A that needs to be sorted. Each rank calls std::sort to do this.

When I run this with ~100 ranks, there is sometimes one rank which hangs at this call to std::sort. Eventually I realized that it is not hanging; the sort just takes a very long time. That is, one rank will take ~200 times longer to sort than all of the others.

At first I suspected a load-balancing issue. Nope: I've checked thoroughly that the size of A per rank is as balanced as possible.

I've concluded that it may simply be that one rank has an initial condition of A such that something like the worst-case performance of quicksort is realized (or at least a non-ideal case).

Why do I think this?

If I change the MPI configuration (thereby perturbing the content of A per rank, since it comes from a file read), the issue disappears, or it can move to other ranks.
If I change std::sort to std::stable_sort (no longer using the quicksort algorithm), then all is fine.

However, it seems that the most sensible way to implement a quicksort would be to choose a random pivot point on each iteration. If that were the case with std::sort, then it would be overwhelmingly unlikely to choose a worst-case value randomly from A on many iterations (which would be required to produce a 200x performance hit).

Thus, my observations suggest that std::sort uses a fixed quicksort pivot choice (e.g. always the first value in the array, or something like that). This is the only way that the behavior I'm seeing would be likely while also giving consistent results when re-running on the same MPI configuration (which it does).

Am I correct in that conclusion? I did manage to find the std source, but the sort function is totally unreadable and makes a plethora of calls to various helper functions, and I'd rather avoid a rabbit hole. Aside from that, I'm running on an HPC system, and it's not even clear to me how to be sure what exactly mpicxx is linking to. I can't find any documentation that describes the algorithm implementation.
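One way to narrow down which standard library a build uses is to check the library's own version macros at compile time. The following is a small sketch under that assumption; the helper name standard_library_name is hypothetical, while `__GLIBCXX__` (libstdc++), `_LIBCPP_VERSION` (libc++), and `_MSVC_STL_VERSION` (Microsoft STL) are macros those libraries define once any standard header has been included.

```cpp
#include <cstddef>  // including any standard header makes the library macros visible
#include <string>

// Hypothetical helper: names the C++ standard library this translation unit
// was compiled against, based on the library's own version macro.
std::string standard_library_name() {
#if defined(__GLIBCXX__)
    return "libstdc++ (__GLIBCXX__ = " + std::to_string(__GLIBCXX__) + ")";
#elif defined(_LIBCPP_VERSION)
    return "libc++ (_LIBCPP_VERSION = " + std::to_string(_LIBCPP_VERSION) + ")";
#elif defined(_MSVC_STL_VERSION)
    return "MSVC STL (_MSVC_STL_VERSION = " + std::to_string(_MSVC_STL_VERSION) + ")";
#else
    return "unknown standard library";
#endif
}
```

Compiling this with the same mpicxx wrapper and printing the result from one rank shows which library's headers were used. Depending on the MPI distribution, the wrapper can usually also print the underlying compiler command (for example `mpicxx -show` with MPICH or `mpicxx --showme` with Open MPI).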

1 answer

std::sort is implementation specific.

And since C++11, a plain quicksort is no longer a valid implementation, because the required complexity changed from O(N log N) comparisons on average to O(N log N) comparisons in every case, including the worst case.
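One well-known way to meet a worst-case O(N log N) bound while keeping quicksort's typical speed is introsort: run quicksort, but switch to heapsort once the partitioning depth exceeds a budget proportional to log N. The following is a minimal sketch of that idea for std::vector<int>, not the code of any particular standard library; the names introsort_sketch and introsort_impl, the middle-element pivot, and the budget of 2·log2(N) are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical sketch of an introsort-style sort on the range [first, last).
// `depth_budget` is the number of partitioning levels still allowed before
// the range is handed to heapsort.
void introsort_impl(std::vector<int>::iterator first,
                    std::vector<int>::iterator last, int depth_budget) {
    while (last - first > 1) {
        if (depth_budget == 0) {
            // Too many partitioning levels: heapsort is O(N log N) regardless
            // of the input order, which caps the total cost.
            std::make_heap(first, last);
            std::sort_heap(first, last);
            return;
        }
        --depth_budget;

        const int pivot = *(first + (last - first) / 2);  // middle element
        // Three-way partition: [first, mid1) < pivot, [mid1, mid2) == pivot,
        // [mid2, last) > pivot. The middle part is never empty, so both
        // remaining subranges are strictly smaller than the current one.
        auto mid1 = std::partition(first, last,
                                   [pivot](int x) { return x < pivot; });
        auto mid2 = std::partition(mid1, last,
                                   [pivot](int x) { return !(pivot < x); });

        introsort_impl(mid2, last, depth_budget);  // recurse on the right part
        last = mid1;                               // loop on the left part
    }
}

// Hypothetical entry point: sorts `v` in ascending order.
void introsort_sketch(std::vector<int>& v) {
    if (v.size() < 2) return;
    const int depth_budget =
        2 * static_cast<int>(std::log2(static_cast<double>(v.size())));
    introsort_impl(v.begin(), v.end(), depth_budget);
}
```

Calling introsort_sketch(v) on a std::vector<int> sorts it in place. Because the depth budget bounds how long a run of bad pivots can go on, an adversarial input costs at most a heapsort on the affected range rather than quadratic time.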