
Which STL container is best for std::sort? (Does it even matter?)

The title speaks for itself.

Does the choice of container affect the speed of the default std::sort algorithm? For example, if I use a list, does the sorting algorithm just swap the node pointers, or does it swap the whole data in the nodes?

The choice does make a difference, but predicting which container will be the most efficient is very difficult. The best approach is to use the container that is easiest for your application to work with (probably std::vector), see if sorting is adequately fast with that container, and if so, stick with it. If not, profile your sorting problem and choose a different container based on the profile data.

As an ex-lecturer and ex-trainer, I sometimes feel personally responsible for the common idea that a linked list has mystical performance-enhancing properties. Take it from one who knows: the only reason a linked list appears in so many textbooks and tutorials is that it is convenient for the people who wrote those books and tutorials to have a data structure that can illustrate pointers, dynamic memory management, recursion, searching and sorting all in one. It has nothing to do with efficiency.

I don't think std::sort works on lists, as it requires random-access iterators, which list<> does not provide. Note that list<> provides a sort method, but it's completely separate from std::sort.
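A minimal sketch of that distinction, assuming C++11 or later. The helper names `sorted_vector` and `sorted_list` are made up for illustration; the vector goes through std::sort, while the list has to use its own member function:

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <vector>

// Hypothetical helper: sorts a copy of the input with std::sort.
std::vector<int> sorted_vector(std::vector<int> v) {
    std::sort(v.begin(), v.end());    // OK: vector iterators are random-access
    assert(std::is_sorted(v.begin(), v.end()));
    return v;
}

// Hypothetical helper: sorts a copy of the input with the list member function.
std::list<int> sorted_list(std::list<int> l) {
    // std::sort(l.begin(), l.end()); // would not compile: list iterators are bidirectional
    l.sort();                         // member function, separate from std::sort
    assert(std::is_sorted(l.begin(), l.end()));
    return l;
}
```

Calling `sorted_list({3, 1, 2})` yields a list holding 1, 2, 3; uncommenting the std::sort line should produce a compile error rather than a slow sort.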

The choice of container does matter. STL's std::sort relies on iterators to abstract away the way a container stores data. It just uses the iterators you provide to move elements around. The faster those iterators are at accessing and assigning an element, the faster std::sort will run.

std::list is definitely not a good (valid) choice for std::sort(), because std::sort() requires random-access iterators. std::map and friends are also no good because an element's position cannot be enforced; that is, the user cannot force an element of a map into a particular position by insertion or by a swap. Among the standard containers we're down to std::vector and std::deque (plus std::array since C++11).

std::sort() is like other standard algorithms in that it acts only by swapping or assigning elements' values ( *t = *s ). So even if list magically supported O(1) access, the links wouldn't be reorganized; the values would be swapped instead.

Because std::sort() doesn't change the container's size, it should make little difference to runtime performance whether you use std::vector or std::deque. Primitive arrays should also be fast to sort, probably even faster than the standard containers, but I don't expect the difference in speed to be significant enough to justify using them.
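To illustrate that std::sort treats vector and deque alike, here is a small sketch. The template `sorted_copy` is a hypothetical helper, not part of the standard library; it compiles for any container whose iterators are random-access:

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical generic helper: takes the container by value, sorts the copy
// with std::sort and returns it. Only begin()/end() are used, so the same
// code serves std::vector and std::deque.
template <typename Container>
Container sorted_copy(Container c) {
    std::sort(c.begin(), c.end());
    assert(std::is_sorted(c.begin(), c.end()));
    return c;
}
```

Whether the two containers sort equally fast on a given machine is something only a measurement can tell; the sketch shows only that the calling code is identical.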

It depends on the element type.

If you're just storing pointers (or POD), then vector will be fastest. If you're storing large objects, then list's sort may be faster, as it relinks nodes rather than moving the elements themselves.

I totally agree with the statements posted above. But what is the best way to learn new things? Surely not reading text and learning it by heart, but examples. As I recently immersed myself in the STL containers, here is some quick test code that is, I hope, self-explanatory:

#include <iostream>
#include <vector>
#include <deque>
#include <array>
#include <list>
#include <iterator>
#include <cstdlib>
#include <algorithm>
#include "Timer.h"

constexpr int SIZE = 1005000;

using namespace std;

void test();

int main(){
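    // Note: this prints the element count divided by 2^20; the array's size in
    // bytes is SIZE * sizeof(int).
    cout<<"array allocates "<<static_cast<double>(SIZE)/(1024*1024)<<" MB\n";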
    test();


    return 0;
}


void test(){
    int values[SIZE];
    int size = 0;

    //init values to sort:
    do{
        values[size++] = rand() % 100000;
    }while(size < SIZE);

    //feed array with values (std::array also lives on the stack here):
    array<int, SIZE> container_1;
    for(int i = 0; i < SIZE; i++)
        container_1.at(i) = values[i];

    //feed vector, list and deque with the same values
    vector<int> container_2(begin(values), end(values));
    list<int> container_3(begin(values), end(values));
    deque<int> container_4(begin(values), end(values));

    //measure sorting time for containers
    {
       Timer t1("sort array");
       sort(container_1.begin(), container_1.end());
    }

    {
       Timer t2("sort vector");
       sort(container_2.begin(), container_2.end());
    }

    {
       Timer t3("sort list");
       container_3.sort();
    }

    {
       Timer t4("sort deque");
       sort(container_4.begin(), container_4.end());
    }

}

And the code for the timer (Timer.h):

#include <chrono>
#include <string>
#include <iostream>

using namespace std;

class Timer{

public:
    // steady_clock is monotonic, so it is the safer choice for measuring intervals
    Timer(string name = "unnamed") : mName(name){ mStart = chrono::steady_clock::now();}
    ~Timer(){cout<<"action "<<mName<<" took: "<<
             chrono::duration_cast<chrono::milliseconds>(
                     chrono::steady_clock::now() - mStart).count()<<"ms"<<endl;}
private:
    chrono::steady_clock::time_point mStart;
    string mName;
};

Here is the result when no optimization is used ( g++ --std=c++11 file.cpp -o a.out ):

array allocates 0.958443 MB
action sort array took: 183ms
action sort vector took: 316ms
action sort list took: 725ms
action sort deque took: 436ms

and with optimization ( g++ -O3 --std=c++11 file.cpp -o a.out ):

array allocates 0.958443 MB
action sort array took: 55ms
action sort vector took: 57ms
action sort list took: 264ms
action sort deque took: 67ms

Notice that although vector and array have similar sorting times in this case, the array size is limited, because by default the array lives on the stack.

So it also depends on whether you compile with optimization; without it, the difference can be noticeable.

The sort algorithm knows nothing about your container. All it knows about are random-access iterators. Thus you can sort things that aren't even in an STL container. So how fast it is going to be depends on the iterators you give it, and on how fast it is to dereference and copy what they point to.
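As a sketch of sorting something that is not in an STL container at all: raw pointers into a plain array satisfy the random-access iterator requirements, so they can be passed straight to std::sort. The function name `sort_raw` is hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: sorts a plain C array in place.
// `data` and `data + n` act as the begin and end iterators.
void sort_raw(int* data, std::size_t n) {
    std::sort(data, data + n);
    assert(std::is_sorted(data, data + n));
}
```

For example, given `int a[] = {5, 2, 9, 1};`, the call `sort_raw(a, 4)` leaves the array holding 1, 2, 5, 9.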

std::sort won't work on std::list, since sort requires random-access iterators. You should use one of std::list's sort member functions in that case. Those member functions efficiently relink the list nodes instead of copying elements.

Vector.

Always use vector as your default. It has the lowest space overhead and the fastest access of any container (among other advantages, such as a C-compatible layout and random-access iterators).

Now, ask yourself: what else are you doing with your container? Do you need strong exception guarantees? List, set and map are likely to be better options (though list has its own sort member function, and set and map keep their elements sorted at all times). Do you need to regularly add elements to the front of your container? Consider deque. Does your container need to always be sorted? Set and map are likely to be a better fit.

Finally, figure out specifically what "best" means for you, then choose the most appropriate container and measure how it performs for your needs.
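For the "always sorted" case, an ordered associative container does the ordering on insertion, so no separate sort call is needed. A minimal sketch, assuming duplicates should be kept (hence std::multiset rather than std::set); `keep_sorted` is a hypothetical name:

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical helper: inserts the values one by one into a multiset.
// The container is ordered after every insert, so iterating over it yields
// the elements in ascending order without calling any sort function.
std::vector<int> keep_sorted(const std::vector<int>& input) {
    std::multiset<int> ordered;
    for (int x : input) ordered.insert(x);
    assert(ordered.size() == input.size());
    return std::vector<int>(ordered.begin(), ordered.end());
}
```

Whether this beats filling a vector and sorting it once depends on the workload, so it is worth measuring both.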

It surely does matter, if only because different containers have different memory access patterns and so on, which can play a role.

However, std::sort doesn't work on std::list<> iterators, as these are not RandomAccessIterators. Moreover, although it would be possible to implement a specialization for std::list<> that shuffles the nodes' pointers, it would probably have strange and surprising semantic consequences. For example, if you hold an iterator into the sorted range of a vector, the value it refers to will change after sorting, which would not be true with this specialization.
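The semantic difference can be seen with the existing std::list::sort member function, which keeps iterators attached to their elements. A small sketch; both function names are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <vector>

// Takes an iterator to the first slot of a vector, sorts, and reads it back.
// The iterator still refers to slot 0, which now holds the smallest value.
int vector_first_slot_after_sort(std::vector<int> v) {
    assert(!v.empty());
    auto it = v.begin();
    std::sort(v.begin(), v.end());
    return *it;
}

// Same experiment with a list: list::sort moves nodes, not values, so the
// iterator follows its node and still sees the original value.
int list_first_node_after_sort(std::list<int> l) {
    assert(!l.empty());
    auto it = l.begin();
    l.sort();
    return *it;
}
```

With the input {3, 1, 2}, the vector version returns 1 while the list version returns 3.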

std::sort requires random-access iterators, so your only options for using it are vector or deque. It will swap the values, and at a guess vector will probably perform slightly faster than deque because it typically has a simpler underlying data structure. The difference is likely to be very marginal, though.

If you use a std::list, there is a dedicated member function (std::list::sort), which should relink the nodes rather than swap the values. However, because a list is not random access, it will typically use merge sort instead of a quicksort variant, which probably means that the algorithm itself is a little slower.

Anyway, I think the answer is normally vector. If each element is a large class, so that copying overhead dominates the sorting process, list might beat it. Alternatively, you could store pointers to the elements in a vector and supply a custom predicate to sort them appropriately.
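The pointer-plus-predicate idea could look roughly like this. `Record`, its fields and `sorted_view` are invented for illustration; the point is that only pointers move during the sort:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical large element type; only `key` takes part in the ordering.
struct Record {
    int key;
    std::string payload;
};

// Builds a vector of pointers into `records` and sorts the pointers by key.
// `records` itself is left untouched, so no Record is copied or moved.
std::vector<const Record*> sorted_view(const std::vector<Record>& records) {
    std::vector<const Record*> view;
    view.reserve(records.size());
    for (const Record& r : records) view.push_back(&r);
    std::sort(view.begin(), view.end(),
              [](const Record* a, const Record* b) { return a->key < b->key; });
    assert(view.size() == records.size());
    return view;
}
```

The pointers stay valid only as long as the underlying vector is neither resized nor destroyed, which is the price of avoiding the copies.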
