
Which STL container is best for std::sort? (Does it even matter?)

The title speaks for itself....

Does the choice of container affect the speed of the default std::sort algorithm? For example, if I use a list, does the sorting algorithm just switch the node pointers, or does it swap the whole data in the nodes?

The choice does make a difference, but predicting which container will be the most efficient is very difficult. The best approach is to use the container that is easiest for your application to work with (probably std::vector), see if sorting is adequately fast with that container, and if so, stick with it. If not, profile your sorting problem and choose a different container based on the profile data.

As an ex-lecturer and ex-trainer, I sometimes feel personally responsible for the common idea that a linked list has mystical performance-enhancing properties. Take it from one who knows: the only reason linked lists appear in so many textbooks and tutorials is that it is convenient for the people who wrote those books and tutorials to have a data structure that can illustrate pointers, dynamic memory management, recursion, searching and sorting all in one. It has nothing to do with efficiency.

I don't think std::sort works on lists, as it requires random access iterators, which list<> does not provide. Note that list<> provides a sort method, but it's completely separate from std::sort.
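To make the distinction concrete, here is a minimal sketch, assuming C++11. The function names sort_vector and sort_list are made up for illustration and are not part of the standard library.

```cpp
#include <algorithm>
#include <list>
#include <vector>

// Sorts a vector with the free function std::sort, which needs
// random-access iterators.
std::vector<int> sort_vector(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    return v;
}

// Sorts a list with its own member function. Calling
// std::sort(l.begin(), l.end()) would not compile, because
// std::list iterators are only bidirectional.
std::list<int> sort_list(std::list<int> l) {
    l.sort();
    return l;
}
```

Both functions take their argument by value, so the caller's container is left untouched and the sorted copy is returned.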

The choice of container does matter. STL's std::sort relies on iterators to abstract away the way a container stores data. It just uses the iterators you provide to move elements around. The faster those iterators are at accessing and assigning an element, the faster std::sort will run.

std::list is definitely not a good (valid) choice for std::sort(), because std::sort() requires random-access iterators. std::map and friends are also no good because an element's position cannot be enforced; that is, the user cannot place an element at a particular position in a map by insertion or by a swap. Among the standard containers we're down to std::vector and std::deque.

std::sort() is like other standard algorithms in that it only acts by swapping elements' values around ( *t = *s ). So even if list would magically support O(1) access the links wouldn't be reorganized but rather their values would be swapped.

Because std::sort() doesn't change the container's size, it should make little difference in runtime performance whether you use std::vector or std::deque. Primitive arrays should also be fast to sort, probably even faster than the standard containers, but I don't expect the difference in speed to be significant enough to justify using them.

It depends on the element type.

If you're just storing pointers (or POD) then vector will be fastest. If you're storing objects then list's sort will be faster as it will swap nodes and not physical elements.

I totally agree with the statements posted above. But what is the best way to learn new things? Surely not reading text and learning it by heart, but examples. As I have recently been digging into the STL containers, here is some quick test code that is, I hope, self-explanatory:

#include <iostream>
#include <vector>
#include <deque>
#include <array>
#include <list>
#include <iterator>
#include <cstdlib>
#include <algorithm>
#include "Timer.h"

constexpr int SIZE = 1005000;

using namespace std;

void test();

int main(){
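    // Note: this prints SIZE / 2^20, i.e. the element count in units of 2^20;
    // the actual footprint in bytes is sizeof(int) times larger.
    cout<<"array allocates "<<static_cast<double>(SIZE)/(1024*1024)<<" MB\n";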
    test();


    return 0;
}


void test(){
    int values[SIZE];
    int size = 0;

    //init values to sort:
    do{
        values[size++] = rand() % 100000;
    }while(size < SIZE);

    //feed array with values:
    array<int, SIZE> container_1;
    for(int i = 0; i < SIZE; i++)
        container_1.at(i) = values[i];

    //feed vector with values
    vector<int> container_2(begin(values), end(values));
    list<int> container_3(begin(values), end(values)); 
    deque<int> container_4(begin(values), end(values)); 

    //measure sorting time for containers
    {
       Timer t1("sort array");
       sort(container_1.begin(), container_1.end());
    }

    {
       Timer t2("sort vector");
       sort(container_2.begin(), container_2.end());
    }

    {
       Timer t3("sort list");
       container_3.sort();
    }

    {
       Timer t4("sort deque");
       sort(container_4.begin(), container_4.end());
    }

}

And the code for timer:

#include <chrono>
#include <string>
#include <iostream>

using namespace std;

class Timer{

public:
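    // steady_clock is monotonic, so it is the safer choice for measuring intervals
    Timer(string name = "unnamed") : mName(name){ mStart = chrono::steady_clock::now();}
    // prints the elapsed time in milliseconds when the Timer goes out of scope
    ~Timer(){cout<<"action "<<mName<<" took: "<<
             chrono::duration_cast<chrono::milliseconds>(
                     chrono::steady_clock::now() - mStart).count()<<"ms"<<endl;}
private:
    chrono::steady_clock::time_point mStart;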
    string mName;
};

Here is the result when no optimization is used ( g++ --std=c++11 file.cpp -o a.out ):

array allocates 0.958443 MB
action sort array took: 183ms
action sort vector took: 316ms
action sort list took: 725ms
action sort deque took: 436ms

and with optimization ( g++ -O3 --std=c++11 file.cpp -o a.out ):

array allocates 0.958443 MB
action sort array took: 55ms
action sort vector took: 57ms
action sort list took: 264ms
action sort deque took: 67ms

Notice that although vector and array have similar sorting times in this case, the array's size is limited because it lives on the stack (by default, i.e. unless you allocate it elsewhere yourself).

So it also depends on whether you enable compiler optimization; without it, the difference between containers is noticeable.

The sort algorithm knows nothing about your container. All it knows about are random-access iterators. Thus you can sort things that aren't even in an STL container. So how fast it is going to be depends on the iterators you give it, and how fast it is to dereference and copy what they point to.
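As a small illustration of sorting data that is not in any container, the sketch below hands std::sort a pair of raw pointers, which qualify as random-access iterators. The helper name sort_raw is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>

// Sorts a plain C array in place. std::sort only sees the two pointers
// and never learns what kind of storage sits behind them.
void sort_raw(int* data, std::size_t n) {
    std::sort(data, data + n);
}
```

For example, given int a[] = {5, 2, 9, 1}, calling sort_raw(a, 4) leaves a holding 1, 2, 5, 9.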

std::sort won't work on std::list, since sort requires random access iterators. You should use std::list's sort member function for that case (it has overloads with and without a comparator). That member function relinks the list's nodes instead of copying elements.

Vector.

Always use vector as your default. It has the lowest space overhead and the fastest access of any container (among other advantages like C-compatible layout and random-access iterators).

Now, ask yourself: what else are you doing with your container? Do you need strong exception guarantees? List, set and map are likely to be better options (though list has its own sort routine, and set and map keep themselves sorted). Do you need to regularly add elements to the front of your container? Consider deque. Does your container need to always be sorted? Set and map are likely to be a better fit.

Finally, figure out specifically what "best" is for you and then choose the most appropriate container and measure how it performs for your needs.
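For the always-sorted case mentioned above, here is a minimal sketch using std::multiset, which keeps its elements ordered on every insertion so no separate sort call is needed. The function name ordered_via_multiset is invented for this example.

```cpp
#include <set>
#include <vector>

// Inserts the values into a std::multiset, which keeps them ordered at
// all times, then copies them out in iteration order.
std::vector<int> ordered_via_multiset(const std::vector<int>& input) {
    std::multiset<int> s(input.begin(), input.end());
    return std::vector<int>(s.begin(), s.end());
}
```

A multiset is used rather than a set so that duplicate values are kept. Whether this beats sorting a vector once is something to measure for your own workload.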

It surely does matter, just because different containers have different memory access patterns etc. which could play a role.

However, std::sort doesn't work on std::list<>::iterators as these are not RandomAccessIterators. Moreover, although it would be possible to implement a specialization for std::list<> that would shuffle the nodes' pointers, it would probably have strange and surprising semantic consequences. For example, if you hold an iterator into the sorted range of a vector, the value it refers to will change after the sorting, which would not be true with this specialization.
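The semantic difference can be observed directly. The sketch below, with hypothetical function names, takes an iterator to the first element before sorting and reads through it afterwards.

```cpp
#include <algorithm>
#include <list>
#include <vector>

// std::sort moves values between slots. The iterator still refers to
// slot 0, which holds the smallest value after sorting.
int first_slot_after_vector_sort(std::vector<int> v) {
    auto it = v.begin();
    std::sort(v.begin(), v.end());
    return *it;
}

// std::list::sort relinks nodes and keeps iterators valid. The iterator
// still refers to the same node, so it sees the original value.
int first_node_after_list_sort(std::list<int> l) {
    auto it = l.begin();
    l.sort();
    return *it;
}
```

With the input {3, 1, 2}, the vector version returns 1 while the list version returns 3.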

std::sort requires random access iterators, so your only options to use that are vector or deque. It will swap the values, and at a guess vector will probably perform slightly faster than deque because it typically has a simpler underlying data structure. The difference is likely very marginal though.

If you use a std::list, there is a member function (std::list::sort) which should relink the nodes rather than swap the values. However, because a list is not random access, it will typically use a merge sort rather than the quicksort-style algorithm that std::sort usually uses, which will probably mean that the algorithm itself is a little slower.

Anyway, I think the answer is normally vector. If you have large classes for each element so copying overhead dominates the sorting process, list might beat it. Or alternatively you could store pointers to them in a vector and supply a custom predicate to sort them appropriately.
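The pointer-based alternative might look like the sketch below. The Record type and the sorted_view function are hypothetical stand-ins for a large element class and are not taken from the question.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical large element type; only `key` takes part in the ordering.
struct Record {
    int key;
    std::string payload;
};

// Builds a vector of pointers into `records` and sorts the pointers by key.
// The Record objects themselves are never moved or copied.
std::vector<const Record*> sorted_view(const std::vector<Record>& records) {
    std::vector<const Record*> view;
    view.reserve(records.size());
    for (const Record& r : records) {
        view.push_back(&r);
    }
    std::sort(view.begin(), view.end(),
              [](const Record* a, const Record* b) { return a->key < b->key; });
    return view;
}
```

The returned pointers are only valid while the original vector is alive and not reallocated, so this suits cases where the records stay put after they have been built.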
