Sorting a range (with no duplicates) in C++, is std::vector and std::sort faster than std::set?

I have a sequence of doubles (with no duplicates) and I need to sort them. Is filling a vector and then sorting it faster than inserting the values into a set?

Can this question be answered without knowing the implementation of the standard library (and without knowing the hardware the program will run on), using only the information provided by the C++ standard?

#include <algorithm>
#include <cstddef>
#include <random>
#include <set>
#include <vector>

std::uniform_real_distribution<double> unif(0, 10000);
std::default_random_engine re;

int main()
{
    const std::size_t sz = 10;

    // Generate the input sequence.
    std::vector<double> r;
    for (std::size_t i = 0; i < sz; i++) {
        r.push_back(unif(re));
    }

    // Option 1: fill a vector, then sort it.
    std::vector<double> v;
    for (std::size_t i = 0; i < sz; i++) {
        v.push_back(r[i]);
    }
    std::sort(v.begin(), v.end());

    // Option 2: insert into a set, which keeps its elements ordered.
    std::set<double> s;
    for (std::size_t i = 0; i < sz; i++) {
        s.insert(r[i]);
    }

    return 0;
}

From the C++ standard, all we can say is that they both have the same asymptotic complexity ( O(n*log(n)) ).

The set may be faster for large objects that can't be efficiently moved or swapped, since the objects don't need to be moved more than once. The vector may be faster for small objects, since sorting it involves no pointer updates and less indirection.

Which is faster in any given situation can only be determined by measuring (or a thorough knowledge of both the implementation and the target platform).

A vector may be faster because of data-cache effects: its elements sit in one contiguous block of memory, whereas the nodes of a set are typically allocated separately.

The vector will also have less memory overhead per-value.

If you know the number of elements in advance, call reserve() on the vector before inserting the data, so that it does not have to reallocate while it is being filled.

In terms of complexity, the two should be the same, i.e. n log(n).

The answer is not trivial. If your software has two main phases, first setup and then lookup, and lookup is used more than setup, the sorted vector could be faster, for two reasons:

  1. std::lower_bound (from <algorithm>) on a sorted vector tends to be faster than a lookup in the usual tree implementation of std::set,
  2. the memory of a std::vector is spread over fewer heap pages, so there will be fewer page faults while you are looking for an element.

If the usage is mixed, or lookup is not used more than setup, then std::set will be faster. More info: Scott Meyers: Effective STL, Item 23.

Since you said you are sorting a range, you could use std::partial_sort instead of sorting the entire collection.
If we don't want to disturb the existing collection and want a new collection with sorted data and no duplicates, then std::set gives us a straightforward solution.

#include <algorithm>
#include <iostream>
#include <iterator>
#include <set>
#include <vector>

using namespace std;

int main()
{
    int arr[] = { 1, 3, 4, 1, 6, 7, 9, 6, 3, 4, 9 };
    vector<int> ints(begin(arr), end(arr));

    // Take only the first `ulimit` elements of the original collection.
    const int ulimit = 5;
    auto last = ints.begin();
    advance(last, ulimit);

    // The set sorts the range and drops duplicates; `ints` is left untouched.
    set<int> sortedset(ints.begin(), last);

    for (int x : sortedset) {
        cout << x << "\n";
    }
}

