简体   繁体   中英

Sorting multiple vectors in single sort call C++

This answer demonstrates how to efficiently obtain an indices vector using std::sort on a vector of values using the nice new-ish C++11 functionality (there's also a variety of duplicates of that question as well). It also hints that you can obtain the double output of both the sorted vector and the sorted indices by "using an extra vector." However, the only way I can achieve this is by calling std:sort a second time. I'm working with arrays with lengths of tens, maybe hundreds, of thousands of elements trying to focus on efficiency. Is it possible to obtain both the sorted vector and the indices of the sort from a single call to std::sort ?

More generally, my question is: can one sort multiple vectors with a single sort call? The assumption is the sorting order is based on only one of the supplied vectors.

What I've come up with in the meantime is below (a slight modification to the code in the linked answer). As you can see, it requires a call to std::sort for each vector being sorted, even though they are all to be ordered according to the sorting of a single vector. I suspect there may be a way to do this by passing references to the lambda compare function, but I can't seem to make it work.

#include <numeric>
#include <algorithm>

using std;

void sort_vectors(vector<size_t> idx, vector<double> &v) {

  // sort indexes based on comparing values in v
  sort(idx.begin(), idx.end(),
       [&v](size_t i1, size_t i2) {return v[i1] < v[i2];});

  // Sort the actual vector
  sort(v.begin(), v.end());

  return idx;
}

std::sort takes iterators: Although a custom sort could likely take both indexes and the values in a single sort step, it's unlikely to be of much use (and may require different algorithms, making it slower).

Algorithm Design

Why? Because std::sort performs in O(n*logn) time. Moving elements from the sorted indexes will take O(n) time, which is relatively cheap in comparison.

Using the example from above, in the link given, we have this existing code:

using namespace std;

template <typename T>
vector<size_t> sort_indexes(const vector<T> &v) 
{

  // initialize original index locations
  vector<size_t> idx(v.size());
  iota(idx.begin(), idx.end(), 0);

  // sort indexes based on comparing values in v
  sort(idx.begin(), idx.end(),
       [&v](size_t i1, size_t i2) {return v[i1] < v[i2];});

  return idx;
}

We can now create a sorted array from these indexes, a cheap step:

template <typename T>
vector<T> sorted_array(const vector<T> &v, const vector<size_t>& i) 
{
    vector<T> out;
    out.reserve(v.size())
    for (auto j: i) {
        out.emplace_back(v[j]);
    }
}

If copying the values is too prohibitive, you can use a std::reference_wrapper to create a non-nullable wrapper.

template <typename T>
vector<reference_wrapper<const T>> sorted_array(const vector<T> &v, const vector<size_t>& i) 
{
    vector<reference_wrapper<const T>> out;
    out.reserve(v.size())
    for (auto j: i) {
        out.emplace_back(std::cref(v[j]));
    }
}

Even for large arrays, this should be pretty efficient.

Caution

Don't try to sort two arrays at once. Don't try to move items in your value array when sorting the index array. Why? Because the comparison is index-based for the value array: moving items will destroy the sort in the original array. Since moving the items to the correct position is so cheap once you have the sorted indexes, don't worry about performance here: the sort is the bottleneck.

The reorder of the array(s) or vector(s) according to sorted indices can be done in place in O(n) time. This example sorts two arrays using a third array of indices. During the reorder, the array of indices is restored back to it's original state of going from 0 to n-1. I manually did the iota part in this example, and it doesn't use templates, but could be easily converted to templates and vectors:

#include <algorithm>
#include <iostream>

int main()
{
int A[8] = {8,6,1,7,5,3,4,2};
char B[8] = {'h','f','a','g','e','c','d','b'};
size_t I[8];
size_t i, j, k;
int ta;
char tb;
    // create array of indices to A[]
    for(i = 0; i < sizeof(A)/sizeof(A[0]); i++)
        I[i] = i;
    // sort array of indices according to A[]
    std::sort(I, I+sizeof(I)/sizeof(I[0]),
              [&A](int i, int j) {return A[i] < A[j];});
    // reorder A[] B[] I[] according to I[]
    for(i = 0; i < sizeof(A)/sizeof(A[0]); i++){
        if(i != I[i]){
            ta = A[i];
            tb = B[i];
            k = i;
            while(i != (j = I[k])){
                A[k] = A[j];
                B[k] = B[j];
                I[k] = k;
                k = j;
            }
            A[k] = ta;
            B[k] = tb;
            I[k] = k;
        }
    }
    for(i = 0; i < sizeof(A)/sizeof(A[0]); i++)
        std::cout << A[i] << ' ';
    std::cout << std::endl;
    for(i = 0; i < sizeof(B)/sizeof(B[0]); i++)
        std::cout << B[i] << ' ';
    std::cout << std::endl;
    return 0;
}

or an array of pointers can be used instead of an array of indices, which allows a normal compare function instead of a lambda compare function.

#include <algorithm>
#include <iostream>

bool compare(const int *p0, const int *p1)
{
    return *p0 < *p1;
}

int main()
{
int A[8] = {8,6,1,7,5,3,4,2};
char B[8] = {'h','f','a','g','e','c','d','b'};
int *pA[8];
size_t i, j, k;
int ta;
char tb;
    // create array of pointers to A[]
    for(i = 0; i < sizeof(A)/sizeof(A[0]); i++)
        pA[i] = &A[i];
    // sort array of pointers according to A[]
    std::sort(pA, pA+sizeof(A)/sizeof(A[0]), compare);
    // reorder A[] B[] pA[] according to pA[]
    for(i = 0; i < sizeof(A)/sizeof(A[0]); i++){
        if(i != pA[i]-A){
            ta = A[i];
            tb = B[i];
            k = i;
            while(i != (j = pA[k]-A)){
                A[k] = A[j];
                B[k] = B[j];
                pA[k] = &A[k];
                k = j;
            }
            A[k] = ta;
            B[k] = tb;
            pA[k] = &A[k];
        }
    }
    for(i = 0; i < sizeof(A)/sizeof(A[0]); i++)
        std::cout << A[i] << ' ';
    std::cout << std::endl;
    for(i = 0; i < sizeof(B)/sizeof(B[0]); i++)
        std::cout << B[i] << ' ';
    std::cout << std::endl;
    return 0;
}

Rather than constructing an index vector, and applying it to each of the vectors you want to sort, you can instead make a range that references all the vectors to sort, and sort that by the appropriate element.

template <typename OrderBy, typename... Others>
void sort_multiple(OrderBy& order_by, Others&... others) {
    auto range = ranges::views::zip(order_by, others...);
    ranges::actions::sort(range.begin(), range.end(), std::less{}, [](auto & tuple){ return get<0>(tuple); });
}

The only unfortunate thing about this formulation is that there isn't a simple way of specifying a custom compare, because ... arguments are greedy.

struct custom_compare_t {} custom_compare;

template <typename Compare, typename OrderBy, typename... Others>
void sort_multiple(custom_compare_t, Compare compare, OrderBy& order_by, Others&... others) {
    auto range = ranges::views::zip(order_by, others...);
    ranges::actions::sort(range.begin(), range.end(), compare, [](auto & tuple){ return get<0>(tuple); });
}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM