简体   繁体   中英

C++ Parallel std::vector Sorting With Expensive Copying

Assume I have a vector<int> intVec and a vector<vector<double> > matrix . I want to sort intVec and reorder the first dimension of matrix correspondingly in C++. I realize this question has been asked several times before, but this case has a twist. A vector<double> is expensive to copy, so eg copying both intVec and matrix to a vector<pair<int, vector<double> > , sorting that and copying them back is even more inefficient than usual.

Short of rolling my own custom sorting algorithm, how can I sort intVec and reorder the first dimension of matrix in lockstep without copying any element of matrix and invoking vector 's copy constructor ?

A vector<double> is expensive to copy, so eg copying both intVec and matrix to a vector<pair<int, vector<double> > , sorting that and copying them back is even more inefficient than usual.

The simplest way to get the optimization you want then is to swap the elements of the source vector<vector<double>> into the temporary vector<pair<int, vector<double>> , sort it, and then swap them back to their new locations in the original vector.

There will still be more overhead than is strictly necessary (for example to construct and destroy empty vectors). However, no vector is ever copied and the code remains very similar to what you already have. So if you are correct that the problem is the cost of copying, problem solved.

In C++11 you could move in both directions rather than swapping. I doubt there's much performance difference between moving and swapping with an empty vector, but I'm not sure there isn't.

Based on the space between your two > 's, I'm guessing you're using pre-C++11 C++. In C++11, std::sort seems to move elements whenever possible instead of copying.

You can pass a custom comparator to std::sort . However, even if you do this, you're doing Theta(n log n) copies of pair<int, vector<double> > 's.

I'd guess, based on not actually trying it, that you should sort a pair<int, vector<double> *> (or pair<int, int> if int is big enough), using the second int as an index) instead to get the appropriate permutation and then apply the permutation using vector 's swap member function to avoid copying the vector contents.

One option: create a std::vector<std::pair<int,size_t>> where the first element is the int from intVec and the second element is the original index of that element. Then sort that new vector. Then shuffle your matrix and intVec to the order indicated by the second elements of the pairs (eg, with a single pass, doing swaps).

If you don't want to copy the vector<double> item vector, then make a vector of pointer or indices to the vector<double> items. Sort it along with the main vector.

It is however not at all clear that you'll get a performance gain, and so I suggest that you measure both the straightforward sorting and the smart sorting, and compare.


Example:

#include    <algorithm>
#include    <vector>
#include    <iostream>
using namespace std;

struct Mat
{
    vector< vector< double > >  items;

    Mat( int const size )
        : items( size, vector< double >( size ) )
    {}
};

struct KeyAndData
{
    int                     key;
    vector< double > const* data;

    friend bool operator<( KeyAndData const& a, KeyAndData const& b )
    {
        return a.key < b.key;
    }
};

int main()
{
    int const       data[]  = {3, 1, 4, 1, 5};
    Mat             m( 5 );
    vector<int>     v( 5 );

    for( int i = 0;  i < 5;  ++i )
    {
        m.items[i][i] = v[i] = data[i];
    }

    vector< KeyAndData >        sorted( 5 );
    for( int i = 0;  i < 5;  ++i )
    {
        sorted[i].key = v[i];
        sorted[i].data = &m.items[i];
    }

    sort( sorted.begin(), sorted.end() );
    for( int i = 0;  i < 5;  ++i )
    {
        cout << sorted[i].key << ":  ";

        vector< double > const& r = *sorted[i].data;
        for( int x = 0;  x < 5;  ++x )
        {
            cout << r[x] << " ";
        }
        cout << endl;
    }
}

The obvious answer is to restructure your two vectors as one vector<pair<int, vector<double> > (since the data are clearly tightly coupled aready).

If that's really not an option then create another vector of indexes and sort that instead of the vec and matrix.

Since std::vector::swap operates in constant time, you could use a sorting algorithm that operates via a series of swaps (like quicksort) to sort intVec while simultaneously performing identical swaps on matrix :

#include <iostream>
#include <vector>
#include <algorithm>

// Sorts intVec in [low, high) while also performing identical swaps on matrix.
void iqsort(std::vector<int> &intVec, std::vector<std::vector<double>> &matrix,
            int low, int high) {
  if (low >= high) return;
  int pivot = intVec[low];
  int nLow = low + 1;
  int nHigh = high - 1;
  while (nLow <= nHigh) {
    if (intVec[nLow] <= pivot) {
      ++nLow;
    } else {
      std::swap(intVec[nLow], intVec[nHigh]);
      std::swap(matrix[nLow], matrix[nHigh]);
      --nHigh;
    }   
  }
  std::swap(intVec[low], intVec[nHigh]);
  std::swap(matrix[low], matrix[nHigh]);

  iqsort(intVec, matrix, low, nHigh);
  iqsort(intVec, matrix, nLow, high);
}

int main() {
  std::vector<int> intVec = {10, 1, 5}; 
  std::vector<std::vector<double>> matrix = {{33.0}, {11.0}, {44.0}};  
  iqsort(intVec, matrix, 0, intVec.size());
  // intVec is {1, 5, 10} and matrix is {{11.0}, {44.0}, {33.0}}
}

I'm pretty sure that --- when you use some up to date compiler (let's say gcc 4.4 and up) --- there nothing really copied: nowadays objects within C++ standard library containers are (mostly) always moved. Therefore IMHO there is no need to have any fear about expensive copies.

Have a look at the following example - it was written using gcc 4.4.6 under Debian. As you can see, during the 'reordering' phase, there is no call to the copy constructor and also no call to the `operator=(... & other)'.

#include <vector>
#include <iostream>
#include <iomanip>

class VeryExpensiveToCopy {
public:
   explicit VeryExpensiveToCopy(long i) : id(i) { ++cnt_normal_cstr; }

   // Just to be sure this is never used.
   VeryExpensiveToCopy & operator=(VeryExpensiveToCopy & other) = delete;
   VeryExpensiveToCopy(VeryExpensiveToCopy & other) = delete;

   VeryExpensiveToCopy(VeryExpensiveToCopy && other) : id(other.id) {
      ++cnt_move_cstr;
   }
   VeryExpensiveToCopy & operator=(VeryExpensiveToCopy && other) {
      id = other.id; ++cnt_op_as_move; return *this;
   }

   long get_id() const { return id; }

   static void print_stats(std::string const & lref) {
      std::cout << "[" << std::setw(20) << lref << "] Normal Cstr [" 
                << cnt_normal_cstr 
                << "] Move Cstr [" << cnt_move_cstr
                << "] operator=(&&) [" << cnt_op_as_move << "]" << std::endl;
   }

private:
   long id;

   static long cnt_normal_cstr;
   static long cnt_move_cstr;
   static long cnt_op_as_move;
};

// Counts the number of calls.
long VeryExpensiveToCopy::cnt_normal_cstr { 0 };
long VeryExpensiveToCopy::cnt_move_cstr { 0 };
long VeryExpensiveToCopy::cnt_op_as_move { 0 };

int main() {
   std::vector<VeryExpensiveToCopy> v;

   VeryExpensiveToCopy::print_stats("Start");
   for(auto i(0); i<100000; ++i) {
      v.emplace_back(i);
   }
   VeryExpensiveToCopy::print_stats("After initialization");
   for(auto i(0); i<100000-1; ++i) {
      v[i] = std::move(v[i+1]);
   }
   VeryExpensiveToCopy::print_stats("After moving");
   for(auto i(0); i<100000-1; ++i) {
      if(v[i].get_id() != i+1) { abort(); }
   }
   VeryExpensiveToCopy::print_stats("After check");

   return 0;
}

Output:

[               Start] Normal Cstr [0] Move Cstr [0] operator=(&&) [0]
[After initialization] Normal Cstr [100000] Move Cstr [131071] operator=(&&) [0]
[        After moving] Normal Cstr [100000] Move Cstr [131071] operator=(&&) [99999]
[         After check] Normal Cstr [100000] Move Cstr [131071] operator=(&&) [99999]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM