
Sorting one vector with respect to another - most efficient way?

I'm aware that this question has been asked a few times, but the existing answers address simple cases (where compactness, readability, or user proficiency are the deciding factors), and I'm not sure which approach is the most efficient, since I need to repeat the operation on the order of a million times.

The setup is the following:

  • Two vectors A and B of floats; this cannot be changed, but additional structures can be created from A and B.
  • A and B have equal length, which is at least 4 and at most 20 (if that's helpful in any way).
  • A needs to be sorted in descending order based on the values of its entries, while B simply needs to match A's ordering.

Example:

A = {2,4,3,1} -> {4,3,2,1}
     | | | |
B = {1,2,3,4} -> {2,3,1,4}

Question:

What's the most efficient (= fast + memory saving) way of doing this?

One common way is to create an index and sort it, rather than sorting the original values. This is known as indirect sort or argsort .

Example:

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <numeric>
#include <vector>

using values_t = std::vector<float>;
using index_t = std::vector<uint8_t>;

index_t make_sorted_index(values_t const& values) {
    index_t index(values.size());
    std::iota(index.begin(), index.end(), 0);
    std::sort(index.begin(), index.end(),
              [&values](uint8_t a, uint8_t b) { return values[a] > values[b]; });
    return index;
}

int main() {
    values_t a = {2,4,3,1};
    values_t b = {1,2,3,4};

    auto index = make_sorted_index(a);

    std::cout << "A = {";
    for(auto i : index)
        std::cout << a[i] << ',';
    std::cout << "\b}\n";

    std::cout << "B = {";
    for(auto i : index)
        std::cout << b[i] << ',';
    std::cout << "\b}\n";
}

Outputs:

A = {4,3,2,1}
B = {2,3,1,4}
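If reordered copies of the vectors are needed rather than indexed access, the permutation can be applied in one O(n) pass (a sketch; apply_index is a hypothetical helper, not part of the code above):

```cpp
#include <cstdint>
#include <vector>

// Materialize the permutation: build a reordered copy of a vector
// from the sorted index (one O(n) pass, O(n) extra memory).
std::vector<float> apply_index(const std::vector<float>& v,
                               const std::vector<uint8_t>& index)
{
    std::vector<float> out;
    out.reserve(index.size());
    for (uint8_t i : index)
        out.push_back(v[i]);
    return out;
}
```

The same index can be reused to reorder both A and B, so the comparison-based work is done only once.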

A and B have equal length, which is at least 4 and at most 20 (if that's helpful in any way).

Since the two vectors are the same size, you can store the B values (or pointers to them) alongside the A values, eliminating the O(n) pass otherwise needed to rearrange B according to A. The indexing method you are considering will instead cost you an extra indirection every time you access A or B.
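That suggestion, keeping each A value next to its B partner so that a single sort reorders both, can be sketched like this (sort_paired is an illustrative name, not from the answer):

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Keep each A value next to its B partner, so one sort reorders both.
// Descending order on the A component, as the question requires.
std::vector<std::pair<float, float>> sort_paired(
    const std::vector<float>& a, const std::vector<float>& b)
{
    std::vector<std::pair<float, float>> ab;
    ab.reserve(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        ab.emplace_back(a[i], b[i]);
    std::sort(ab.begin(), ab.end(),
              [](const auto& x, const auto& y) { return x.first > y.first; });
    return ab;
}
```

If the pairing is built once up front, the per-sort cost is a single std::sort over a contiguous buffer, with no second pass over B.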

[...]which one is the most efficient, as I'm concerned with repeating that operation O(1M) times.

What's the most efficient (= fast + memory saving) way of doing this?

So we are looking for a linear, in-place algorithm for sorting ~20 floats? That's a hard task.

I would recommend Block Sort for this kind of problem. It is stable, runs in O(n log n) time, and uses only O(1) extra memory.

An implementation in C and C++ is available under the name WikiSort. The project also includes a nice comparison against std::stable_sort(), analysing the algorithm's behaviour under different input orderings.
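For a baseline to compare against, the standard library already ships a stable sort; a minimal sketch of a stable descending sort (stable_sort_desc is just an illustrative name):

```cpp
#include <algorithm>
#include <vector>

// Baseline: std::stable_sort gives the same stability guarantee as
// Block Sort, though typical implementations may use O(n) extra
// memory. Descending order, as the question requires.
void stable_sort_desc(std::vector<float>& v)
{
    std::stable_sort(v.begin(), v.end(),
                     [](float x, float y) { return x > y; });
}
```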

It's really hard to beat std::pair<float, float> with std::sort in this scenario, and this is coming from someone who has tried a lot:

Sorting 1,000,000 elements 32 times...

mt_sort: {0.220000 secs}
-- small result: [ 22 48 59 77 79 80 84 84 93 98 ]

mt_radix_sort: {0.202000 secs}
-- small result: [ 22 48 59 77 79 80 84 84 93 98 ]

std::sort: {1.779000 secs}
-- small result: [ 22 48 59 77 79 80 84 84 93 98 ]

qsort: {2.718000 secs}
-- small result: [ 22 48 59 77 79 80 84 84 93 98 ]

... and can easily get something faster than std::sort (as well as tbb::sort, which still takes over a second), but that's with an input size of a million single-precision floats. Once you get down to the sizes you are talking about, 4-20 elements, it becomes extremely hard to beat std::sort. I once spent an entire day on the most micro-tuned insertion sort I could manage, with endless VTune sessions, only to end up with the same performance and give up. And that wasn't my first attempt: it's so easy to beat std::sort for large input sizes that I keep getting tempted, every year or two, to try the small-input case again as my assembly and computer-architecture knowledge improves, but for teeny inputs it seems impossible, at least given my skills (or lack thereof). I've also sifted through all kinds of libraries for sorting numbers, and they don't beat std::sort for small inputs either, or my own sorts for large inputs (I wouldn't bother hand-rolling my own numeric sorts for large inputs if I could just plug one in from elsewhere).

These other suggestions like indirect sort/argsort tend to be excellent for non-trivial input sizes, but it's really tough to beat std::sort for trivial ones (and 4-20 32-bit elements is really trivial, if you ask me). Your best bet is probably the most micro-tuned insertion sort, heap sort, or some other quadratic-complexity (O(N^2)) sort, possibly with a fancy SIMD implementation. We shouldn't be thinking about algorithmic complexity at these teeny scales, mostly just machine instructions, and it might be more productive to think about how to parallelize the work and sort multiple teeny sequences at once instead of trying to make each individual sort go faster for such teeny, teeny inputs.

I've always been interested in faster sorts of floating-point numbers, since they can improve the build times of certain k-d trees and BVHs used in ray tracing and other areas, which could save studios tremendous money (studios like Pixar and ILM pour tons of money into their render farms alone). But I've never been able to beat std::sort on input sizes of, say, fewer than 64 floats (<256 bytes). Again, it's easy for me to beat it for thousands of elements or more, but it is already really fast for teeny inputs, in ways that should make you content.

That said, the memory-saving part is easy: just sort in place (std::sort would be a start), and don't create any temporary arrays, as other sorts like radix sort require. Chances are this will also be the fastest way to do it for such teeny input sizes.

You might be able to get the tiniest boost using your own pair type:

struct Key
{
    float a, b;

    bool operator<(const Key& rhs) const { return a < rhs.a; }
};

... the difference from std::pair in this scenario is that it doesn't bother to compare b. I doubt that helps much, since the comparison of b would be short-circuited anyway, but the optimizer might be able to do a little bit more if it knows that b is never accessed in the comparator.
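A usage sketch of that idea (the struct is redefined here so the snippet stands alone, and a lambda supplies the descending order the question asks for instead of the member operator<):

```cpp
#include <algorithm>
#include <vector>

struct Key { float a, b; };  // same layout as above: A value plus its B partner

// Sort descending on 'a'; 'b' just rides along and is never compared.
void sort_keys_desc(std::vector<Key>& keys)
{
    std::sort(keys.begin(), keys.end(),
              [](const Key& x, const Key& y) { return x.a > y.a; });
}
```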

You'll definitely get a speed boost if you avoid using a separate std::vector to store each teeny sequence. It's not efficient to store a million vectors that only contain 4-20 elements each: that would require at least a million heap allocations, as well as more memory than needed for each container's size/capacity/pointer header. Instead, store all 4-20 million elements in one std::vector instance, e.g., and sort subranges of it if you need to gather the teeny sequences in advance. If not, use the stack with std::array, or a plain old array of floats with an upper-bound size of 20.
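A minimal sketch of that flat-buffer layout, assuming for illustration that all sequences share one fixed length (sort_all_ranges and the layout are assumptions, not from the answer):

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical layout: many sequences of fixed length 'len' stored
// back to back in one contiguous buffer. Sort each subrange in place,
// descending, with no per-sequence heap allocation.
void sort_all_ranges(std::vector<float>& flat, std::size_t len)
{
    for (std::size_t off = 0; off + len <= flat.size(); off += len)
        std::sort(flat.begin() + off, flat.begin() + off + len,
                  std::greater<float>());
}
```

Variable-length sequences would need an extra offsets array, but the principle is the same: one allocation for the whole batch instead of one per sequence.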
