Fastest way to create a vector of indices from distance matrix in C++

Question

I have a distance matrix D of size n by n and a constant L as input. I need to create a vector v contains all entries in D such that its value is at most L . Here v must be in a specific order v = [v1 v2.. vn] where vi contains entries in i th row of D with value at most L . The order of entries in each vi is not important.

I wonder there is a fast way to create v using vector, array or any data structure + parallization. What I did is to use for loops and it is very slow for large n .

vector<int> v;
for (int i=0; i < n; ++i){
    for (int j=0; j < n; ++j){
        if (D(i,j) <= L) v.push_back(j);
    }
}

Answer 1

In consideration of the comments, I made the appropriate corrections (in emphasis).

Have you searched tips for writing performance code, threading, asm instructions (if your assembly is not exactly what you want) and OpenCL for parallel-processing? If not, I strongly recommend!

In some cases , declaring all for loop variables out of the for loop (to avoid declaring they a lot of times) will make it faster, but not in this case (comment from our friend Paddy) .

Also, using new insted of vector can be faster, as we see here: Using arrays or std::vectors in C++, what's the performance gap? - and I tested, and with vector it's 6 seconds slower than with new ,which only takes 1 second. I guess that the safety and ease of management guarantees that come with std::vector is not desired when someone is searching for performance, even because using new is not so difficult, just avoid heap overflow with calculations and remember using delete[]

user4581301 is correct here, and the following statement is untrue: Finally, if you build D in a array instead of matrix (or maybe if you copy D into a constant array, maybe...), it will be much mor cache-friendly and will save one for loop statement.

Answer 2

The best way is mostly depending on the context. If you are seeking for GPU parallization you should take a look at OpenCL.

For CPU based parallization the C++ standard #include <thread> library is probably your best bet, but you need to be careful:

Threads take time to create so if n is relatively small (<1000 or so) it will slow you down
D(i,j) has to be readably by multiple threads at the same time
v has to be writable by multiple threads, a standard vector wont cut it

v may be a 2d vector with vi as its subvectors, but these have to be initialized before the parallization:

std::vector<std::vector<int>> v; 
v.reserve(n);                    
for(size_t i = 0; i < n; i++)
{
    v.push_back(std::vector<int>());
}

You need to decide how many threads you want to use. If this is for one machine only, hardcoding is a valid option. There is a function in the thread library that gets the amount of supported threads, but it is more of a hint than trustworthy.

size_t threadAmount = std::thread::hardware_concurrency(); //How many threads should run hardware_concurrency() gives you a hint, but its not optimal
std::vector<std::thread> t;                                //to store the threads in
t.reserve(threadAmount-1);                                 //you need threadAmount-1 extra threads (we already have the main-thread)

To start a thread you need a function it can execute. In this case this is to read through part of your matrix.

void CheckPart(size_t start, size_t amount, int L, std::vector<std::vector<int>>& vec)
{
    for(size_t i = start; i < amount+start; i++)
    {
        for(size_t j = 0; j < n; j++)
        {
            if(D(i,j) <= L)
            {
                vec[i].push_back(j);
            }
        }
    }
}

Now you need to split your matrix in parts of about n/threadAmount rows and start the threads. The thread constructor needs a function and its parameter, but it will always try to copy the parameters, even if the function wants a reference. To prevent this, you need to force using a reference with std::ref()

int i = 0;
int rows;
for(size_t a = 0; a < threadAmount-1; a++)
{
    rows = n/threadAmount + ((n%threadAmount>a)?1:0);
    t.push_back(std::thread(CheckPart, i, rows, L, std::ref(v)));
    i += rows;
}

The threads are now running and all there is to do is run the last block on the main function:

SortPart(i, n/threadAmount, L, v);

After that you need to wait for the threads finishing and clean them up:

for(unsigned int a = 0; a < threadAmount-1; a++)
{
    if(t[a].joinable())
    {
        t[a].join();
    }
}

Please note that this is just a quick and dirty example. Different problems might need different implementation, and since I can't guess the context the help I can give is rather limited.

Fastest way to create a vector of indices from distance matrix in C++

Question

2 answers

solution1
0 2019-10-11 00:07:06

solution2
0 2019-10-13 12:59:41

Fastest way to create a vector of indices from distance matrix in C++

Question

2 answers

solution1 0 2019-10-11 00:07:06

solution2 0 2019-10-13 12:59:41

solution1
0 2019-10-11 00:07:06

solution2
0 2019-10-13 12:59:41