I have a distance matrix
D
of sizen
byn
and a constantL
as input. I need to create a vectorv
contains all entries inD
such that its value is at mostL
. Herev
must be in a specific orderv = [v1 v2.. vn]
wherevi
contains entries ini
th row ofD
with value at mostL
. The order of entries in eachvi
is not important.
I wonder there is a fast way to create v
using vector, array or any data structure + parallization. What I did is to use for loops and it is very slow for large n
.
vector<int> v;
for (int i=0; i < n; ++i){
for (int j=0; j < n; ++j){
if (D(i,j) <= L) v.push_back(j);
}
}
In consideration of the comments, I made the appropriate corrections (in emphasis).
Have you searched tips for writing performance code, threading, asm instructions (if your assembly is not exactly what you want) and OpenCL for parallel-processing? If not, I strongly recommend!
In some cases , declaring all for loop variables out of the for loop (to avoid declaring they a lot of times) will make it faster, but not in this case (comment from our friend Paddy) .
Also, using new
insted of vector
can be faster, as we see here: Using arrays or std::vectors in C++, what's the performance gap? - and I tested, and with vector
it's 6 seconds slower than with new
,which only takes 1 second. I guess that the safety and ease of management guarantees that come with std::vector is not desired when someone is searching for performance, even because using new
is not so difficult, just avoid heap overflow with calculations and remember using delete[]
user4581301 is correct here, and the following statement is untrue: Finally, if you build D
in a array instead of matrix (or maybe if you copy D
into a constant array, maybe...), it will be much mor cache-friendly and will save one for loop statement.
The best way is mostly depending on the context. If you are seeking for GPU parallization you should take a look at OpenCL.
For CPU based parallization the C++ standard #include <thread>
library is probably your best bet, but you need to be careful:
v may be a 2d vector with vi as its subvectors, but these have to be initialized before the parallization:
std::vector<std::vector<int>> v;
v.reserve(n);
for(size_t i = 0; i < n; i++)
{
v.push_back(std::vector<int>());
}
You need to decide how many threads you want to use. If this is for one machine only, hardcoding is a valid option. There is a function in the thread library that gets the amount of supported threads, but it is more of a hint than trustworthy.
size_t threadAmount = std::thread::hardware_concurrency(); //How many threads should run hardware_concurrency() gives you a hint, but its not optimal
std::vector<std::thread> t; //to store the threads in
t.reserve(threadAmount-1); //you need threadAmount-1 extra threads (we already have the main-thread)
To start a thread you need a function it can execute. In this case this is to read through part of your matrix.
void CheckPart(size_t start, size_t amount, int L, std::vector<std::vector<int>>& vec)
{
for(size_t i = start; i < amount+start; i++)
{
for(size_t j = 0; j < n; j++)
{
if(D(i,j) <= L)
{
vec[i].push_back(j);
}
}
}
}
Now you need to split your matrix in parts of about n/threadAmount rows and start the threads. The thread constructor needs a function and its parameter, but it will always try to copy the parameters, even if the function wants a reference. To prevent this, you need to force using a reference with std::ref()
int i = 0;
int rows;
for(size_t a = 0; a < threadAmount-1; a++)
{
rows = n/threadAmount + ((n%threadAmount>a)?1:0);
t.push_back(std::thread(CheckPart, i, rows, L, std::ref(v)));
i += rows;
}
The threads are now running and all there is to do is run the last block on the main function:
SortPart(i, n/threadAmount, L, v);
After that you need to wait for the threads finishing and clean them up:
for(unsigned int a = 0; a < threadAmount-1; a++)
{
if(t[a].joinable())
{
t[a].join();
}
}
Please note that this is just a quick and dirty example. Different problems might need different implementation, and since I can't guess the context the help I can give is rather limited.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.