I'm trying to filter a vector into another vector in parallel. My current setup generates too much overhead so it's even slower than serial. Concretely:
#pragma omp parallel for collapse(2)
for(int i = 0 ; i < a.size() ; i++){
for(int j = 0 ; j < a.size() ; j++){
if(meetsConditions(a[i], a[j])){
std::vector<int> tmp = {i, j};
#pragma omp critical
b.push_back(tmp);
}
}
}
I'm saving the indices as I would like to later run a separate serial function on each couple that meets the condition:
for(auto element : b){
doSmth(a[element[0]], a[element[1]]);
}
I tried doing it with a new empty vector, resizing it to a.size()*a.size(), allocating the elements using a third index that's in an atomic clause when it gets incremented, but that caused a data race (unless I saw it wrongly). How could I go about tackling this problem? Could maybe using lists make it easier? Or maybe storing pointers to these elements directly would make it easier? I'm really new to C++ so I'm not quite sure how I could get that to work.
Assuming a.size()
is large enough, I would parallelize only the first loop (no collapse), use a local b_local
to each thread, and at the end concatenate all the b_local
to the shared b
.
#pragma omp parallel
{
std::vector<std::vector<int>> b_local;
#pragma omp for
for (int i = 0 ; i < a.size() ; i++){
for(int j = 0 ; j < a.size() ; j++){
if(meetsConditions(a[i], a[j])){
std::vector<int> tmp = {i, j};
b_local.push_back(tmp);
}
}
}
#pragma omp critical
b.insert(b.end(),b_local.begin(),b_local.end());
}
This should be more efficient, as the critical
section is now outside of the loops and encountered only once per thread. b_local
being private to each threads, no critical
is needed when updating it.
But actually I'm not sure it's worth parallelizing this, unless meetsConditions() has many computations.
I would expect this code to run slower, as you have aa critcal section that is going to force the b.push_back(tmp) to run in a single thread.
Perhaps look to remove the critcal section and instead write the results directly to an already appropriatly sized b, something like:
vector<std::vector<int>> b;
b.resize(a.size() * a.size());
#pragma omp parallel for collapse(2)
for (int i = 0; i < a.size(); i++) {
for (int j = 0; j < a.size(); j++) {
if (meetsConditions(a[i], a[j])) {
std::vector<int> tmp = { i, j };
b.at((i*a.size())+j) = tmp;
}
}
}
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.