Filtering vector onto another vector in Parallel

Question

I'm trying to filter a vector into another vector in parallel. My current setup generates too much overhead so it's even slower than serial. Concretely:

#pragma omp parallel for collapse(2)
for(int i = 0 ; i < a.size() ; i++){
   for(int j = 0 ; j < a.size() ; j++){
      if(meetsConditions(a[i], a[j])){
         std::vector<int> tmp = {i, j};
         #pragma omp critical
         b.push_back(tmp);
      }
   }
}

I'm saving the indices as I would like to later run a separate serial function on each couple that meets the condition:

for(auto element : b){
   doSmth(a[element[0]], a[element[1]]);
}

I tried doing it with a new empty vector, resizing it to a.size()*a.size(), allocating the elements using a third index that's in an atomic clause when it gets incremented, but that caused a data race (unless I saw it wrongly). How could I go about tackling this problem? Could maybe using lists make it easier? Or maybe storing pointers to these elements directly would make it easier? I'm really new to C++ so I'm not quite sure how I could get that to work.

Answer 1

Assuming a.size() is large enough, I would parallelize only the first loop (no collapse), use a local b_local to each thread, and at the end concatenate all the b_local to the shared b .

#pragma omp parallel
{
    std::vector<std::vector<int>> b_local;
    #pragma omp for
    for (int i = 0 ; i < a.size() ; i++){
        for(int j = 0 ; j < a.size() ; j++){
            if(meetsConditions(a[i], a[j])){
                std::vector<int> tmp = {i, j};
                b_local.push_back(tmp);
            }
        }
    }
    #pragma omp critical
    b.insert(b.end(),b_local.begin(),b_local.end()); 
}

This should be more efficient, as the critical section is now outside of the loops and encountered only once per thread. b_local being private to each threads, no critical is needed when updating it.

But actually I'm not sure it's worth parallelizing this, unless meetsConditions() has many computations.

Answer 2

I would expect this code to run slower, as you have aa critcal section that is going to force the b.push_back(tmp) to run in a single thread.

Perhaps look to remove the critcal section and instead write the results directly to an already appropriatly sized b, something like:

vector<std::vector<int>> b;
b.resize(a.size() * a.size());

#pragma omp parallel for collapse(2)
for (int i = 0; i < a.size(); i++) {
    for (int j = 0; j < a.size(); j++) {
        if (meetsConditions(a[i], a[j])) {
            std::vector<int> tmp = { i, j };
            b.at((i*a.size())+j) = tmp;
        }
    }
}

Filtering vector onto another vector in Parallel

Question

2 answers

solution1
2 2022-12-14 12:04:10

solution2
0 2022-12-14 11:35:29

Filtering vector onto another vector in Parallel

Question

2 answers

solution1 2 2022-12-14 12:04:10

solution2 0 2022-12-14 11:35:29

solution1
2 2022-12-14 12:04:10

solution2
0 2022-12-14 11:35:29