
Redundant computations in attempt to parallelize recursive function with OpenMP

I have a recursive function that calls itself twice. My attempt to parallelize the function eventually produces the right result, but it performs a lot of redundant computation along the way, wiping out any gain from parallelism.

The main program is trying to compute an auxiliary graph, which is an intermediate data structure required for computing all k-edge-connected components of a graph.

I've been at this problem for months now, and asking for help here is a last resort. I would appreciate any comments or suggestions pointing me in the right direction; I'm not necessarily looking for a ready-made solution.

I tried using #pragma omp single nowait, but that only resulted in sequential execution of the code.
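Roughly, that attempt looked like the sketch below (reconstructed for illustration, so the details may differ; it uses the same headers as the full example further down). With single and no task constructs, only one thread ever does the recursive work.

int foo_single(std::vector<int> V, int s)
{
    // Hypothetical reconstruction of the "single nowait" attempt, not the
    // exact code -- just the pattern that produces sequential execution.
    int n = V.size();
    if (n > 1)
    {
        std::vector<int> S(V.begin(), V.begin() + 1);
        std::vector<int> T(V.begin() + 1, V.end());

        #pragma omp parallel
        {
            #pragma omp single nowait
            {
                // Only one thread of the team executes this block, and with
                // no task constructs both calls run back to back on that
                // thread, so the recursion is effectively serialized.
                foo_single(S, s);
                foo_single(T, 1);
            }
        }
    }
    return 0;
}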

I also tried cilk_spawn at one point, but that only resulted in my computer running out of memory; I guess too many tasks were spawned.

I have extracted the essence of the problem into a minimal working example, pasted below.

The code posted below repeats each computation about eight times. It looks as if each of the eight threads runs its own copy of the recursion instead of the threads cooperating on parts of the problem.

#include <iostream>
#include <omp.h>
#include <numeric>
#include <vector>
#include <random>
#include <algorithm>
using namespace std;

int foo(std::vector<int> V, int s){
    int n = V.size();

    if (n > 1){
        std::cout << n << " ";
        std::random_device rd;                              // obtain a random number from hardware
        std::mt19937 eng(rd());                             // seed the generator
        std::uniform_int_distribution<int> distr(0, n - 1); // define the range
        int t = 1;

        auto first = V.begin();
        auto mid = V.begin() + (t);
        auto mid_1 = V.begin() + (t);

        std::vector<int> S(first, mid);
        std::vector<int> T(mid_1, V.end());

        #pragma omp parallel
        {
            #pragma omp task
            foo(S, s);
            #pragma omp task
            foo(T, t);
        }
    }
    return 0;
}



int main(){
    std::vector<int> N(100);
    iota(N.begin(), N.end(), 0);
    int p = foo(N,0);
    return (0);
}

My aim is to have all the threads work together to complete the recursion.

The correct way to apply task parallelism with OpenMP to your example is shown below. The issue with your version is that #pragma omp parallel inside foo opens a new parallel region on every recursive call, and every thread of that region executes both task constructs, so each recursive call is duplicated once per thread (eight times with eight threads). Instead, create the parallel region once in main, let a single thread start the recursion, and only create tasks inside foo.

int foo(std::vector<int> V, int s)
{
    int n = V.size();

    if (n > 1)
    {
        std::cout << n << " ";
        std::random_device rd;                              // obtain a random number from hardware
        std::mt19937 eng(rd());                             // seed the generator
        std::uniform_int_distribution<int> distr(0, n - 1); // define the range
        int t = 1;

        auto first = V.begin();
        auto mid = V.begin() + (t);
        auto mid_1 = V.begin() + (t);

        std::vector<int> S(first, mid);
        std::vector<int> T(mid_1, V.end());

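        // Each recursive call becomes a deferred task that any thread in the team can run.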
        #pragma omp task
        foo(S, s);
        #pragma omp task
        foo(T, t);
    }
    return 0;
}

int main()
{
    std::vector<int> N(10000);
    std::iota(N.begin(), N.end(), 0);
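    // Create the thread team once here; a single thread starts the recursion
    // and creates the first tasks, which the waiting threads then execute.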
    #pragma omp parallel
    #pragma omp single
    {
        int p = foo(N, 0);
    }
    return (0);
}

That said, the particular example won't show a performance improvement because it is very fast on its own and dominated by memory allocation. So if you do not see a benefit in applying this, feel free to update or post a new question with a more specific example.
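If the real problem does run into task-creation overhead, a common refinement is to stop creating tasks below a size cutoff so that small subproblems run inline. A rough sketch of that idea, building on the code above (the threshold of 1000 is just a placeholder to tune):

int foo(std::vector<int> V, int s)
{
    int n = V.size();
    if (n > 1)
    {
        std::vector<int> S(V.begin(), V.begin() + 1);
        std::vector<int> T(V.begin() + 1, V.end());

        // if(...) executes small tasks immediately instead of deferring them;
        // final(...) additionally makes all of their descendant tasks
        // undeferred, which cuts task-creation overhead near the leaves.
        #pragma omp task if(n >= 1000) final(n < 1000)
        foo(S, s);
        #pragma omp task if(n >= 1000) final(n < 1000)
        foo(T, 1);
        #pragma omp taskwait // wait for both child tasks before returning
    }
    return 0;
}

As before, main would still create the parallel region once and call foo from inside a single construct.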
