
OpenMP and C++: private variables

I am quite new to OpenMP and C++, and perhaps because of this I am having some really basic problems.

I am trying to use a static schedule with all variables private (just in case, in order to verify that the result is the same as in the non-parallel execution).

The problem arises with variables such as bodies, which I do not know where they come from, as they are not previously defined.

Is it possible to define all the appearing variables, such as bodies, as private? How could that be done?

  std::vector<phys_vector> forces(bodies.size());

  size_t i, j; double dist, f, alpha;


  #pragma omp parallel for schedule(static) private(i, j, dist, f, alpha)
  for (i=0; i<bodies.size(); ++i) {
    for (j = i+1; j<bodies.size(); ++j) {
      dist = distance(bodies[i], bodies[j]);
      if (dist > param.min_distance()) {
        f = attraction(bodies[i], bodies[j], param.gravity(), dist);
        alpha = angle(bodies[i], bodies[j]);
        phys_vector deltaf{ f * cos(alpha), f * sin(alpha) };
        forces[i] += deltaf;
        forces[j] -= deltaf;
      }
    }
  }
  return forces;
}

PS: with the current code, the result differs from that of the non-parallel execution.

It should be reiterated that your bodies variable does not just appear out of nowhere; you should find out exactly where it is declared and what it is defined as. However, because you only access elements of bodies and never change them, this variable should be shared anyway, so it is not your problem.

Your actual problem comes from the forces variable. You must ensure that different threads do not change forces[j] for the same j. If you follow the logic of your loop, you can see that forces[i] is only written by the thread that owns iteration i, so there is no contention there. But forces[j] for the same j can very easily be modified by different iterations of your parallel i loop. What you need to do is reduce on your array, following one of the answers from that StackOverflow link.
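
For concreteness, here is a minimal sketch of one such array reduction (my illustration, not code from the original question). It assumes a compiler supporting OpenMP 4.5+ (array sections in reduction clauses) and assumes phys_vector is default-constructed to zero and provides operator+=; the force magnitude is renamed fmag to avoid clashing with the pointer f:

// user-declared reduction combining phys_vector values element-wise
#pragma omp declare reduction(vplus : phys_vector : omp_out += omp_in)

std::vector<phys_vector> forces(bodies.size());
phys_vector* f = forces.data();
const size_t n = bodies.size();

// each thread reduces into a private copy of f[0:n], merged on exit
#pragma omp parallel for schedule(static) reduction(vplus : f[0:n])
for (size_t i = 0; i < n; ++i) {
    for (size_t j = i + 1; j < n; ++j) {
        double dist = distance(bodies[i], bodies[j]);
        if (dist > param.min_distance()) {
            double fmag  = attraction(bodies[i], bodies[j], param.gravity(), dist);
            double alpha = angle(bodies[i], bodies[j]);
            phys_vector deltaf{ fmag * cos(alpha), fmag * sin(alpha) };
            f[i] += deltaf;
            f[j] -= deltaf;
        }
    }
}

This fixes the race but keeps the triangular loop's load imbalance, which the answer below addresses.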

NoseKnowsAll has correctly identified your problem.

I would like to explain more about why this problem happens. You could have computed the forces with a square loop like this:

#pragma omp parallel for
for(int i=0; i<n; i++) {
    phys_vector sum{};
    for(int j=0; j<n; j++) {
        if(i==j) continue; // skip self-interaction
        //calculate deltaf
        sum += deltaf;
    }
    forces[i] = sum;
}

which performs n*(n-1) useful iterations and is easy to parallelize.

But since force(i,j) = -force(j,i), we can do this in half the iterations, n*(n-1)/2, using a triangular loop (which is what you have done):

for(int i=0; i<n; i++) {
    phys_vector sum{};
    for(int j=i+1; j<n; j++) {
        //calculate deltaf
        sum += deltaf;
        forces[j] -= deltaf;
    }
    forces[i] = sum;
}

The problem is that this optimization makes the outer loop more difficult to parallelize. There are two issues: threads race when writing to forces[j], and the iterations are no longer evenly distributed, i.e. the first thread runs over far more inner iterations than the last thread.

The easy solution is to parallelize the inner loop:

#pragma omp parallel
for(int i=0; i<n; i++) {
    phys_vector sum{};
    #pragma omp for
    for(int j=i+1; j<n; j++) {
        //calculate deltaf
        sum += deltaf;
        forces[j] -= deltaf;
    }
    #pragma omp critical
    forces[i] += sum;
}

This uses n*nthreads critical operations out of a total of n*(n-1)/2 iterations, so the cost of the critical operations becomes smaller as n gets larger. You could use a private forces vector for each thread and merge them in a critical section (as the code below does), but I don't think this is necessary here, since the critical operations are on the outer loop and not the inner loop.


Here is a solution which fuses the triangular loop into a single loop, allowing each thread to run over the same number of iterations.

unsigned n = bodies.size();
unsigned r = n*(n-1)/2;
#pragma omp parallel
{
    std::vector<phys_vector> forces_local(bodies.size()); // per-thread buffer, zero-initialized
    #pragma omp for schedule(static)
    for(unsigned k=0; k<r; k++) {
        // invert k = i*(i-1)/2 + j to recover the pair (i,j) with j < i
        unsigned i = (1 + sqrt(1.0 + 8.0*k))/2;
        unsigned j = k - i*(i-1)/2;
        //calculate deltaf
        forces_local[i] += deltaf;
        forces_local[j] -= deltaf;
    }
    #pragma omp critical
    for(unsigned i=0; i<n; i++) forces[i] += forces_local[i];
}
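
Because the closed-form inversion above is easy to get wrong and relies on floating point (the concern raised just below), here is a small standalone check, added for illustration, that the k -> (i,j) mapping visits every pair with j < i exactly once:

#include <cmath>
#include <cstdio>
#include <set>
#include <utility>

int main() {
    const unsigned n = 100;
    std::set<std::pair<unsigned, unsigned>> seen;
    for (unsigned k = 0; k < n * (n - 1) / 2; k++) {
        unsigned i = (1 + std::sqrt(1.0 + 8.0 * k)) / 2;  // row index
        unsigned j = k - i * (i - 1) / 2;                 // column index, j < i
        seen.insert({i, j});
    }
    // prints "4950 of 4950 pairs hit" if the inversion is exact
    std::printf("%zu of %u pairs hit\n", seen.size(), n * (n - 1) / 2);
}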

I was unhappy with my previous method for fusing a triangle (because it needs to use floating point and the sqrt function), so I came up with a much simpler solution based on this answer.

This maps a triangle to a rectangle and vice versa. First I convert the triangle to a rectangle of width n containing the same n*(n-1)/2 cells. Then I calculate the (row, column) values of the rectangle, and to map them back to the triangle (which skips the diagonal) I use the following formula:

//i is the row, j is the column of the rectangle
if(j<=i) {
    i = n - i - 2;
    j = n - j - 1;
}

Let's choose an example. Consider the following triangular loop pairs for n = 5:

(0,1), (0,2), (0,3), (0,4)
       (1,2), (1,3), (1,4)
              (2,3), (2,4)
                     (3,4)

Mapped to a rectangle, this becomes

(3,4), (0,1), (0,2), (0,3), (0,4)
(2,4), (2,3), (1,2), (1,3), (1,4)

Triangle loops with even n work the same way, though it might not be as obvious. For example, for n = 4:

(0,1), (0,2), (0,3)
       (1,2), (1,3)
              (2,3)

this becomes

(2,3), (0,1), (0,2), (0,3)
(1,3), (1,2)

This is not exactly a rectangle, but the mapping works the same. I could have instead mapped it as

 (0,1), (0,2), (0,3)
 (2,3), (1,2), (1,3)

which is a rectangle, but then I would need two different formulas for odd and even triangle sizes.

Here is the new code using the rectangle-to-triangle mapping.

unsigned n = bodies.size();
#pragma omp parallel
{
    std::vector<phys_vector> forces_local(bodies.size()); // per-thread buffer, zero-initialized
    #pragma omp for schedule(static)
    for(unsigned k=0; k<n*(n-1)/2; k++) {
        unsigned i = k/n;  // row of the rectangle
        unsigned j = k%n;  // column of the rectangle
        if(j<=i) {         // fold back into the triangle, skipping the diagonal
            i = n - i - 2;
            j = n - j - 1;
        }
        //calculate deltaf
        forces_local[i] += deltaf;
        forces_local[j] -= deltaf;
    }
    #pragma omp critical
    for(unsigned i=0; i<n; i++) forces[i] += forces_local[i];
}
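
As a quick sanity check (my addition, not part of the original answer), the integer-only mapping can be printed directly; for n = 5 it reproduces the two rectangle rows from the worked example above:

#include <cstdio>

int main() {
    const unsigned n = 5;
    for (unsigned k = 0; k < n * (n - 1) / 2; k++) {
        unsigned i = k / n, j = k % n;  // rectangle (row, column)
        if (j <= i) {                   // fold back into the triangle
            i = n - i - 2;
            j = n - j - 1;
        }
        std::printf("(%u,%u)%s", i, j, (k + 1) % n == 0 ? "\n" : " ");
    }
}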
