
OpenMP and C++: private variables

I am quite new to OpenMP and C++, and perhaps because of this I am having some really basic problems.

I am trying to use a static schedule with all variables private (just in case, in order to verify that the result obtained is the same as the non-parallel one).

The problem arises with variables such as bodies, which I do not know where they come from, as they are not previously defined.

Is it possible to define all the appearing variables, such as bodies, as private? How could that be done?

  std::vector<phys_vector> forces(bodies.size());

  size_t i, j; double dist, f, alpha;


  #pragma omp parallel for schedule(static) private(i, j, dist, f, alpha)
  for (i=0; i<bodies.size(); ++i) {
    for (j = i+1; j<bodies.size(); ++j) {
      dist = distance(bodies[i], bodies[j]);
      if (dist > param.min_distance()) {
        f = attraction(bodies[i], bodies[j], param.gravity(), dist);
        alpha = angle(bodies[i],bodies[j]);
        phys_vector deltaf{ f * cos(alpha) , f * sin(alpha) };
        forces[i] += deltaf;
        forces[j] -= deltaf;
      }
    }
  }
  return forces;
}

PS: with the current code, the execution result differs from the non-parallel execution.

It should be reiterated that your bodies variable does not just randomly appear out of nowhere; you should find out exactly where it is declared and what it is defined as. However, because you are only accessing elements of bodies and never changing them, this variable should be shared anyway, so it is not your problem.

Your actual problem comes from the forces variable. You must ensure that different threads are not changing forces[j] for the same j. If you follow the logic of your loop, you can be sure that each forces[i] is only written by the thread that owns iteration i, so there is no contention there. But forces[j] for the same j can very easily be modified by different iterations of your parallel i loop. What you need to do is reduce on your array by following one of the answers from that StackOverflow link.
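For reference, here is a minimal, self-contained sketch of that array-reduction pattern: each thread accumulates into its own scratch vector, and the partial results are merged under a critical section. The names and types here (plain doubles, a constant deltaf) are illustrative stand-ins, not the question's actual phys_vector code; compile with OpenMP enabled (e.g. -fopenmp).

#include <cstdio>
#include <vector>

int main() {
    const int n = 8;
    std::vector<double> forces(n, 0.0);             // shared result
    #pragma omp parallel
    {
        std::vector<double> local(n, 0.0);          // one private copy per thread
        #pragma omp for schedule(static)
        for (int i = 0; i < n; ++i)
            for (int j = i + 1; j < n; ++j) {
                double deltaf = 1.0;                // stand-in for the real force
                local[i] += deltaf;                 // no race: local is thread-private
                local[j] -= deltaf;
            }
        #pragma omp critical                        // merge once per thread
        for (int i = 0; i < n; ++i) forces[i] += local[i];
    }
    for (int i = 0; i < n; ++i) std::printf("forces[%d] = %g\n", i, forces[i]);
}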

NoseKnowsAll has correctly identified your problem. NoseKnowsAll已正确识别您的问题。

I would like to explain more about why this problem happens. You could have done this with a square loop like this:

#pragma omp parallel for
for(int i=0; i<n; i++) {
    phys_vector sum = 0;
    for(int j=0; j<n; j++) {
        if(i==j) continue;        // skip the self-interaction
        //calculate deltaf
        sum += deltaf;
    }
    forces[i] = sum;
}

which uses n*(n-1) iterations and is easy to parallelize.

But since force(i,j) = -force(j,i), we can do this in half the iterations, n*(n-1)/2, using a triangular loop (which is what you have done):

for(int i=0; i<n; i++) {
    phys_vector sum = 0;
    for(int j=i+1; j<n; j++) {
        //calculate deltaf
        sum += deltaf;
        forces[j] -= deltaf;
    }
    forces[i] = sum;
}

The problem is that when you do this optimization, it becomes more difficult to parallelize the outer loop. There are two issues: the writes to forces[j], and the fact that the iterations are no longer well distributed, i.e. the first thread runs over more iterations than the last thread.

The easy solution is to parallelize the inner loop:

#pragma omp parallel
for(int i=0; i<n; i++) {
    phys_vector sum = 0;          // private partial sum for this thread
    #pragma omp for
    for(int j=i+1; j<n; j++) {
        //calculate deltaf
        sum += deltaf;
        forces[j] -= deltaf;      // safe: each j is handled by exactly one thread
    }
    #pragma omp critical
    forces[i] += sum;             // merge the per-thread partial sums
}

This uses n*nthreads critical operations out of a total of n*(n-1)/2 iterations, so the cost of the critical operations gets smaller as n gets larger. You could use a private forces vector for each thread and merge them in a critical section, but I don't think this is necessary since the critical operations are on the outer loop and not the inner loop.


Here is a solution which fuses the triangular loop, allowing each thread to run over the same number of iterations.

unsigned n = bodies.size();
unsigned r = n*(n-1)/2;
#pragma omp parallel
{
    std::vector<phys_vector> forces_local(bodies.size());
    #pragma omp for schedule(static)
    for(unsigned k=0; k<r; k++) {
        unsigned i = (1 + sqrt(1.0+8.0*k))/2;    // row index of pair k in the triangle
        unsigned j = k - i*(i-1)/2;              // column index, so j < i
        //calculate deltaf
        forces_local[i] += deltaf;
        forces_local[j] -= deltaf;
    }
    #pragma omp critical
    for(unsigned i=0; i<n; i++) forces[i] += forces_local[i];
}

I was unhappy with my previous method for fusing the triangle (because it needs to use floating point and the sqrt function), so I came up with a much simpler solution based on this answer.

This maps a triangle to a rectangle and vice versa. First I convert to a rectangle with width n but with n*(n-1)/2 elements in total (the same as the triangle). Then I calculate the (row, column) values of the rectangle, and to map them to the triangle (which skips the diagonal) I use the following formula:

//i is the row, j is the column of the rectangle
if(j<=i) {
    i = n - i - 2;
    j = n - j - 1;
}

Let's choose an example. Consider the following n=5 triangular loop pairs:

(0,1), (0,2), (0,3), (0,4)
       (1,2), (1,3), (1,4)
              (2,3), (2,4)
                     (3,4)

mapping this to a rectangle becomes

(3,4), (0,1), (0,2), (0,3), (0,4)
(2,4), (2,3), (1,2), (1,3), (1,4)

Triangle loops with even values of n work the same way, though it might not be as obvious. For example, for n = 4:

(0,1), (0,2), (0,3)
       (1,2), (1,3)
              (2,3)

this becomes

(2,3), (0,1), (0,2), (0,3)
(1,2), (1,3)

This is not exactly a rectangle, but the mapping works the same. I could have instead mapped it as

 (0,1), (0,2), (0,3)
 (2,3), (1,2), (1,3)

which is a rectangle, but then I would need two different formulas for odd and even triangle sizes.
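As a quick sanity check (not part of the original answer), the mapping can be exercised in isolation: enumerate k from 0 to n*(n-1)/2 - 1, apply the formula, and confirm that every pair (i, j) with i < j appears exactly once.

#include <cstdio>
#include <set>
#include <utility>

int main() {
    for (unsigned n = 2; n <= 6; ++n) {
        std::set<std::pair<unsigned, unsigned>> seen;
        for (unsigned k = 0; k < n*(n-1)/2; ++k) {
            unsigned i = k / n;                  // row of the rectangle
            unsigned j = k % n;                  // column of the rectangle
            if (j <= i) {                        // fold back onto the triangle
                i = n - i - 2;
                j = n - j - 1;
            }
            seen.insert({i, j});
        }
        std::printf("n = %u: %zu distinct pairs, expected %u\n",
                    n, seen.size(), n*(n-1)/2);
    }
}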

Here is the new code using the rectangle-to-triangle mapping.

unsigned n = bodies.size();
#pragma omp parallel
{
    std::vector<phys_vector> forces_local(bodies.size());
    #pragma omp for schedule(static)
    for(unsigned k=0; k<n*(n-1)/2; k++) {
        unsigned i = k/n;
        unsigned j = k%n;
        if(j<=i) {
            i = n - i - 2;
            j = n - j - 1;
        }
        //calculate deltaf
        forces_local[i] += deltaf;
        forces_local[j] -= deltaf;
    }
    #pragma omp critical
    for(unsigned i=0; i<n; i++) forces[i] += forces_local[i];
}
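For completeness, here is a sketch of how this final version might slot back into the question's function, filling in the //calculate deltaf placeholder with the question's own distance/attraction/angle calls. The function name compute_forces, the body type, and the Parameters type are assumptions for illustration; everything else is taken from the code above.

#include <cmath>
#include <vector>

std::vector<phys_vector> compute_forces(const std::vector<body>& bodies,
                                        const Parameters& param) {
    const unsigned n = bodies.size();
    std::vector<phys_vector> forces(n);
    #pragma omp parallel
    {
        std::vector<phys_vector> forces_local(n);            // per-thread accumulator
        #pragma omp for schedule(static)
        for (unsigned k = 0; k < n*(n-1)/2; ++k) {
            unsigned i = k / n, j = k % n;
            if (j <= i) { i = n - i - 2; j = n - j - 1; }    // rectangle -> triangle
            double dist = distance(bodies[i], bodies[j]);
            if (dist > param.min_distance()) {
                double f = attraction(bodies[i], bodies[j], param.gravity(), dist);
                double alpha = angle(bodies[i], bodies[j]);
                phys_vector deltaf{ f * std::cos(alpha), f * std::sin(alpha) };
                forces_local[i] += deltaf;
                forces_local[j] -= deltaf;
            }
        }
        #pragma omp critical                                 // merge partial results
        for (unsigned i = 0; i < n; ++i) forces[i] += forces_local[i];
    }
    return forces;
}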
