Translating sequential code into an OpenMP parallel construct
I have the following piece of code that I would like to write in OpenMP. Abstractly, my code looks like this: I start by dividing N=100 iterations equally among p=10 pieces, and I store the iterations allocated to each piece in a vector:

Nvec[1]={0,1,..,9}
Nvec[2]={10,11,..,19}
...
Nvec[p]={N-10,..,N-1}
Then I loop over the iterations:

for (k = 0; k < p; k++) {        // loop over each piece of Nvec
    for (j = 0; j < 2; j++) {    // nested loop
        for (i = Nvec[k][0]; i <= Nvec[k][9]; i++) {
            // loop between the first and last value
            // of the array corresponding to piece k
        }
    }
}
Now, as you can see, the code is sequential, with a total of 2*100=200 iterations. I wanted to parallelize it using OpenMP, with the absolute requirement of keeping the order of the iterations!
I tried the following (note that `#pragma omp parallel for` must be followed directly by the loop, without extra braces):

#pragma omp parallel for schedule(static) collapse(2)
for (j = 0; j < 2; j++) {
    for (i = 0; i < n; i++) {
        // loop code here
    }
}
This setting doesn't keep the order of the iterations as in the sequential version. In the sequential version, each chunk is processed entirely with j=0 and then entirely with j=1. In my OpenMP version, every thread takes a chunk of iterations and processes it entirely with a single value of j; in effect, each thread handles either the j=0 case or the j=1 case. With p=10 workers, every worker processes 200/10=20 iterations, but the problem is that all of them have j=0 or all have j=1.
How can I make sure that every thread gets a chunk of iterations, performs the loop code with j=0 on all of them, and then with j=1 on the same chunk of iterations?
EDIT

What I want, exactly, for every chunk of 20 iterations:
worker 1
j:0
i:0--->9
j:1
i:0--->9

worker p
j:0
i:90--->99
j:1
i:90--->99
The OpenMP code above instead does:

worker 1
j:0
i:0--->19

worker p
j:1
i:80--->99
It's actually simple: just make the outer j-loop non-worksharing:
#pragma omp parallel
for (int j = 0; j < 2; j++) {
    #pragma omp for schedule(static)
    for (int i = 0; i < n; i++) {
        ...
    }
}
If you use the static schedule, OpenMP guarantees that each worker will handle the same range of i values for both j=0 and j=1.
Note: Moving the parallel construct to the outer loop is merely an optimization to avoid thread-management overhead. The code works similarly if you just place a parallel for between the two loops.