Translating sequential code into an OpenMP parallel construct
I have the following piece of code that I would like to write in OpenMP. Abstractly, my code looks like this: I start by dividing N=100 iterations equally among p=10 pieces, and I store the iterations allocated to each piece in a vector:

Nvec[1]={0,1,..,9}
Nvec[2]={10,11,..,19}
...
Nvec[p]={N-10,..,N-1}
Then I loop over the iterations:

for (k = 0; k < p; k++) {        // loop over each piece of Nvec
    for (j = 0; j < 2; j++) {    // nested loop
        for (i = Nvec[k][0]; i <= Nvec[k][9]; i++) {
            // loop between the first and last value
            // of the array corresponding to piece k
        }
    }
}
Now, as you can see, the code is sequential, with a total of 2*100=200 iterations. I wanted to parallelize it using OpenMP, with the absolute requirement of keeping the order of the iterations!
I tried the following (note that `#pragma omp parallel for` must be followed directly by the loop, without extra braces):

#pragma omp parallel for schedule(static) collapse(2)
for (j = 0; j < 2; j++) {
    for (i = 0; i < n; i++) {
        // loop code here
    }
}
This setting doesn't keep the order of the iterations as in the sequential version. In the sequential version, each chunk is processed entirely with j=0 and then entirely with j=1. In my OpenMP version, every thread takes a chunk of iterations and processes it entirely with a single value of j; in effect, each thread handles either the j=0 case or the j=1 case. With p=10 workers, every worker processes 200/10=20 iterations, but the problem is that all of them have j=0 or all have j=1.
How can I make sure that every thread gets a chunk of iterations, performs the loop code with j=0 on all of them, and then with j=1 on the same chunk of iterations?
EDIT

What I want, exactly, for every chunk of 20 iterations:
worker 1
j:0
i:0--->9
j:1
i:0--->9

worker p
j:0
i:90--->99
j:1
i:90--->99
The OpenMP code above instead does:

worker 1
j:0
i:0--->19

worker p
j:1
i:80--->99
It's actually simple: just make the outer j-loop non-worksharing:
#pragma omp parallel
for (int j = 0; j < 2; j++) {
    #pragma omp for schedule(static)
    for (int i = 0; i < n; i++) {
        ...
    }
}
If you use the static schedule, OpenMP guarantees that each worker will handle the same range of i values for both j=0 and j=1.
Note: Moving the parallel construct to the outer loop is merely an optimization to avoid thread-management overhead. The code works similarly if you just place a parallel for between the two loops.