
Translating sequential code into OpenMP parallel construct

I have the following piece of code that I would like to write in OpenMP.

In abstract form, my code looks like the following.

I first divide N=100 iterations equally among p=10 pieces, and I store the iterations allocated to each piece in a vector:

Nvec[1]={0,1,..,9}
Nvec[2]={10,11,..,19}
Nvec[p]={N-10,..,N-1}

Then I loop over the iterations:

for (k = 0; k < p; k++) {          // loop over each piece of Nvec
    for (j = 0; j < 2; j++) {      // nested loop
        // loop from the first to the last value of the array for piece k
        for (i = Nvec[k][0]; i <= Nvec[k][N/p - 1]; i++) {
            // loop code here
        }
    }
}

Now, as you can see, the code is sequential with a total of 2*100=200 iterations. I want to parallelize it using OpenMP, with the absolute condition that the order of iterations is kept!

I tried the following:

#pragma omp parallel for schedule(static) collapse(2)
for (j = 0; j < 2; j++) {
    for (i = 0; i < n; i++) {
        // loop code here
    }
}

This setting doesn't keep the order of the iterations as in the sequential version. In the sequential version, each chunk is processed entirely with j=0 and then entirely with j=1.

In my OpenMP version, every thread takes a chunk of iterations and processes it entirely with a single value of j. In effect, each thread handles either the j=0 case or the j=1 case. With p=10, every worker processes 200/10=20 iterations; the problem is that all of a worker's iterations are either j=0 or j=1.

How can I make sure that every thread gets a chunk of iterations, performs the loop code with j=0 on all of those iterations, and then with j=1 on the same chunk of iterations?

EDIT

What I want exactly, for every chunk of 20 iterations:

worker 1
j:0
i:1--->10
j:1
i:1--->10
worker p
j:0
i:90--->99
j:1
i:90--->99

The OpenMP code above instead does:

worker 1
j:0
i:1--->20
worker p
j:1
i:80--->99
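
This pattern follows from how collapse(2) fuses the j and i loops into a single iteration space of 2*N iterations, which the static schedule then cuts into contiguous blocks. The sketch below reproduces that arithmetic in plain C; `Block` and `collapsed_block` are hypothetical names used only for illustration, and it assumes the thread count divides the fused iteration count evenly (as with 200 iterations and 10 threads).

```c
/* First and last (j, i) pair of one thread's block (hypothetical type). */
typedef struct {
    int j_first, i_first;
    int j_last,  i_last;
} Block;

/* Block handled by thread t out of p when an outer loop of 2 passes and an
 * inner loop of n iterations are fused and split into equal contiguous
 * blocks. Assumes p divides 2*n evenly. */
Block collapsed_block(int t, int p, int n)
{
    int total = 2 * n;       /* size of the fused iteration space */
    int size  = total / p;   /* iterations per thread */
    int lo    = t * size;    /* first fused index of thread t */
    int hi    = lo + size - 1;

    /* A fused index maps back to (j, i) = (index / n, index % n). */
    Block b = { lo / n, lo % n, hi / n, hi % n };
    return b;
}
```

With n=100 and p=10, thread 0 covers j=0 with i from 0 to 19 and thread 9 covers j=1 with i from 80 to 99, so no thread ever sees both values of j for the same i.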

It's actually simple: just make the outer j-loop non-worksharing:

#pragma omp parallel
for (int j = 0; j < 2; j++) {
    #pragma omp for schedule(static)
    for (int i = 0; i < 10; i++) {
         ...
    }
}

If you use the static schedule, OpenMP guarantees that each worker will handle the same range of i values for both j=0 and j=1.
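
One way to check this on a given machine is to record which thread runs each i in each pass and compare the two passes afterwards. The following is a minimal sketch, not part of the original answer: `owner` and `same_owner_in_both_passes` are hypothetical names, N=100 is taken from the question, and the fallback definition of `omp_get_thread_num` is only there so the code still builds when OpenMP is disabled.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Fallback so the sketch also builds without OpenMP (single thread). */
static int omp_get_thread_num(void) { return 0; }
#endif

#define N 100  /* iteration count from the question */

/* owner[j][i] records which thread executed iteration i during pass j. */
static int owner[2][N];

/* Runs both passes; returns 1 if every i had the same owner in both. */
int same_owner_in_both_passes(void)
{
    #pragma omp parallel
    for (int j = 0; j < 2; j++) {   /* every thread runs both passes */
        /* Worksharing loop with identical bounds and schedule per pass. */
        #pragma omp for schedule(static)
        for (int i = 0; i < N; i++) {
            owner[j][i] = omp_get_thread_num();
        }
        /* Implicit barrier: all threads finish pass j before the next. */
    }

    for (int i = 0; i < N; i++) {
        if (owner[0][i] != owner[1][i]) return 0;
    }
    return 1;
}
```

Built with an OpenMP flag such as gcc's `-fopenmp`, the function is expected to return 1, since both worksharing loops have the same bounds and schedule and sit in the same parallel region.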

Note: Moving the parallel construct to the outer loop is merely an optimization to avoid thread management overhead. The code works similarly if you just place a parallel for in between the two loops.
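
The variant mentioned in the note could look like the sketch below, where the j-loop stays sequential and a parallel region is opened once per pass. The names `visits` and `passes_run_in_order` are hypothetical, and N=100 comes from the question. One caveat to keep in mind: the pass ordering is certain here, because each parallel region ends before the next begins, but the thread-to-chunk mapping is stated in the OpenMP specification for loops within the same parallel region, so across separate regions it should be treated as typical behaviour (with an unchanged thread count) rather than a guarantee.

```c
#define N 100  /* iteration count from the question */

/* visits[i] encodes the passes seen by iteration i in order:
 * 12 means "j=0 first, then j=1". */
static int visits[N];

/* Runs both passes; returns 1 if every i saw j=0 before j=1. */
int passes_run_in_order(void)
{
    for (int i = 0; i < N; i++) visits[i] = 0;

    for (int j = 0; j < 2; j++) {   /* sequential outer loop */
        /* A separate parallel region is created for each value of j. */
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < N; i++) {
            visits[i] = visits[i] * 10 + (j + 1);
        }
    }

    for (int i = 0; i < N; i++) {
        if (visits[i] != 12) return 0;
    }
    return 1;
}
```

Each element of `visits` is written by exactly one thread per pass, so there is no data race, and the function should return 1 with or without OpenMP enabled.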

