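How to use an orphaned for loop in OpenMP?
SOLVED: see Edit 2 below.
I am trying to parallelize an algorithm that performs some operation on a matrix (let's call it a blur, for simplicity's sake). Once the operation is done, it finds the biggest change between the old and the new matrix (the maximum, element by element, of the absolute difference between old and new). If this maximum difference is above some threshold, another iteration of the matrix operation is performed.
So my main program has the following loop: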
converged = 0;
for( i = 1; i <= iteration_limit; i++ ){
    max_diff = update( &data_grid );
    if( max_diff < tol ) {
        converged = 1;
        break;
    }
}
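update( &data_grid ) then calls the actual implementation of the blur algorithm. The blur algorithm iterates over the matrix, and this is the loop I want to parallelize: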
for( i = 0; i < width; i++ ) {
    for( j = 0; j <= height; j++ ) {
        g->data[ update ][ i ][ j ] =
            ONE_QUARTER * (
                g->data[ update ][ i + 1 ][ j ] +
                g->data[ update ][ i - 1 ][ j ] +
                g->data[ update ][ i ][ j + 1 ] +
                g->data[ update ][ i ][ j - 1 ]
            );
        /* track the largest per-element change in this sweep */
        diff = fabs( g->data[ old ][ i ][ j ] - g->data[ update ][ i ][ j ] );
        maxdiff = maxdiff > diff ? maxdiff : diff;
    }
}
I could simply put a parallel region inside update(&data_grid), but that would mean threads are created and destroyed on every iteration, which is what I am trying to avoid:
#pragma omp parallel for private(i, j, diff, maxdg) shared(width, height, update, g, dg, chunksize) default(none) schedule(static, chunksize)
I have 2 copies of the grid, and on each iteration I write the new answer into the "other" one by switching old and update between 0 and 1.
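The two-buffer scheme described above can be sketched as follows. This is only an illustration under stated assumptions, not the asker's code: `jacobi_sweep` and `solve` are hypothetical names, the grid is a flat n x n array with a fixed boundary, and the sweep reads only from the old buffer (a Jacobi step). For brevity it opens the parallel region inside the sweep, which is the simple layout the question wants to move away from.

```c
#include <assert.h>

/* One sweep over an n x n grid stored row by row (hypothetical layout).
   Reads only from src, writes only to dst, returns the largest change. */
double jacobi_sweep(const double *src, double *dst, int n)
{
    assert(n >= 3);
    double maxdiff = 0.0;                /* shared result */
    #pragma omp parallel
    {
        double local_max = 0.0;          /* one copy per thread */
        #pragma omp for nowait
        for (int i = 1; i < n - 1; i++) {        /* interior points only */
            for (int j = 1; j < n - 1; j++) {
                double v = 0.25 * (src[(i + 1) * n + j] + src[(i - 1) * n + j]
                                 + src[i * n + j + 1] + src[i * n + j - 1]);
                double diff = v - src[i * n + j];
                if (diff < 0.0) diff = -diff;
                dst[i * n + j] = v;
                if (diff > local_max) local_max = diff;
            }
        }
        #pragma omp critical             /* merge the per-thread maxima */
        {
            if (local_max > maxdiff) maxdiff = local_max;
        }
    }
    return maxdiff;
}

/* Returns the number of sweeps used, or -1 if max_iter was reached.
   *result receives the index (0 or 1) of the buffer with the latest values. */
int solve(double *grid[2], int n, double tol, int max_iter, int *result)
{
    int old = 0, update = 1;
    for (int iter = 1; iter <= max_iter; iter++) {
        double maxdiff = jacobi_sweep(grid[old], grid[update], n);
        old = update;                    /* freshly written buffer becomes the input */
        update = 1 - update;             /* the other buffer is overwritten next */
        if (maxdiff < tol) { *result = old; return iter; }
    }
    *result = old;
    return -1;
}
```

Both buffers need the same boundary values before the first call, because each sweep only writes interior points.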
EDIT:
So, following Jonathan Dursi's suggestion, I made an orphaned omp for loop, but for some reason I can't seem to find the maximum value across threads...
Here is my "outer" code:
converged = 0;
#pragma omp parallel shared(i, max_iter, g, tol, maxdg, dg) private(converged) default(none)
{
    for( i = 1; i <= 40; i++ ){
        maxdg = 0;
        dg = grid_update( &g );
        printf("[%d] dg from a single thread: %f\n", omp_get_thread_num(), dg );
        #pragma omp critical
        {
            if (dg > maxdg) maxdg = dg;
        }
        #pragma omp barrier
        #pragma omp flush
        printf("[%d] maxdg: %f\n", omp_get_thread_num(), maxdg);
        if( maxdg < tol ) {
            converged = 1;
            break;
        }
    }
}
Result:
[11] dg from a single thread: 0.000000
[3] dg from a single thread: 0.000000
[4] dg from a single thread: 0.000000
[5] dg from a single thread: 0.000000
[0] dg from a single thread: 0.166667
[6] dg from a single thread: 0.000000
[7] dg from a single thread: 0.000000
[8] dg from a single thread: 0.000000
[9] dg from a single thread: 0.000000
[15] dg from a single thread: 0.000000
[10] dg from a single thread: 0.000000
[1] dg from a single thread: 0.166667
[12] dg from a single thread: 0.000000
[13] dg from a single thread: 0.000000
[14] dg from a single thread: 0.000000
[2] maxdg: 0.000000
[3] maxdg: 0.000000
[0] maxdg: 0.000000
[8] maxdg: 0.000000
[9] maxdg: 0.000000
[4] maxdg: 0.000000
[5] maxdg: 0.000000
[6] maxdg: 0.000000
[7] maxdg: 0.000000
[1] maxdg: 0.000000
[14] maxdg: 0.000000
[11] maxdg: 0.000000
[15] maxdg: 0.000000
[10] maxdg: 0.000000
[12] maxdg: 0.000000
[13] maxdg: 0.000000
EDIT 2: I made some mistakes in the private/shared clauses and forgot a barrier. This is the correct code:
#pragma omp parallel shared(max_iter, g, tol, maxdg) private(i, dg, converged) default(none)
{
    for( i = 1; i <= max_iter; i++ ){
        #pragma omp barrier
        maxdg=0;
        /*#pragma omp flush */
        dg = grid_update( &g );
        #pragma omp critical
        {
            if (dg > maxdg) maxdg = dg;
        }
        #pragma omp barrier
        /*#pragma omp flush*/
        if( maxdg < tol ) {
            converged = 1;
            break;
        }
    }
}
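In the Edit 2 loop every thread assigns maxdg = 0 itself, and converged is private, so its value does not survive the parallel region. One possible variant, sketched here under assumptions, leaves the reset to a single thread and records the result in a shared variable. `halve_all` is a hypothetical stand-in for grid_update (which the question does not show), and `iterate` is an invented wrapper name.

```c
#include <assert.h>

/* Hypothetical stand-in for grid_update(): an orphaned worksharing loop.
   Halves every element (assumed non-negative) and returns the largest
   change seen by the calling thread in its share of the iterations. */
static double halve_all(double *x, int n)
{
    double local_max = 0.0;
    #pragma omp for
    for (int i = 0; i < n; i++) {
        double change = 0.5 * x[i];
        x[i] -= change;
        if (change > local_max) local_max = change;
    }
    return local_max;
}

/* Returns the iteration at which the largest change dropped below tol,
   or -1 if that did not happen within max_iter iterations. */
int iterate(double *x, int n, double tol, int max_iter)
{
    assert(n > 0 && max_iter > 0);
    double maxdg = 0.0;   /* shared: largest change over all threads */
    int iters = -1;       /* shared: survives the parallel region */

    #pragma omp parallel default(none) shared(x, n, tol, max_iter, maxdg, iters)
    {
        for (int i = 1; i <= max_iter; i++) {
            /* Everyone must have finished reading maxdg from the previous pass. */
            #pragma omp barrier
            /* One thread resets it; single ends with an implicit barrier. */
            #pragma omp single
            maxdg = 0.0;

            double dg = halve_all(x, n);

            #pragma omp critical
            {
                if (dg > maxdg) maxdg = dg;
            }
            /* All contributions are merged before anyone tests maxdg. */
            #pragma omp barrier

            if (maxdg < tol) {
                #pragma omp single nowait
                iters = i;
                break;    /* every thread sees the same maxdg, so all leave together */
            }
        }
    }
    return iters;
}
```

Because the code uses only pragmas and no omp_* calls, it also compiles and gives the same answer without -fopenmp, which makes it easy to compare serial and threaded runs.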
There's no problem with having the parallel section start in another routine before the for, certainly since OpenMP 3.0 (2008) and maybe since OpenMP 2.5. With gcc4.4:
outer.c:
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
void update(int n, int iter);
int main(int argc, char **argv) {
    int n=10;
    #pragma omp parallel num_threads(4) default(none) shared(n)
    for (int iter=0; iter<3; iter++)
    {
        #pragma omp single
        printf("---iteration %d---\n", iter);
        update(n, iter);
    }
    return 0;
}
inner.c:
#include <omp.h>
#include <stdio.h>
void update(int n, int iter) {
    int thread = omp_get_thread_num();
    #pragma omp for
    for (int i=0;i<n;i++) {
        int newthread=omp_get_thread_num();
        printf("%3d: doing loop index %d.\n",newthread,i);
    }
}
Building:
$ make
gcc44 -g -fopenmp -std=c99 -c -o inner.o inner.c
gcc44 -g -fopenmp -std=c99 -c -o outer.o outer.c
gcc44 -o main outer.o inner.o -fopenmp -lgomp
$ ./main
---iteration 0---
2: doing loop index 6.
2: doing loop index 7.
2: doing loop index 8.
0: doing loop index 0.
0: doing loop index 1.
0: doing loop index 2.
1: doing loop index 3.
1: doing loop index 4.
1: doing loop index 5.
3: doing loop index 9.
---iteration 1---
0: doing loop index 0.
0: doing loop index 1.
0: doing loop index 2.
1: doing loop index 3.
1: doing loop index 4.
1: doing loop index 5.
3: doing loop index 9.
2: doing loop index 6.
2: doing loop index 7.
2: doing loop index 8.
---iteration 2---
0: doing loop index 0.
0: doing loop index 1.
0: doing loop index 2.
3: doing loop index 9.
2: doing loop index 6.
2: doing loop index 7.
2: doing loop index 8.
1: doing loop index 3.
1: doing loop index 4.
1: doing loop index 5.
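But as per @jdv-Jan de Vaan, I'd be very surprised if, in an up-to-date OpenMP implementation, this led to a significant performance improvement compared to having the parallel for in update, particularly if update is expensive enough.
BTW, there are issues with just putting a parallel for around the i loop in the Gauss-Seidel routine in update. You can see that the i steps aren't independent, and that will lead to race conditions. You will need to do something like Red-Black or Jacobi iteration instead...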
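To make the red-black idea concrete, here is a minimal sketch under assumptions: `red_black_sweep` is a hypothetical name, and the grid is a flat n x n array with a fixed boundary. Points are coloured by the parity of i + j; a point of one colour only reads neighbours of the other colour, so all points of the same colour can be updated in place, in parallel, without a race.

```c
#include <assert.h>

/* In-place red-black sweep over the interior of an n x n grid stored row by row. */
void red_black_sweep(double *g, int n)
{
    assert(n >= 3);
    for (int color = 0; color < 2; color++) {
        /* The implicit barrier at the end of each parallel for guarantees
           that colour 0 is finished before colour 1 starts. */
        #pragma omp parallel for
        for (int i = 1; i < n - 1; i++) {
            /* first interior j in this row with (i + j) % 2 == color */
            int first = 1 + (i + 1 + color) % 2;
            for (int j = first; j < n - 1; j += 2) {
                g[i * n + j] = 0.25 * (g[(i + 1) * n + j] + g[(i - 1) * n + j]
                                     + g[i * n + j + 1] + g[i * n + j - 1]);
            }
        }
    }
}
```

Unlike the Jacobi approach, this needs only one copy of the grid, at the cost of two worksharing loops per sweep.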
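Update:
The code sample provided is for a GS iteration, not Jacobi, but I'm just assuming that's a typo.
If your problem is actually with the reduction and not the orphaned for loop: yes, you sadly have to roll your own min/max reductions in OpenMP, but it's pretty straightforward, you just use the usual tricks.
Update 2 - yikes, locmax needs to be private, not shared.
outer.c: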
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
int update(int n, int iter);
int main(int argc, char **argv) {
    int n=10;
    int max, locmax;
    max = -999;
    #pragma omp parallel num_threads(4) default(none) shared(n, max) private(locmax)
    for (int iter=0; iter<3; iter++)
    {
        #pragma omp single
        printf("---iteration %d---\n", iter);
        locmax = update(n, iter);
        #pragma omp critical
        {
            if (locmax > max) max=locmax;
        }
        #pragma omp barrier
        #pragma omp flush
        #pragma omp single
        printf("---iteration %d's max value = %d---\n", iter, max);
    }
    return 0;
}
inner.c:
#include <omp.h>
#include <stdio.h>
int update(int n, int iter) {
    int thread = omp_get_thread_num();
    int max = -999;
    #pragma omp for
    for (int i=0;i<n;i++) {
        printf("%3d: doing loop index %d.\n",thread,i);
        if (i+iter>max) max = i+iter;
    }
    return max;
}
And building:
$ make
gcc44 -g -fopenmp -std=c99 -c -o inner.o inner.c
gcc44 -g -fopenmp -std=c99 -c -o outer.o outer.c
gcc44 -o main outer.o inner.o -fopenmp -lgomp
bash-3.2$ ./main
---iteration 0---
0: doing loop index 0.
0: doing loop index 1.
0: doing loop index 2.
2: doing loop index 6.
2: doing loop index 7.
2: doing loop index 8.
1: doing loop index 3.
1: doing loop index 4.
1: doing loop index 5.
3: doing loop index 9.
---iteration 0's max value = 9---
---iteration 1---
0: doing loop index 0.
0: doing loop index 1.
0: doing loop index 2.
3: doing loop index 9.
2: doing loop index 6.
2: doing loop index 7.
2: doing loop index 8.
1: doing loop index 3.
1: doing loop index 4.
1: doing loop index 5.
---iteration 1's max value = 10---
---iteration 2---
0: doing loop index 0.
0: doing loop index 1.
0: doing loop index 2.
1: doing loop index 3.
1: doing loop index 4.
1: doing loop index 5.
3: doing loop index 9.
2: doing loop index 6.
2: doing loop index 7.
2: doing loop index 8.
---iteration 2's max value = 11---
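For comparison with the hand-rolled critical-section merge: OpenMP 3.1 added min and max as reduction operators for C and C++, so with a compiler implementing 3.1 or later (which the gcc 4.4 build above may not accept) the same maximum can be obtained with a reduction clause. The sketch below uses a combined parallel for rather than the orphaned layout, and `update_max` is a hypothetical name; it computes the same quantity as update above, the largest i + iter.

```c
#include <assert.h>

/* Largest value of i + iter for i in [0, n), using the built-in max reduction. */
int update_max(int n, int iter)
{
    assert(n > 0);
    int best = -999;
    /* Each thread gets a private copy of best; the copies are combined
       with the original value when the loop ends. */
    #pragma omp parallel for reduction(max:best)
    for (int i = 0; i < n; i++) {
        if (i + iter > best) best = i + iter;
    }
    return best;
}
```

Using the clause on an orphaned omp for is also possible, but then the reduction variable has to be shared in the enclosing parallel region, so a local variable of the called function would not qualify.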