
OpenMP collapsed parallel loop with reduction

I'm trying to parallelize these collapsed loops with OpenMP, but this is what I got: "smooth.c:47:6: error: not enough perfectly nested loops before 'sum' sum = 0;"

Does anybody know a good way to parallelize this? I've been stuck on this problem for 2 days.

Here are my loops:

long long int sum;
#pragma omp parallel for collapse(3) default(none) shared(DY, DX) private(dx, dy) reduction(+:sum)
for (y = 0; y < height; y++) {
    for (x = 0; x < width; x++) {
        sum = 0;
        for (d = 0; d < 9; d++) {
            dx = x + DX[d];
            dy = y + DY[d];
            if (dx >= 0 && dx < width && dy >= 0 && dy < height)
                sum += image(dy, dx);
        }
        smooth(y, x) = sum / 9;
    }
}

Full code: https://github.com/fernandesbreno/smooth_

You cannot collapse three loop levels because the third level is not perfectly nested inside the second. There is

sum = 0;

before it and

smooth(y, x) = sum / 9;

after it in the middle loop. (I suppose smooth() is a macro, else the assignment doesn't make sense. Don't do that, though, because it's confusing.)

Consider how you would rewrite that loop nest into an equivalent single loop by hand, using your knowledge of the problem structure and details. I submit that it would be challenging to do so, and that the result would furthermore have unavoidable data dependencies. But if you managed to do it without introducing dependencies, then voila! You have a single flat loop to parallelize, no collapsing needed.
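To make the dependency concrete, here is a minimal serial sketch of one straightforward way to flatten the nest by hand. It is an illustration under assumptions, not the code from the question: smooth_flat is a hypothetical helper, image and smooth are assumed to be row-major int buffers of width * height elements rather than macros, and the DX/DY offsets are assumed to describe a 3x3 neighbourhood.

```c
#include <assert.h>

/* Assumed 3x3 neighbourhood offsets; the question's DX/DY may differ. */
static const int DX[9] = {-1, 0, 1, -1, 0, 1, -1, 0, 1};
static const int DY[9] = {-1, -1, -1, 0, 0, 0, 1, 1, 1};

/* Hypothetical helper: the y, x and d loops flattened into one loop over k. */
void smooth_flat(const int *image, int *smooth, int width, int height)
{
    assert(width > 0 && height > 0);

    long total = (long)width * height * 9;
    long long sum = 0;

    for (long k = 0; k < total; k++) {
        /* Recover the three original indices from the flat index. */
        int d = (int)(k % 9);
        int x = (int)((k / 9) % width);
        int y = (int)(k / (9L * width));

        if (d == 0)
            sum = 0;                 /* was "sum = 0;" before the d loop */

        int dx = x + DX[d];
        int dy = y + DY[d];
        if (dx >= 0 && dx < width && dy >= 0 && dy < height)
            sum += image[(long)dy * width + dx];

        if (d == 8)                  /* was the assignment after the d loop */
            smooth[(long)y * width + x] = (int)(sum / 9);
    }
}
```

In this form iteration k reads the value of sum left behind by iteration k - 1, which is exactly the kind of loop-carried dependency that prevents handing the flat loop to a plain parallel for.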

Your simplest way forward, however, would probably be to collapse only two levels instead of three. Moreover, you should compare against not collapsing at all: it is far from clear that collapsing will yield an improvement over parallelizing only the outer loop, and collapsing might even be worse.
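A minimal sketch of the two-level variant might look like the following. The same assumptions apply as elsewhere in this answer: smooth_collapse2 is a hypothetical helper, image and smooth are assumed to be row-major int buffers rather than macros, and DX/DY are assumed to be the 3x3 neighbourhood offsets. Note that sum is declared inside the loop body here, so each iteration gets its own copy and no reduction clause is involved; the default(none) clause from the question is also left out to keep the sketch short.

```c
#include <assert.h>

/* Assumed 3x3 neighbourhood offsets; the question's DX/DY may differ. */
static const int DX[9] = {-1, 0, 1, -1, 0, 1, -1, 0, 1};
static const int DY[9] = {-1, -1, -1, 0, 0, 0, 1, 1, 1};

/* Hypothetical helper: only the y and x loops are collapsed. */
void smooth_collapse2(const int *image, int *smooth, int width, int height)
{
    assert(width > 0 && height > 0);

    /* The y and x loops are perfectly nested, so collapse(2) is accepted. */
    #pragma omp parallel for collapse(2)
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            long long sum = 0;       /* local to this (y, x) iteration */

            for (int d = 0; d < 9; d++) {
                int dx = x + DX[d];
                int dy = y + DY[d];
                if (dx >= 0 && dx < width && dy >= 0 && dy < height)
                    sum += image[(long)dy * width + dx];
            }

            smooth[(long)y * width + x] = (int)(sum / 9);
        }
    }
}
```

Dropping collapse(2) from the pragma gives the outer-loop-only version to benchmark against.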

But if you must have OpenMP collapse all three levels of the nest, then you need to take the two lines I called out above and lift them out of the loop nest. Possibly you could do that in part by getting rid of sum altogether and working directly with the result raster. Again, this is not necessarily going to produce an improvement.
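One way this could be sketched, again under assumptions (a hypothetical smooth_collapse3 helper, row-major int buffers instead of macros, assumed 3x3 DX/DY offsets), is to zero the result raster first, accumulate into it inside the collapsed nest, and divide in a separate pass. Because different threads may now handle different d values for the same pixel, the accumulation is protected with an atomic update, which is one plausible choice and carries its own cost.

```c
#include <assert.h>

/* Assumed 3x3 neighbourhood offsets; the question's DX/DY may differ. */
static const int DX[9] = {-1, 0, 1, -1, 0, 1, -1, 0, 1};
static const int DY[9] = {-1, -1, -1, 0, 0, 0, 1, 1, 1};

/* Hypothetical helper: all three loops collapsed, no sum variable. */
void smooth_collapse3(const int *image, int *smooth, int width, int height)
{
    assert(width > 0 && height > 0);

    long n = (long)width * height;

    /* Replaces "sum = 0;": clear the result raster up front. */
    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        smooth[i] = 0;

    /* Nothing sits between the loop headers, so collapse(3) is accepted. */
    #pragma omp parallel for collapse(3)
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            for (int d = 0; d < 9; d++) {
                int dx = x + DX[d];
                int dy = y + DY[d];
                if (dx >= 0 && dx < width && dy >= 0 && dy < height) {
                    /* Several threads may update the same pixel. */
                    #pragma omp atomic
                    smooth[(long)y * width + x] += image[(long)dy * width + dx];
                }
            }
        }
    }

    /* Replaces "smooth(y, x) = sum / 9;": divide in a separate pass. */
    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        smooth[i] /= 9;
}
```

This version makes three passes over the raster and pays for an atomic on every neighbour read, so it should be measured against the simpler variants before being adopted.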
