Sequential and parallel versions give different results - Why?

I have a nested loop (L and A are fully defined inputs):

    #pragma omp parallel for schedule(guided) shared(L,A) \
    reduction(+:dummy)
    for (i=k+1;i<row;i++){
            for (n=0;n<k;n++){
                #pragma omp atomic
                dummy += L[i][n]*L[k][n];
                L[i][k] = (A[i][k] - dummy)/L[k][k];
            }
            dummy = 0;
    }

And its sequential version:

    for (i=k+1;i<row;i++){
            for (n=0;n<k;n++){
                dummy += L[i][n]*L[k][n];
                L[i][k] = (A[i][k] - dummy)/L[k][k];
            }
            dummy = 0;
    }

They give different results, and the parallel version is much slower than the sequential version.

What may be causing the problem?

Edit:

To get rid of the problems caused by the atomic directive, I modified the code as follows:

    #pragma omp parallel for schedule(guided) shared(L,A) \
    private(i)
    for (i=k+1;i<row;i++){
        double dummyy = 0;
        for (n=0;n<k;n++){
            dummyy += L[i][n]*L[k][n];
            L[i][k] = (A[i][k] - dummyy)/L[k][k];
        }
    }

But that did not solve the problem either. The results are still different.

I am not very familiar with OpenMP, but it seems to me that your calculations are not order-independent. Namely, the result of the inner loop is written into L[i][k], where i and k are invariant within the inner loop. This means that the same element is overwritten k times during the inner loop, resulting in a race condition.

Moreover, dummy seems to be shared between the different threads, so there might be a race condition there too, unless your pragma parameters somehow prevent it.

Altogether, it looks to me like the calculations in the inner loop must be performed in the same sequential order if you want the same result as the sequential execution gives. Thus only the outer loop can be parallelized.
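To illustrate the point about L[i][k] being overwritten on every inner iteration, here is a minimal sequential sketch for a single row i. The function names and the row-pointer parameters are hypothetical and not from the question. It assumes that only the final write matters, so the assignment can be hoisted out of the inner loop; the `k > 0` guard keeps the behaviour identical to the original when the inner loop does not run at all.

```c
#include <assert.h>

/* Original shape: row_i[k] is rewritten on every inner iteration. */
void row_update_inside(double *row_i, const double *row_k, double a_ik, int k)
{
    assert(k >= 0);
    double dummy = 0.0;
    for (int n = 0; n < k; n++) {
        dummy += row_i[n] * row_k[n];
        row_i[k] = (a_ik - dummy) / row_k[k];
    }
}

/* Hoisted shape: accumulate first, then write row_i[k] once. */
void row_update_hoisted(double *row_i, const double *row_k, double a_ik, int k)
{
    assert(k >= 0);
    double dummy = 0.0;
    for (int n = 0; n < k; n++)
        dummy += row_i[n] * row_k[n];
    if (k > 0)  /* the original never writes when the inner loop is empty */
        row_i[k] = (a_ik - dummy) / row_k[k];
}
```

Both functions accumulate in the same order, so they should leave the same value in row_i[k]; the hoisted form just makes it easier to see that each outer iteration touches exactly one element.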

In your parallel version you've inserted an unnecessary (and possibly harmful) atomic directive. Once you've declared dummy to be a reduction variable, OpenMP takes care of stopping the threads from interfering with each other in the reduction. I think the main impact of the unnecessary directive is to slow your code down, a lot.

I see you have another answer addressing the wrongness of your results. But I notice that you set dummy to 0 at the end of each outer loop iteration, which seems strange if you are trying to use it as some kind of accumulator, which is what the reduction clause suggests. Perhaps you want to reduce into dummy across the inner loop?
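A reduction across the inner loop, as suggested above, might look like the following sketch. The helper name `row_dot` and its parameters are hypothetical. Note that no atomic directive is needed, because the reduction clause gives each thread its own copy of `sum` and combines the copies at the end. Because the partial sums may be combined in a different order than in the sequential loop, the floating-point result can differ in the last bits for general inputs.

```c
#include <assert.h>

/* Hypothetical helper: dot product of the first k entries of two rows.
 * Each thread accumulates into a private copy of sum; OpenMP adds the
 * copies together when the loop finishes. */
double row_dot(const double *row_i, const double *row_k, int k)
{
    assert(k >= 0);
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int n = 0; n < k; n++)
        sum += row_i[n] * row_k[n];
    return sum;
}
```

For small k, the overhead of starting a parallel region for every row would likely outweigh any gain, so parallelizing the outer loop is usually the more natural choice here.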

If you are having problems with reduction, read this.

The difference in results comes from the inner loop variable n, which is shared between threads because it is declared outside the omp pragma.

Clarified: the loop variable n should be declared inside the parallel region, since it should be thread-specific, for example for (int n = 0;.....)
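Putting the advice from the answers together, a parallel version could be sketched as follows. This is not the asker's final code: the fixed size `ROWS`, the function names, and the accumulator name `sum` are assumptions made for illustration. The loop variables and the accumulator are declared inside the parallel loop, so each thread gets its own copies, and L[i][k] is written once per row after the inner loop. One difference from the question's code: with the write hoisted, the element is also assigned when k is 0, which the original would skip.

```c
#include <assert.h>

#define ROWS 6  /* assumed fixed size, for illustration only */

/* Sequential reference: update column k of L below the diagonal. */
void update_column_seq(double L[ROWS][ROWS], double A[ROWS][ROWS], int k)
{
    assert(k >= 0 && k < ROWS);
    for (int i = k + 1; i < ROWS; i++) {
        double sum = 0.0;
        for (int n = 0; n < k; n++)
            sum += L[i][n] * L[k][n];
        L[i][k] = (A[i][k] - sum) / L[k][k];
    }
}

/* Parallel version: rows are independent, so only the outer loop is split. */
void update_column_par(double L[ROWS][ROWS], double A[ROWS][ROWS], int k)
{
    assert(k >= 0 && k < ROWS);
    #pragma omp parallel for schedule(guided) shared(L, A)
    for (int i = k + 1; i < ROWS; i++) {
        double sum = 0.0;             /* private: declared inside the loop body */
        for (int n = 0; n < k; n++)   /* n is private for the same reason */
            sum += L[i][n] * L[k][n];
        L[i][k] = (A[i][k] - sum) / L[k][k];  /* one write per row */
    }
}
```

Each row is summed by a single thread in the same order as the sequential loop, so the two functions should produce identical values. Compiled without OpenMP support, the pragma is ignored and the second function simply runs sequentially.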
