OpenMP - critical section + reduction

Question

I'm currently learning parallel programming using C and OpenMP. I wanted to write simple code where two shared values are incremented by multiple threads. First I used the reduction clause and it worked as intended. Then I switched to using the critical directive to create a critical section, and that also worked. Out of curiosity I tried to merge these two solutions and check the behaviour. I expected two valid, equal values.

code:

#include <stdio.h>
#include <stdlib.h>
#include "omp.h"

#define ITER 50000

int main( void )
{
    int x, y;
    #pragma omp parallel reduction(+:x,y)
    {
       #pragma omp for
       for (int i = 0; i < ITER; i++ )  
       {
            x++;
            #pragma omp critical
            y++;
       }
    }

    printf("non critical = %d\ncritical = %d\n", x, y);
    return 0;
}

output:

non critical = 50000
critical = 4246432

Of course the output is random for 'critical' (variable y); the other behaves as expected and is always 50000.

The behaviour of x is understandable: reduction makes it private within each thread. After the loop, the per-thread values are summed and passed to the non-local x.

What I don't understand is the behaviour of y. It's private just like x, but it's also inside the critical section, so it 'has more than one reason' to be inaccessible from other threads. Yet what I think happens is a race condition. Did the critical somehow make y public (shared)?

I know this code makes no sense, since it's enough to use only one of reduction / critical. I'd just like to know what's behind this behaviour.

Answer 1

The primary problem with your code is that x and y are not initialized. A second problem is that the variable used in the critical section should be shared rather than a reduction variable, although this should only affect performance, not correctness.

I've corrected your code and modified it to demonstrate that reduction, critical and atomic all produce the same result.

Source

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(int argc, char* argv[])
{
    int iter = (argc>1) ? atoi(argv[1]) : 50000;
    int r=0, c=0, a=0;

    printf("OpenMP threads = %d\n", omp_get_max_threads() );

    #pragma omp parallel reduction(+:r) shared(c,a)
    {
        #pragma omp for
        for (int i = 0; i < iter; i++ ) {
            r++;
            #pragma omp critical
            c++;
            #pragma omp atomic
            a++;
        }
    }
    printf("reduce   = %d\n"
           "critical = %d\n"
           "atomic   = %d\n", r, c, a);
    return 0;
}

Compile

icc -O3 -Wall -qopenmp -std=c99 redcrit.c

Output

OpenMP threads = 4
reduce   = 50000
critical = 50000
atomic   = 50000

Answer 2

Your code simply exhibits undefined behaviour, and the presence of critical has nothing to do with you getting wrong results.

Did the critical somehow make y public (shared)?

No, it did not. It only slows down the loop by preventing the threads from executing concurrently.

What you are missing is that the result of the reduction operation is combined with the initial value of the reduction variable, i.e. with the value the variable had before the parallel region. In your case, both x and y are uninitialized and so have effectively random initial values, and therefore you are getting random results. That the initial value of x happens to be 0 in your case, which is why you get the correct result for it, is simply a manifestation of the UB. Initialising both x and y makes your code behave as expected.

The OpenMP specification states:

The reduction clause specifies a reduction-identifier and one or more list items. For each list item, a private copy is created in each implicit task or SIMD lane, and is initialized with the initializer value of the reduction-identifier. After the end of the region, the original list item is updated with the values of the private copies using the combiner associated with the reduction-identifier.

Here are several runs of your original code, first with 1 thread and then with 4 threads:

$ icc -O3 -openmp -std=c99 -o cnc cnc.c
$ OMP_NUM_THREADS=1 ./cnc
non critical = 82765
critical = 50000
$ OMP_NUM_THREADS=4 ./cnc
non critical = 82765
critical = 50000
$ OMP_NUM_THREADS=4 ./cnc
non critical = 50000
critical = 50000
$ OMP_NUM_THREADS=4 ./cnc
non critical = 82765
critical = 50194
$ OMP_NUM_THREADS=4 ./cnc
non critical = 82767
critical = 2112072800

The first run, with one thread, demonstrates that the wrong result is not due to a data race.

With int x=0, y=0; the output is:

$ icc -O3 -openmp -std=c99 -o cnc cnc.c
$ OMP_NUM_THREADS=4 ./cnc
non critical = 50000
critical = 50000
$ OMP_NUM_THREADS=4 ./cnc
non critical = 50000
critical = 50000
$ OMP_NUM_THREADS=4 ./cnc
non critical = 50000
critical = 50000
$ OMP_NUM_THREADS=4 ./cnc
non critical = 50000
critical = 50000


Answer 1 · 6 · 2016-02-03 13:57:58

Answer 2 · 6 · accepted · 2016-02-04 14:26:34