When is the reduction needed?
I've written this code, which reads a matrix and sums its values. After trying the pragma in different ways, I found that the reduction (+:sum) clause doesn't seem to be necessary, but I don't know why. I may have missed the actual purpose of the reduction in this case.
This would be the alternative: #pragma omp parallel for private(i, j) reduction (+:sum)
And this is the code:
#include <stdio.h>
#include <math.h>
#include <omp.h>
#include <unistd.h>

int main ()
{
    printf("===MATRIX SUM===\n");
    printf("N ROWS: ");
    int i1; scanf("%d",&i1);
    printf("M COLUMNS: ");
    int j1; scanf("%d",&j1);
    int matrixA[i1][j1];
    int i, j;

    /* Read the matrix from standard input. */
    for (i = 0; i < i1; i++){
        for (j = 0; j < j1; j++){
            scanf("%d",&matrixA[i][j]);
        }
    }

    printf("\nMATRIX A: \n");
    for (i = 0; i < i1; i++){
        for (j = 0; j < j1; j++){
            printf("%d ", matrixA[i][j]);
        }
        printf("\n");
    }

    /* Sum all elements; this is the version without the reduction clause. */
    int sum = 0;
    #pragma omp parallel for private(i, j)
    for (i = 0; i < i1; i++)
        for (j = 0; j < j1; j++){
            sum += matrixA[i][j];
        }

    printf("\nTHE RESULT IS: %d", sum);
    return 0;
}
I would also like to ask whether there is a better solution than the reduction pragma, since I read that it is the most efficient way.
The code you posted is not correct without the reduction clause.
sum += matrixA[i][j];
This line causes a classic race condition when executed by multiple threads in parallel. sum is a shared variable, but sum += ... is not an atomic operation. With a small matrix the result may still happen to come out right, which is why the clause can look unnecessary.
(sum is initially 0, all matrix elements 1)
Thread 1 | Thread 2
-----------------------------------------------------------
tmp = sum + matrix[0][0] = 1 |
| tmp = sum + matrix[1][0] = 1
sum = tmp = 1 |
| sum = tmp = 1 (instead of 2)
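The interleaving in the table can be replayed step by step in ordinary sequential C. This is only a sketch of the lost update, not real multithreaded code; the function name `lost_update_demo` and the variables `a`, `b`, `tmp1` and `tmp2` are made up for illustration.

```c
/* Replays the table's interleaving in a single thread.
   Hypothetical helper; a and b stand in for matrix[0][0] and matrix[1][0]. */
int lost_update_demo(void)
{
    int sum = 0;
    int a = 1, b = 1;

    int tmp1 = sum + a;   /* "thread 1" reads sum while it is still 0 */
    int tmp2 = sum + b;   /* "thread 2" also reads sum while it is still 0 */
    sum = tmp1;           /* "thread 1" writes 1 */
    sum = tmp2;           /* "thread 2" overwrites it with 1: one update is lost */

    return sum;
}
```

Both additions ran, yet only one of them is visible in the final value of sum.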
The reduction fixes exactly this. With reduction, the loop works on an implicit thread-local copy of the sum variable. At the end of the region, the thread-local copies are added into the original sum variable in a correct way, without race conditions.
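As a minimal sketch of that behaviour, the two functions below sum a matrix once with the reduction clause and once with the thread-local copy written out by hand. The function names, the flat row-major `m` array and the `long` accumulator are assumptions made for this example, not part of the posted code, and the second function only approximates what an OpenMP implementation does internally.

```c
/* Sums a rows x cols matrix stored row-major in a flat array.
   Hypothetical helper for illustration. */
long sum_matrix_reduction(const int *m, int rows, int cols)
{
    long sum = 0;
    /* Each thread accumulates into its own private copy of sum (starting at 0);
       the copies are added into the original sum when the loop ends. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            sum += m[i * cols + j];
        }
    }
    return sum;
}

/* Roughly the same idea written out by hand (an approximation). */
long sum_matrix_manual(const int *m, int rows, int cols)
{
    long sum = 0;
    #pragma omp parallel
    {
        long local = 0;   /* declared inside the region, so private per thread */
        #pragma omp for
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                local += m[i * cols + j];
            }
        }
        /* One protected update per thread instead of one per element. */
        #pragma omp atomic
        sum += local;
    }
    return sum;
}
```

Because the loop variables are declared inside the loops, they are private automatically and no private(i, j) clause is needed. Compiled without OpenMP support, the pragmas are ignored and both functions simply run serially.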
Another solution would be to mark the sum += ... statement as an atomic operation or a critical section. That, however, has a significant performance penalty.
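For comparison, here is a sketch of the atomic and critical variants. The function names and the flat row-major `m` array are assumptions made for this example. Both produce the correct sum, but every single addition is synchronized, which is where the penalty comes from.

```c
/* Hypothetical helper: every update to the shared sum is made atomic. */
long sum_matrix_atomic(const int *m, int rows, int cols)
{
    long sum = 0;
    #pragma omp parallel for
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            #pragma omp atomic
            sum += m[i * cols + j];
        }
    }
    return sum;
}

/* Hypothetical helper: only one thread at a time may run the update. */
long sum_matrix_critical(const int *m, int rows, int cols)
{
    long sum = 0;
    #pragma omp parallel for
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            #pragma omp critical
            {
                sum += m[i * cols + j];
            }
        }
    }
    return sum;
}
```

With a reduction, threads only synchronize once at the end of the loop, so it is usually the better choice for a plain sum like this one.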