
MPI_Reduce C/C++ - Signal: Segmentation Fault (11)

I don't understand well how MPI_Reduce works with arrays. I need to do an element-wise sum.

To test the MPI_Reduce function I wrote this simple code and it works:

double a[4] = {0,1,2,(double)process_id};
double b[4];
MPI_Reduce(&a, &b, 4, MPI_DOUBLE, MPI_SUM, p-1, MPI_COMM_WORLD);
if(id == p-1) {
    for(int i = 0; i < 4; i++){
        printf("%f, ", b[i]);
    }
}

it prints this:

0.00000, 4.00000, 8.00000, 6.00000 

when I run this code with 4 processes. It works!

Now I implement my problem. Assuming I use p processes, I need to reduce p matrices of dimensions m * n, so I rewrite each matrix in the form of an array:

double *a;
double **A;

A = new double*[n];
//code that compute matrix A
a = (double *) malloc(m * n * sizeof(double));
int k = 0;
for(int i = 0; i < m; i++) {
    for(int j = 0; j < n; j++){
        a[k] = A[i][j];
        k++;
    }
}

In this way I have the matrices that I need to reduce in the form of arrays. Now I execute this reduction:

if(id == p-1){
    reduce_storage = (double *) malloc(m * n * sizeof(double));
}

MPI_Reduce(&a, &reduce_storage, m * n, MPI_DOUBLE, MPI_SUM, p-1, MPI_COMM_WORLD);

Array a and reduce_storage are allocated in the same way, so they have the same dimension m * n, which is the value of the count argument of MPI_Reduce. I don't understand why, when I try to run it, it returns this error:

*** stack smashing detected ***: <unknown> terminated
[EdoardoPC:01104] *** Process received signal ***
[EdoardoPC:01104] Signal: Aborted (6)
[EdoardoPC:01104] Signal code:  (-6)
[EdoardoPC:01104] *** Process received signal ***
[EdoardoPC:01104] Signal: Segmentation fault (11)
[EdoardoPC:01104] Signal code:  (128)
[EdoardoPC:01104] Failing at address: (nil)

I don't understand well how MPI_Reduce works with arrays. I need to do an element-wise sum.

From a source about MPI_Reduce one can read:

Reduces values on all processes to a single value

int MPI_Reduce(const void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm)

In your case MPI_Reduce will work as shown in the following image:

[image: illustration of MPI_Reduce combining values from all processes at the root]

(image taken from https://mpitutorial.com/tutorials/mpi-reduce-and-allreduce/)

From the same source one can read:

MPI_Reduce takes an array of input elements on each process and returns an array of output elements to the root process. The output elements contain the reduced result.
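
To make that concrete, here is a minimal, self-contained sketch (illustrative code, not the poster's program; the buffer names and the choice of the last rank as root are my own) that sums a 4-element array element-wise across all ranks:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double send[4] = {0, 1, 2, (double)rank}; // per-rank input
    double recv[4] = {0};                     // significant only at the root

    int root = size - 1;
    // recv[i] on the root becomes the sum of send[i] over all ranks.
    MPI_Reduce(send, recv, 4, MPI_DOUBLE, MPI_SUM, root, MPI_COMM_WORLD);

    if (rank == root) {
        for (int i = 0; i < 4; i++)
            printf("%f, ", recv[i]);
        printf("\n");
    }

    MPI_Finalize();
    return 0;
}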

Now let us look at your problem.

To test the MPI_Reduce function I wrote this simple code and it works:

double a[4] = {0,1,2,(double)process_id};
double b[4];
MPI_Reduce(&a, &b, 4, MPI_DOUBLE, MPI_SUM, p-1, MPI_COMM_WORLD);

All the parameters are correct; &a and &b match const void *sendbuf and void *recvbuf, respectively. The same applies to the remaining parameters, namely int, MPI_Datatype, MPI_Op, int, and MPI_Comm.

In this context, passing a and b or &a and &b, respectively, is the "same", in the sense that a and &a yield the same memory address. Notwithstanding, there are important differences between using a and &a; for an in-depth explanation read the following: difference between "array" and "&array".
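
A tiny sketch of that point (illustrative only): both lines below print the same address, even though the two expressions have different pointer types.

#include <stdio.h>

int main(void) {
    double a[4] = {0, 1, 2, 3};

    // Same address, different types:
    //   a  decays to double *       (pointer to the first element)
    //   &a has type  double (*)[4]  (pointer to the whole array)
    printf("a  = %p\n", (void *)a);
    printf("&a = %p\n", (void *)&a);
    return 0;
}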

Array a and reduce_storage are allocated in the same way, so they have the same dimension m * n, which is the value of the count argument of MPI_Reduce. I don't understand why, when I try to run it, it returns this error:

In the second call

MPI_Reduce(&a, &reduce_storage, m * n, MPI_DOUBLE, MPI_SUM, p-1, MPI_COMM_WORLD);

The arguments a and reduce_storage are now both of type double*, and you are passing &a and &reduce_storage as arguments to MPI_Reduce. This is wrong because &a and &reduce_storage return the addresses of the variables a and reduce_storage, respectively, which are pointers to a pointer-to-double.
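
In other words, since a and reduce_storage already hold the addresses of the data, the call should pass them directly; a corrected sketch of that line (keeping the poster's variable names):

// Pass the double* pointers themselves, not their addresses.
// recvbuf is significant only at the root (rank p-1); the other ranks may pass e.g. NULL.
MPI_Reduce(a, reduce_storage, m * n, MPI_DOUBLE, MPI_SUM, p - 1, MPI_COMM_WORLD);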

Assuming I use p processes, I need to reduce p

Side note: using 'p' as the total number of processes is a little confusing; a better name IMO would be total_processes, number_of_processes, or something along those lines.
