MPI: process 0 executes its code twice

Question

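I have run into a strange problem in my MPI program. Part of the code should be executed only by the root (process zero), but process zero seems to execute it twice. For example,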

root = 0;
if (rank == root) {
    cout << "Hello from process " << rank << endl;
}

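gives

Hello from process 0

Hello from process 0

This seems to happen only when I use 16 or more processes. I have been trying to debug this for days, without success.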

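Since I do not know why this happens, I think I have to post the entire code here. I have tried to keep it clear. The goal is to multiply two matrices (under simplifying assumptions). The problem occurs in the final if block.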

#include <iostream>
#include <cstdlib>
#include <cmath>
#include "mpi.h"

using namespace std;

int main(int argc, char *argv[]) {
    if (argc != 2) {
        cout << "Use one argument to specify the N of the matrices." << endl;
        return -1;
    }

    int N = atoi(argv[1]);
    int A[N][N], B[N][N], res[N][N];

    int i, j, k, start, end, P, p, rank;

    int root=0;
    MPI::Status status;

    MPI::Init(argc, argv);

    rank = MPI::COMM_WORLD.Get_rank();
    P = MPI::COMM_WORLD.Get_size();
    p = sqrt(P);

    /* Designate the start and end position for each process. */
    start = rank * N/p;
    end = (rank+1) * N/p;

    if (rank == root) { // No problem here
        /* Initialize matrices. */
        for (i=0; i<N; i++)
            for (j=0; j<N; j++) {
                A[i][j] = N*i + j;
                B[i][j] = N*i + j;
            }

        cout << endl << "Matrix A: " << endl;
        for(i=0; i<N; ++i)
            for(j=0; j<N; ++j) {
                cout << "  " << A[i][j];
                if(j==N-1)
                    cout << endl;
            }

        cout << endl << "Matrix B: " << endl;
        for(i=0; i<N; ++i)
            for(j=0; j<N; ++j) {
                cout << "  " << B[i][j];
                if(j==N-1)
                    cout << endl;
            }
    }

    /* Broadcast B to all processes. */
    MPI::COMM_WORLD.Bcast(B, N*N, MPI::INT, 0);

    /* Scatter A to all processes. */
    MPI::COMM_WORLD.Scatter(A, N*N/p, MPI::INT, A[start], N*N/p, MPI::INT, 0);
    /* Compute your portion of the final result. */    
    for(i=start; i<end; i++)
        for(j=0; j<N; j++) {
            res[i][j] = 0;
            for(k=0; k<N; k++)
                res[i][j] += A[i][k]*B[k][j];
        }

    MPI::COMM_WORLD.Barrier();
    /* Gather results form all processes. */    
    MPI::COMM_WORLD.Gather(res[start], N*N/p, MPI::INT, res, N*N/p, MPI::INT, 0);


    if (rank == root) { // HERE is the problem!
        // This chunk executes twice in process 0
        cout << endl << "Result of A x B: " << endl;
        for(i=0; i<N; ++i)
            for(j=0; j<N; ++j) {
                cout << "  " << res[i][j];
                if(j == N-1)
                    cout << endl;
            }
    }

    MPI::Finalize();
    return 0;
}

When I run the program with P = 16 and two 4x4 matrices:

>$ mpirun -np 16 ./myprog 4

Matrix A: 
  0  1  2  3
  4  5  6  7
  8  9  10  11
  12  13  14  15

Matrix B: 
  0  1  2  3
  4  5  6  7
  8  9  10  11
  12  13  14  15

Result of A x B: 
  6366632  0  0  0
  -12032  32767  0  0
  0  0  -1431597088  10922
  1  10922  0  0

Result of A x B: 
  56  62  68  74
  152  174  196  218
  248  286  324  362
  344  398  452  506

Why is the first result printed at all? I would be grateful if someone could help me.

Answer 1

You have undefined behavior: you are corrupting memory. Take your example with N=4, P=16, p=4. In that case start=rank.

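What happens during the Scatter? The send count is N*N/p = 4, so you send 4 elements to each of the 16 processes. MPI will assume that A on the root contains 64 elements, but it contains only 16. Furthermore, every rank stores what it receives at A[start]. I am not even sure that is well defined, but it should be equivalent to &A[start][0], which lies outside the memory allocated for A when rank >= 4. So you are already reading and writing invalid memory. The wildly invalid memory accesses continue in the loop and in the Gather.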
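The arithmetic above can be checked without running MPI at all. The following is a minimal sketch; the names ScatterFootprint, scatter_footprint and recv_offset are hypothetical helpers invented for illustration, and only the formulas (N*N/p, rank*N/p) are taken from the code in the question.

```cpp
// Hypothetical helpers: they only reproduce the index arithmetic of the
// question's Scatter call, they do not call MPI.
struct ScatterFootprint {
    long sendcount;   // elements sent to each rank: N*N/p
    long root_needs;  // elements Scatter reads from the root buffer: sendcount * P
    long root_has;    // elements actually stored in A: N*N
};

ScatterFootprint scatter_footprint(long N, long P, long p) {
    long sendcount = N * N / p;
    return {sendcount, sendcount * P, N * N};
}

// Flat (row-major) index of the first element a rank writes when it
// receives into A[start], with start = rank * N / p as in the question.
long recv_offset(long N, long p, long rank) {
    long start = rank * N / p;
    return start * N;
}
```

For N=4, P=16, p=4, scatter_footprint reports that the root buffer would need 64 elements while A holds 16, and recv_offset(4, 4, 4) is 16, which is already one past the last valid index (15) of A.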

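Unfortunately, MPI programs can be hard to debug, especially when memory corruption is involved. There is very valuable information on this for Open MPI. Read the entire page! Running mpirun -np 16 valgrind ... would have told you about this problem.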

Some other issues worth noting:

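The C++ bindings of MPI have been deprecated for years. You should use the C bindings from C++, or a high-level binding such as Boost.MPI.
Variable-length arrays are not standard C++.
You do not need a Barrier before the Gather.
Make sure your code is not full of unchecked assumptions. Assert that P is a perfect square (if you need that) and that N is divisible by p (if you need that).
Never name two variables P and p.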
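As an illustration of the point about unchecked assumptions, here is one way the two checks could be written. This is only a sketch: exact_sqrt and layout_is_valid are hypothetical names, and whether your decomposition really needs these exact conditions is for you to decide.

```cpp
#include <cmath>

// Returns the integer p with p*p == P, or -1 if P is not a perfect square.
// (Hypothetical helper; avoids silently truncating sqrt(P) to an int.)
int exact_sqrt(int P) {
    if (P < 0) return -1;
    int p = static_cast<int>(std::lround(std::sqrt(static_cast<double>(P))));
    return (p * p == P) ? p : -1;
}

// True if P is a positive perfect square and N is divisible by p = sqrt(P).
// These are the two assumptions named in the list above.
bool layout_is_valid(int N, int P) {
    int p = exact_sqrt(P);
    return p > 0 && N > 0 && N % p == 0;
}
```

A program could call layout_is_valid(N, P) right after reading the communicator size and abort with a clear message when it returns false, instead of continuing with a start and end that do not match the data.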

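Beyond using debugging tools, I am struggling to decide what to recommend. If you need fast parallel matrix multiplication, use a library. If you want to write nice high-level code as an exercise, use boost::mpi and some high-level matrix abstraction. If you want to practice writing low-level code, use a std::vector<>(N*N), build your own 2D indexing on top of it, and think carefully about how to index it and how to access the right chunks of memory.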
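The low-level option can be sketched as follows. This is not MPI code: it is a serial stand-in in which each "worker" computes one contiguous block of rows, so the indexing can be checked on its own. The Matrix type and all function names are hypothetical, and the sketch assumes the number of workers divides N.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal matrix: n*n ints in one contiguous row-major block.
struct Matrix {
    std::size_t n;
    std::vector<int> data;
    explicit Matrix(std::size_t n_) : n(n_), data(n_ * n_, 0) {}
    int& at(std::size_t i, std::size_t j) { return data[i * n + j]; }
    int at(std::size_t i, std::size_t j) const { return data[i * n + j]; }
};

// Same initialization as in the question: element (i, j) is N*i + j.
Matrix make_input(std::size_t n) {
    Matrix m(n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            m.at(i, j) = static_cast<int>(n * i + j);
    return m;
}

// Compute rows [row_begin, row_end) of a*b into res.
void multiply_rows(const Matrix& a, const Matrix& b, Matrix& res,
                   std::size_t row_begin, std::size_t row_end) {
    for (std::size_t i = row_begin; i < row_end; ++i)
        for (std::size_t j = 0; j < a.n; ++j) {
            int sum = 0;
            for (std::size_t k = 0; k < a.n; ++k)
                sum += a.at(i, k) * b.at(k, j);
            res.at(i, j) = sum;
        }
}

// Serial stand-in for the parallel run: worker w owns rows
// [w*rows, (w+1)*rows). Assumes workers divides a.n evenly.
Matrix multiply_in_blocks(const Matrix& a, const Matrix& b, std::size_t workers) {
    Matrix res(a.n);
    std::size_t rows = a.n / workers;
    for (std::size_t w = 0; w < workers; ++w)
        multiply_rows(a, b, res, w * rows, (w + 1) * rows);
    return res;
}
```

With this layout, worker w's block starts at data.data() + w*rows*n and holds rows*n contiguous ints. That is the kind of buffer a call such as MPI_Scatter or MPI_Gather from the C bindings expects, provided the per-rank count times the number of ranks equals n*n.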
