An MPI program crashes after it's done printing output

The following program (from a now-deleted question) appeared to be otherwise correct but was mysteriously crashing after printing the results. I've pasted the program as-is; see if you can spot the bug :)

#include <assert.h>
#include <memory.h>
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char ** argv) {
  MPI_Init(&argc, &argv);
  int n = 16;
  int npow2 = 16 * 16;
  int world_size;
  int b[n][n];
  MPI_Comm_size(MPI_COMM_WORLD, &world_size);
  int m = world_size;
  int reg = npow2 / m;
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  int fstart = rank * reg;
  int fend = fstart + reg;
  int myres[reg];
  int mystart = (int)(fstart / n);
  int myend = (int)(fend / n);
  int dim = myend - mystart;
  int a[dim][n];
  int i, j, k = 0;

  //all processes

  for (i = 0; i < n; i++) {
    for (j = 0; j < n; j++) {
      b[i][j] = (i + 1) * (j + 1);
    }
  }

  //eaach process initialize its contribution 

  int tmpi = 0;
  for (i = mystart; i < myend; i++) {
    for (j = 0; j < n; j++) {
      a[tmpi][j] = (i + 1) * (j + 1);
    }
    tmpi++;
  }

  int inx = 0;
  for (i = 0; i < dim; i++) {
    for (j = 0; j < n; j++) {
      myres[inx] = 0;
      int tmpres = 0;
      for (k = 0; k < n; k++) {
        tmpres = a[i][k] * b[k][j];
        myres[inx] = myres[inx] + tmpres;
      }
      inx++;
      if (inx >= reg) goto xlabel;

    }
  }

xlabel:

  MPI_Barrier(MPI_COMM_WORLD);
  int * recvmatrix = NULL;
  if (rank == 0) {
    recvmatrix = malloc(sizeof(int) * npow2);
  }
  assert(world_size * reg == npow2);
  MPI_Gather(myres, reg, MPI_INT, recvmatrix, reg, MPI_INT, 0, MPI_COMM_WORLD);

  if (rank == 0) {
    for (i = 0; i < npow2; i++) {
      printf("%d,", recvmatrix[i]);
      if (((i + 1) % n) == 0) printf("\n");
    }
  }

  MPI_FINALIZE();
  free(recvmatrix);
}

The following "fix", consisting of an include and two added lines, papered over the issue and allowed the program to finish without crashing the root instance.

#include <unistd.h>

...

  if (rank == 0) {
    for (i = 0; i < npow2; i++) {
      printf("%d,", recvmatrix[i]);
      if (((i + 1) % n) == 0) printf("\n");
    }
    fflush(stdout);      // < added
    _exit(0);            // < added
  }

What's going on?

Hint: The asker did not mention that the compiler was issuing a warning about implicit function declaration.

It turns out that the program had somehow been built with -lmpi and -lmpi_mpifh (or equivalent), i.e. linked against the Fortran MPI library as well. Perhaps the student was instructed to do this, or another student had "helped" them, or they figured out on their own how to "fix" the linker error; I can only speculate. MPI_FINALIZE() triggers a C compiler warning, since it's not part of the C API (the C function is spelled MPI_Finalize), but it happily links against the Fortran routine, in all its uppercase glory. Of course it then crashes: the Fortran routine expects the address of an integer error code as its argument, and the C call passes none. The following command line reproduces this nugget on Debian:

gcc -o mpi mpi.c -lmpi -lmpi_mpifh -I /usr/lib/include/openmpi; mpirun ./mpi
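To make the mismatch concrete, here is a minimal sketch that needs no MPI installation. The names fake_fortran_finalize, fake_c_finalize and call_fortran_style_correctly are hypothetical stand-ins invented for illustration; they are not the real MPI entry points, and the exact crash mechanism inside any particular MPI library is an assumption here.

```c
#include <assert.h>

/* Hypothetical stand-in for a Fortran-style binding. Fortran passes
   arguments by reference, so the routine receives the ADDRESS of an
   integer and stores its status code there. */
void fake_fortran_finalize(int *ierror) {
    *ierror = 0;  /* 0 plays the role of a success code */
}

/* Hypothetical stand-in for a C-style binding: no arguments, the
   status code is the return value. */
int fake_c_finalize(void) {
    return 0;
}

/* The correct way to call the Fortran-style routine from C: hand it a
   valid address to write to. Calling it with NO argument through an
   implicit declaration (as the buggy program effectively does) would
   leave `ierror` holding whatever happened to be in the argument slot,
   so the store above would go to an arbitrary address. */
int call_fortran_style_correctly(void) {
    int ierror = -1;
    fake_fortran_finalize(&ierror);
    assert(ierror == 0);
    return ierror;
}
```

In the original program the real fix is simply to call MPI_Finalize() instead of MPI_FINALIZE() and to stop linking -lmpi_mpifh; treating the implicit-declaration warning as an error would have caught the typo at compile time.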
