
C MPI Matrix multiplication error

I'm doing some matrix multiplication in C with MPI. It works fine until I try to go above 15x15, and I can't figure out why...

From what I've noticed, the error seems to happen mostly after I see a "Process # sending..." print, which occurs when the slave processes are sending their data back to the master process.

Error message:

[LEC-B125N4J:12183] *** Process received signal ***
[LEC-B125N4J:12183] Signal: Segmentation fault (11)
[LEC-B125N4J:12183] Signal code: Address not mapped (1)

Code:

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <time.h>   /* for time(), used to seed the RNG */
#include <mpi.h>

//#define DIM 1000
#define DIM 15

/* 
* Statically allocate the matrices to make the rows 
* sequentially placed in memory. (This eases the task
* of distributing the problem among the slaves.)
* Make the matrices global to allow for larger 
* dimensions. 
*/
int A[DIM][DIM];
int B[DIM][DIM];
int C[DIM][DIM];
int D[DIM][DIM];

int correct_result(int A[DIM][DIM], int B[DIM][DIM])
{
    int i,j;
    for (i=0; i<DIM; ++i)
        for (j=0; j<DIM; ++j)
            if (A[i][j] != B[i][j])
                return 0;
    return 1;
}


int main (int argc, char *argv[])
{
    int rank=0, size;
    int i, j, k;
    double time1;
    volatile int tmp;
    int iOffset = 0;
    int iProblemSize = 0;

    MPI_Init(&argc, &argv);         /* starts MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* get current process id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* get number of processes */

    iProblemSize = (DIM / (size - 1));

    if(rank == 0) { //Master
        printf("Number of processes: %d (1 Master and %d slaves) - DIM: %d\n", size, (size - 1), DIM);

        //Fill matrices A and B with random numbers
        srand(time(NULL));
        for(i=0; i<DIM; ++i)
        {
            for (j=0; j<DIM; ++j)
            {
                A[i][j] = random() % 100 - 50;
                B[i][j] = random() % 100 - 50;
                C[i][j] = 0;
            }
        }
    }

    MPI_Bcast(B, (DIM * DIM), MPI_INT, 0, MPI_COMM_WORLD);

    if(rank == 0) { //Master
        /* Calculate the true answer */
        for (i=0; i<DIM; ++i)
            for (k=0; k<DIM; ++k)
                for (j=0; j<DIM; ++j)
                    D[i][j] += A[i][k] * B[k][j];


        time1 = MPI_Wtime();


        //Send pieces of A to the slaves
        iOffset = 0;
        for(i = 1; i < size; i++) {
            MPI_Send(A[iOffset], (iProblemSize * DIM), MPI_INT, i, 0, MPI_COMM_WORLD);
            iOffset += iProblemSize;

            /*for(j = 0; j < iProblemSize; j++) {
                MPI_Send(A[iOffset + j], DIM, MPI_INT, i, 0, MPI_COMM_WORLD);
            }
            iOffset += iProblemSize;*/
        }

        //Take care of leftover rows if DIM does not divide evenly among the slaves
        if((DIM % (size - 1)) != 0) {
            for(i = iOffset; i < DIM; i++) {
                for(k = 0; k < DIM; k++) {
                    for(j = 0; j < DIM; j++) {
                        C[i][j] += A[i][k] * B[k][j];
                    }
                }
            }
        }

        //Gather the results from the slaves
        iOffset = 0;
        for(i = 1; i < size; i++) {
            MPI_Recv(C[iOffset], (iProblemSize * DIM), MPI_INT, i, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            iOffset += iProblemSize;
            printf("Received from %d!\n", i);
        }
        printf("All received!\n");

        /* Error checking */
        time1 = MPI_Wtime() - time1;
        printf ("Your calculation is %scorrect.\n", correct_result(C,D) ? "" : "not ");
        printf ("Total runtime: %f seconds\n", time1);
    }
    else { //Slaves
        MPI_Recv(A, (iProblemSize * DIM), MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /*for(j = 0; j < iProblemSize; j++) {
            MPI_Recv(A[j], DIM, MPI_INT, 0, 0, MPI_COMM_WORLD, NULL);
        }*/

        //Do the calculations for C
        //printf("Process %d doing calculations...\n", rank);
        for (i = 0; i < (iProblemSize * DIM); ++i) {
            for (k = 0; k < DIM; ++k) {
                for (j = 0; j < DIM; ++j) {
                    C[i][j] += A[i][k] * B[k][j];
                }
                //printf("\n");
            }
        }
        //printf("Process %d finished doing the calculations!\n", rank);

        //Send the result to the master
        printf("Process %d sending...\n", rank);
        MPI_Send(C, (iProblemSize * DIM), MPI_INT, 0, 0, MPI_COMM_WORLD);
        printf("Process %d finished sending!\n", rank);
    }


    MPI_Finalize();

    return 0;
}

OK, I finally fixed the error. The problem was in the loop where the slaves do their calculations...

for (i = 0; i < (iProblemSize * DIM); ++i) {

should be

for (i = 0; i < iProblemSize; ++i) {

:)
