
How to convert MPI_Reduce into MPI_Send and MPI_Recv?

I am working on a parallel program that has to use MPI_Send() and MPI_Recv() instead of MPI_Reduce(). As I understand it, every non-root process needs to call MPI_Send() to send its value to the root process (rank 0), and the root needs to call MPI_Recv() once per process to collect all of those values.

The problem is that the value passed to MPI_Send() never seems to arrive on the receiving side, so the final result is 0. The MPI_Reduce() call is still in the code, commented out, to show what needs to be replaced. Can anyone help?

#include "mpi.h"
#include <stdio.h>
#include <math.h>
 
int main( int argc, char *argv[])
{
    int n, i;
    double PI25DT = 3.141592653589793238462643;
    double pi, h, sum, x;
 
    int numprocs, myid;
    double startTime, endTime;
 
    /* Initialize MPI and get number of processes and my number or rank*/
    MPI_Init(&argc,&argv);
    MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD,&myid);
 
    /* Processor zero sets the number of intervals and starts its clock*/
    if (myid==0) {
       n=600000000;
       startTime=MPI_Wtime();
       for (int i = 0; i < numprocs; i++) {
           if (i != myid) {
               MPI_Send(&n, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
           }
       }
    } 
    else {
        MPI_Recv(&n, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
 
    /* Calculate the width of intervals */
    h   = 1.0 / (double) n;
 
    /* Initialize sum */
    sum = 0.0;
    /* Step over each interval I own */
    for (i = myid+1; i <= n; i += numprocs) {
        /* Calculate midpoint of interval */
        x = h * ((double)i - 0.5);
        /* Add rectangle's area = height*width = f(x)*h */
        sum += (4.0/(1.0+x*x))*h;
    }
    /* Get sum total on processor zero */
    //MPI_Reduce(&sum,&pi,1,MPI_DOUBLE,MPI_SUM,0,MPI_COMM_WORLD);
    double value = 0;
    if (myid != 0) {
            MPI_Send(&sum, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }
    else {
        for (int i = 1; i < numprocs; i++) {
            MPI_Recv(&value, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            pi += value;
            }
    }
    
    /* Print approximate value of pi and runtime*/
    if (myid==0) {
       printf("pi is approximately %.16f, Error is %e\n",
                       pi, fabs(pi - PI25DT));
       endTime=MPI_Wtime();
       printf("runtime is=%.16f",endTime-startTime);
    }
    MPI_Finalize();
    return 0;
}

You are using MPI_INT to send a value of type double:

if (myid != 0) {
  MPI_Send(&sum, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
  //                ^^^^^^^
}

An int is typically 4 bytes long, while a double is 8 bytes long. Although the receive operation succeeds, it cannot construct a value of type MPI_DOUBLE from only 4 bytes of message data, so it doesn't write anything into value, which remains 0.0. Indeed, if you replace:

MPI_Recv(&value, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

with

MPI_Status status;
int count;

MPI_Recv(&value, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, &status);
MPI_Get_count(&status, MPI_DOUBLE, &count);
if (count == MPI_UNDEFINED) {
  printf("Short message received\n");
  MPI_Abort(MPI_COMM_WORLD, 0);
}

your program will abort. The body of the conditional executes because MPI_Get_count() returns MPI_UNDEFINED in count, which signals that the length of the received message is not an integer multiple of the size of MPI_DOUBLE.

Also, pi must be explicitly initialised to sum before the receive loop, otherwise you will get the wrong value of pi due to either of the following errors:

  • pi is left uninitialised and has an arbitrary initial value, and
  • the contribution of rank 0 is not added to the final result.
