
MPI Send/Recv error

I have written a test program in C++ to make sure I understand how MPI Send/Recv works (apparently I don't). This test runs on 4 processors. In the specific case I am interested in, processor 0 sends the array "send_n" to processor 2, which receives it into the array "recv_s". After the initial send, the array is correct (it should be all 5's), but after I do some additional send/recvs, the array somehow changes. What am I doing wrong here?

#include <stdlib.h>
#include <iostream>
#include "mpi.h"

using namespace std;

void passBCs(double recv_n[], double recv_e[], double recv_s[], double recv_w[]);
int getNextProcID(int pID, int direction);

int procID, numProcs;
int gridx = 2, gridy = 2;
int procGridX = 2, procGridY = 2;

int main(){
    int i, j, k;
    int cryptIDs[2] = {0, 3};
    int villusIDs[2] = {1, 2};

    double recv_n[gridx*5], recv_e[gridy*5], recv_s[gridx*5], recv_w[gridy*5];

    MPI::Init();
    procID = MPI::COMM_WORLD.Get_rank();
    numProcs = MPI::COMM_WORLD.Get_size();
    if(procID == 0){ cout << "MPI Initialized\n"; }

    passBCs(recv_n, recv_e, recv_s, recv_w);
    MPI::COMM_WORLD.Barrier();

    if(procID == 2){
        for(i = 0; i < 10; i++){ cout << "Test Buffer=" << recv_s[i] << "\n"; }
    }

    MPI::Finalize();

    if(procID == 0){ cout << "Test Run Exiting Normally\n"; }
}

void passBCs(double recv_n[], double recv_e[], double recv_s[], double recv_w[]){
    int i, j, k, nId, eId, sId, wId, n_rId, e_rId, s_rId, w_rId;
    int ntag, etag, stag, wtag;
    double send_n[gridx*5], send_e[gridy*5], send_s[gridx*5], send_w[gridy*5];

    ntag = 0;
    etag = 1;
    stag = 2;
    wtag = 3;

    if(procID == 0){
        for(i = 0; i < 10; i++){
            send_n[i] = 5;
            send_s[i] = 1;
            send_e[i] = 2;
            send_w[i] = 3;
        }
    }
    else{
        for(i = 0; i < 10; i++){
            send_n[i] = 0;
            send_s[i] = 0;
            send_e[i] = 0;
            send_w[i] = 0;
        }
    }

    nId = getNextProcID(procID, 0);
    eId = getNextProcID(procID, 1);
    sId = getNextProcID(procID, 2);
    wId = getNextProcID(procID, 3);

    n_rId = getNextProcID(procID, 2);
    e_rId = getNextProcID(procID, 3);
    s_rId = getNextProcID(procID, 0);
    w_rId = getNextProcID(procID, 1);

    if(procID == 2){ cout << "South Recv ID=" << n_rId << "\n"; }
    if(procID == 0){ cout << "Proc 0 sending North to " << nId << "\n"; }

    MPI::COMM_WORLD.Send(&send_n[0], 20, MPI::DOUBLE, nId, ntag);
    MPI::COMM_WORLD.Recv(&recv_s[0], 20, MPI::DOUBLE, n_rId, ntag);
    if(procID == 2){
        for(i = 0; i < 10; i++){ cout << "Test Buffer0=" << recv_s[i] << "\n"; }
    }

    MPI::COMM_WORLD.Send(&send_e[0], 20, MPI::DOUBLE, eId, etag);
    MPI::COMM_WORLD.Recv(&recv_w[0], 20, MPI::DOUBLE, e_rId, etag);
    if(procID == 2){
        for(i = 0; i < 10; i++){ cout << "Test Buffer1=" << recv_s[i] << "\n"; }
    }

    MPI::COMM_WORLD.Send(&send_s[0], 20, MPI::DOUBLE, sId, stag);
    MPI::COMM_WORLD.Recv(&recv_n[0], 20, MPI::DOUBLE, s_rId, stag);

    MPI::COMM_WORLD.Send(&send_w[0], 20, MPI::DOUBLE, wId, wtag);
    MPI::COMM_WORLD.Recv(&recv_e[0], 20, MPI::DOUBLE, w_rId, wtag);
}

int getNextProcID(int pID, int direction){
    // Returns the ID of the processor that lies in "direction" from the given proc ID.
    // 0=north, 1=east, 2=south, 3=west
    int x_pos, y_pos, nextID;
    x_pos = pID % procGridX;
    y_pos = pID / procGridY;
    if(direction == 0){ y_pos++; }
    if(direction == 1){ x_pos++; }
    if(direction == 2){ y_pos--; }
    if(direction == 3){ x_pos--; }
    if(x_pos < 0){ x_pos = procGridX - 1; }
    if(x_pos >= procGridX){ x_pos = 0; }
    if(y_pos < 0){ y_pos = procGridY - 1; }
    if(y_pos >= procGridY){ y_pos = 0; }
    nextID = y_pos*procGridY + x_pos;
    return nextID;
}

The output is:

MPI Initialized
South Recv ID=0
Proc 0 sending North to 2
Test Buffer0=5
Test Buffer0=5
Test Buffer0=5
Test Buffer0=5
Test Buffer0=5
Test Buffer0=5
Test Buffer0=5
Test Buffer0=5
Test Buffer0=5
Test Buffer0=5
Test Buffer1=0
Test Buffer1=0
Test Buffer1=0
Test Buffer1=0
Test Buffer1=0
Test Buffer1=0
Test Buffer1=0
Test Buffer1=0
Test Buffer1=5
Test Buffer1=5
Test Buffer=0
Test Buffer=0
Test Buffer=0
Test Buffer=0
Test Buffer=0
Test Buffer=0
Test Buffer=0
Test Buffer=0
Test Buffer=5
Test Buffer=5
Test Run Exiting Normally

I think the error you are observing is caused either by an unfortunate combination of inconsistent edits to your file or by a misinterpretation of the count parameter of MPI::COMM_WORLD.Send. From the Open MPI docs:

void Comm::Send(const void* buf, int count, const Datatype& datatype, int dest, int tag) const

buf: Initial address of send buffer (choice).

count: Number of elements sent (nonnegative integer).

datatype: Datatype of each send buffer element (handle).

dest: Rank of destination (integer).

tag: Message tag (integer).

comm: Communicator (handle).

Note that the count parameter is the number of elements of the type given by the third argument (not some type-independent size measure). Apparently, you either originally had 20 elements in your arrays or you thought the count parameter denotes the size of the send buffer in 4-byte blocks. Either way, your MPI calls refer to 20 elements, for example MPI::COMM_WORLD.Send(&send_n[0],20,MPI::DOUBLE,nId,ntag);, while each array only holds gridx*5 = 10 doubles. Each Send therefore reads 10 doubles past the end of the source array, and each Recv writes the data that was sent from beyond the end of one array to a location beyond the end of the target array! You were just lucky not to see a segfault (and lucky again that the compiler placed your arrays next to each other, so you could actually observe the effect of the out-of-bounds writes: the later Recvs overwrote the start of recv_s). I just compiled your program with all those 20s replaced by 10s and it runs fine (I am not reposting the code because it is such a simple change).

Output after the change:

MPI Initialized
Proc 0 sending North to 2
South Recv ID=0
Test Buffer0=5
Test Buffer0=5
Test Buffer0=5
Test Buffer0=5
Test Buffer0=5
Test Buffer0=5
Test Buffer0=5
Test Buffer0=5
Test Buffer0=5
Test Buffer0=5
Test Buffer1=5
Test Buffer1=5
Test Buffer1=5
Test Buffer1=5
Test Buffer1=5
Test Buffer1=5
Test Buffer1=5
Test Buffer1=5
Test Buffer1=5
Test Buffer1=5
Test Buffer=5
Test Buffer=5
Test Buffer=5
Test Buffer=5
Test Buffer=5
Test Buffer=5
Test Buffer=5
Test Buffer=5
Test Buffer=5
Test Buffer=5
Test Run Exiting Normally

To avoid such mistakes in the future, couple the sizes of your arrays to the count you pass to the Send command, e.g. through a compile-time constant or preprocessor macro. Since you are using C++ anyway, why not use std::vector<double> instead of raw double arrays? Its size can be queried at run time right before you send it, no matter how it was constructed.
