MPI Send/Recv error

I have written a test program in C++ to make sure I understand how MPI send/recv works (apparently I don't). This test runs on 4 processors. The specific case I am interested in: processor 0 sends the array "send_n" to processor 2, which receives it in the array "recv_s". After the initial send, the array is correct (it should be all 5's), but after I do some additional send/recv's, the array somehow changes. What am I doing wrong here?

#include <stdlib.h>
#include <iostream>
#include "mpi.h"

using namespace std;

void passBCs(double recv_n[], double recv_e[], double recv_s[], double recv_w[]);
int getNextProcID(int pID, int direction);

int procID, numProcs;
int gridx = 2, gridy = 2;
int procGridX = 2, procGridY = 2;

int main(){
    int i, j, k;
    int cryptIDs[2]  = {0, 3};
    int villusIDs[2] = {1, 2};

    double recv_n[gridx*5], recv_e[gridy*5], recv_s[gridx*5], recv_w[gridy*5];

    MPI::Init();
    procID   = MPI::COMM_WORLD.Get_rank();
    numProcs = MPI::COMM_WORLD.Get_size();
    if(procID==0){ cout << "MPI Initialized\n"; }

    passBCs(recv_n, recv_e, recv_s, recv_w);
    MPI::COMM_WORLD.Barrier();

    if(procID==2){
        for(i=0; i<10; i++){ cout << "Test Buffer=" << recv_s[i] << "\n"; }
    }

    MPI::Finalize();

    if(procID==0){ cout << "Test Run Exiting Normally\n"; }
}

void passBCs(double recv_n[], double recv_e[], double recv_s[], double recv_w[]){
    int i, j, k, nId, eId, sId, wId, n_rId, e_rId, s_rId, w_rId;
    int ntag, etag, stag, wtag;
    double send_n[gridx*5], send_e[gridy*5], send_s[gridx*5], send_w[gridy*5];

    ntag = 0;
    etag = 1;
    stag = 2;
    wtag = 3;

    if(procID==0){
        for(i=0; i<10; i++){
            send_n[i] = 5;
            send_s[i] = 1;
            send_e[i] = 2;
            send_w[i] = 3;
        }
    }
    else{
        for(i=0; i<10; i++){
            send_n[i] = 0;
            send_s[i] = 0;
            send_e[i] = 0;
            send_w[i] = 0;
        }
    }

    nId = getNextProcID(procID, 0);
    eId = getNextProcID(procID, 1);
    sId = getNextProcID(procID, 2);
    wId = getNextProcID(procID, 3);

    n_rId = getNextProcID(procID, 2);
    e_rId = getNextProcID(procID, 3);
    s_rId = getNextProcID(procID, 0);
    w_rId = getNextProcID(procID, 1);

    if(procID==2){ cout << "South Recv ID=" << n_rId << "\n"; }
    if(procID==0){ cout << "Proc 0 sending North to " << nId << "\n"; }

    MPI::COMM_WORLD.Send(&send_n[0], 20, MPI::DOUBLE, nId, ntag);
    MPI::COMM_WORLD.Recv(&recv_s[0], 20, MPI::DOUBLE, n_rId, ntag);
    if(procID==2){
        for(i=0; i<10; i++){ cout << "Test Buffer0=" << recv_s[i] << "\n"; }
    }

    MPI::COMM_WORLD.Send(&send_e[0], 20, MPI::DOUBLE, eId, etag);
    MPI::COMM_WORLD.Recv(&recv_w[0], 20, MPI::DOUBLE, e_rId, etag);
    if(procID==2){
        for(i=0; i<10; i++){ cout << "Test Buffer1=" << recv_s[i] << "\n"; }
    }

    MPI::COMM_WORLD.Send(&send_s[0], 20, MPI::DOUBLE, sId, stag);
    MPI::COMM_WORLD.Recv(&recv_n[0], 20, MPI::DOUBLE, s_rId, stag);

    MPI::COMM_WORLD.Send(&send_w[0], 20, MPI::DOUBLE, wId, wtag);
    MPI::COMM_WORLD.Recv(&recv_e[0], 20, MPI::DOUBLE, w_rId, wtag);
}

int getNextProcID(int pID, int direction){
    // Returns the ID number of the processor that is "direction" from the given proc id.
    // 0=north, 1=east, 2=south, 3=west
    int x_pos, y_pos, nextID;
    x_pos = pID % procGridX;
    y_pos = pID / procGridY;
    if(direction==0){ y_pos++; }
    if(direction==1){ x_pos++; }
    if(direction==2){ y_pos--; }
    if(direction==3){ x_pos--; }
    if(x_pos < 0){ x_pos = procGridX - 1; }
    if(x_pos >= procGridX){ x_pos = 0; }
    if(y_pos < 0){ y_pos = procGridY - 1; }
    if(y_pos >= procGridY){ y_pos = 0; }
    nextID = y_pos * procGridY + x_pos;
    return nextID;
}

The output is:

MPI Initialized
South Recv ID=0
Proc 0 sending North to 2
Test Buffer0=5
Test Buffer0=5
Test Buffer0=5
Test Buffer0=5
Test Buffer0=5
Test Buffer0=5
Test Buffer0=5
Test Buffer0=5
Test Buffer0=5
Test Buffer0=5
Test Buffer1=0
Test Buffer1=0
Test Buffer1=0
Test Buffer1=0
Test Buffer1=0
Test Buffer1=0
Test Buffer1=0
Test Buffer1=0
Test Buffer1=5
Test Buffer1=5
Test Buffer=0
Test Buffer=0
Test Buffer=0
Test Buffer=0
Test Buffer=0
Test Buffer=0
Test Buffer=0
Test Buffer=0
Test Buffer=5
Test Buffer=5
Test Run Exiting Normally

I think the error you are observing is caused either by an unfortunate combination of inconsistent edits you made to your file or by a misinterpretation of the count parameter of MPI::COMM_WORLD.Send. From the OpenMPI docs:

void Comm::Send(const void* buf, int count, const Datatype& datatype, int dest, int tag) const

buf: Initial address of send buffer (choice).
count: Number of elements to send (nonnegative integer).
datatype: Datatype of each send buffer element (handle).
dest: Rank of destination (integer).
tag: Message tag (integer).
comm: Communicator (handle).

Note that the count parameter is the number of elements, as defined by the third argument (not some type-independent size measure). Apparently, you originally either had 20 elements in your arrays or you thought that the count parameter denotes the size of the send buffer in 4-byte blocks. Either way, your MPI calls refer to 20 elements, for example MPI::COMM_WORLD.Send(&send_n[0],20,MPI::DOUBLE,nId,ntag);. Therefore, the Send commands read past the end of the source arrays, and the Recv commands write that data to locations beyond the end of the target arrays! You were just lucky not to see a segfault (and lucky again that the compiler put your arrays next to each other, so you actually saw the effect of writing outside the array). I just compiled your program, replacing all those 20s with 10s, and it runs fine (I am not reposting the code because it is such a simple change).
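Just to make the element-count semantics concrete, here is a minimal toy program (separate from your code; run it with at least two processes, using the same MPI C++ bindings as your program):

#include <iostream>
#include "mpi.h"

// Toy sketch: rank 0 sends 10 doubles to rank 1. The count argument is 10 --
// the number of MPI::DOUBLE elements in the buffer -- not a byte count and
// not "array size in 4-byte blocks".
int main(){
    MPI::Init();
    int rank = MPI::COMM_WORLD.Get_rank();

    double buf[10];
    if(rank == 0){
        for(int i = 0; i < 10; i++){ buf[i] = 5.0; }
        // 10 elements of MPI::DOUBLE (80 bytes on typical platforms)
        MPI::COMM_WORLD.Send(&buf[0], 10, MPI::DOUBLE, 1, 0);
    }
    else if(rank == 1){
        MPI::COMM_WORLD.Recv(&buf[0], 10, MPI::DOUBLE, 0, 0);
        for(int i = 0; i < 10; i++){ std::cout << buf[i] << "\n"; }
    }

    MPI::Finalize();
}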

Output after the change:

MPI Initialized
Proc 0 sending North to 2
South Recv ID=0
Test Buffer0=5
Test Buffer0=5
Test Buffer0=5
Test Buffer0=5
Test Buffer0=5
Test Buffer0=5
Test Buffer0=5
Test Buffer0=5
Test Buffer0=5
Test Buffer0=5
Test Buffer1=5
Test Buffer1=5
Test Buffer1=5
Test Buffer1=5
Test Buffer1=5
Test Buffer1=5
Test Buffer1=5
Test Buffer1=5
Test Buffer1=5
Test Buffer1=5
Test Buffer=5
Test Buffer=5
Test Buffer=5
Test Buffer=5
Test Buffer=5
Test Buffer=5
Test Buffer=5
Test Buffer=5
Test Buffer=5
Test Buffer=5
Test Run Exiting Normally

To avoid such mistakes in the future, couple the sizes of your arrays to the count you pass to the Send command, e.g., through a compile-time constant or a preprocessor macro. Since you are using C++ anyway, why not use std::vector<double> instead of raw double arrays? Its size can be queried at run time right before you send it, no matter how it was constructed.
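As a rough sketch of what I mean (HALO_SIZE and the buffer names are placeholders I made up; run with at least two processes):

#include <iostream>
#include <vector>
#include "mpi.h"

// A single constant defines the buffer sizes, and std::vector::size() keeps
// the count tied to the actual buffer length, however the vector was built.
const int HALO_SIZE = 10;

int main(){
    MPI::Init();
    int rank = MPI::COMM_WORLD.Get_rank();

    std::vector<double> send_buf(HALO_SIZE, 5.0);
    std::vector<double> recv_buf(HALO_SIZE, 0.0);

    if(rank == 0){
        MPI::COMM_WORLD.Send(&send_buf[0], static_cast<int>(send_buf.size()),
                             MPI::DOUBLE, 1, 0);
    }
    else if(rank == 1){
        MPI::COMM_WORLD.Recv(&recv_buf[0], static_cast<int>(recv_buf.size()),
                             MPI::DOUBLE, 0, 0);
        std::cout << "recv_buf[0]=" << recv_buf[0] << "\n";
    }

    MPI::Finalize();
}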
