
MPI_Recv and timeout

I have a question. Assume I have np processes. Each process calculates, from an input file, how many messages it needs to send to every other process (from 0 to...), and I want to send each of them that number. The catch is that I can only send along a topology I created, through directly connected nodes. So basically I want each process to send an int to all the others. I have the following algorithm (in pseudocode):

MPI_Status status;
for (i = 0; i < np; i++) { // MPI ranks run from 0 to np-1
    if (i != rankID) {
        MPI_Send(&nr, 1, MPI_INT, topology[i][nexthop], DATA, MPI_COMM_WORLD);
        MPI_Send(&i, 1, MPI_INT, topology[i][nexthop], DATA, MPI_COMM_WORLD); // I send the destination along with the int
    }
}
while (1) {
    MPI_Recv(&recvInt, 1, MPI_INT, MPI_ANY_SOURCE, DATA, MPI_COMM_WORLD, &status);
    // take the destination from the same sender, so pairs from different senders are not mixed
    MPI_Recv(&destination, 1, MPI_INT, status.MPI_SOURCE, DATA, MPI_COMM_WORLD, &status);
    if (destination == rankID) {
        ireceive += recvInt;
        receivedFrom++;
        // normally I would break once I have received all np-1 messages, but what if someone sends a message through me for another process?
    }
    else {
        MPI_Send(&recvInt, 1, MPI_INT, topology[destination][nexthop], DATA, MPI_COMM_WORLD);
        MPI_Send(&destination, 1, MPI_INT, topology[destination][nexthop], DATA, MPI_COMM_WORLD);
    }
}

To explain a bit more: at the end of this little algorithm I want each process to know how many messages it will receive in the next step.

To send these messages from each node to every other node I use a routing table I created earlier. Basically each node has a matrix with all the nodes, and topology[node][1] = next hop (that's why I wrote nexthop in the code above).

Each node knows that there are np processes, so each node has to receive np-1 messages for which it is the destination.

The problem is that after I receive the np-1 messages I can't break, because I may be the next hop for another process, and then that message would never be forwarded. So I want to do something like this: use MPI_Test or another call to see whether my Recv is actually receiving something or just sitting there, because if the program blocks for 1-2 seconds it is clear that it is not going to receive anything more (my topology is small, 20-30 processes at most).

I have never worked with MPI_Test or similar calls and I'm not sure how to do this. Can someone help me create a timeout for a Recv, or is there another solution? Thank you, and sorry for the long wall of text.

Probably not the most efficient piece of code, but it should work (I did not have a chance to test it):

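// Fragment: needs <mpi.h> and <stdbool.h>. np, rankID, nr, topology, nexthop, DATA,
// recvInt, destination, ireceive and receivedFrom are the variables from the question.
MPI_Request request;
MPI_Request sendReqs[2 * np]; // one request per pending MPI_Isend
MPI_Status status;
int dests[np];                // one buffer per pending send; &i would change before the send completes
int nreq = 0;
for (i = 0; i < np; i++) {    // MPI ranks run from 0 to np-1
    if (i != rankID) {
        dests[i] = i;
        MPI_Isend(&nr, 1, MPI_INT, topology[i][nexthop], DATA, MPI_COMM_WORLD, &sendReqs[nreq++]);
        MPI_Isend(&dests[i], 1, MPI_INT, topology[i][nexthop], DATA, MPI_COMM_WORLD, &sendReqs[nreq++]); // the destination travels with the int
    }
}
bool over = false; // declared outside the loop so a timeout is not reset on the next iteration
while (!over) {
    if (receivedFrom < np - 1) {
        // still waiting for messages addressed to this rank, so a blocking receive is fine
        MPI_Recv(&recvInt, 1, MPI_INT, MPI_ANY_SOURCE, DATA, MPI_COMM_WORLD, &status);
        // take the destination from the same sender, so pairs from different senders are not mixed
        MPI_Recv(&destination, 1, MPI_INT, status.MPI_SOURCE, DATA, MPI_COMM_WORLD, &status);
    }
    else {
        // everything addressed to this rank has arrived: use a non-blocking receive
        MPI_Irecv(&recvInt, 1, MPI_INT, MPI_ANY_SOURCE, DATA, MPI_COMM_WORLD, &request);
        int flag = 0;
        double start = MPI_Wtime();
        while (!flag && MPI_Wtime() - start < time_you_set_until_timeout) {
            MPI_Test(&request, &flag, &status); // flag becomes nonzero once something was received
        }
        if (!flag) {
            // timed out: cancel the pending receive, then check whether a message slipped in anyway
            int cancelled = 0;
            MPI_Cancel(&request);
            MPI_Wait(&request, &status);
            MPI_Test_cancelled(&status, &cancelled);
            if (cancelled) {
                over = true;
                continue;
            }
        }
        MPI_Recv(&destination, 1, MPI_INT, status.MPI_SOURCE, DATA, MPI_COMM_WORLD, &status);
    }
    if (destination == rankID) {
        ireceive += recvInt;
        receivedFrom++;
    }
    else {
        // route the message and continue
        MPI_Send(&recvInt, 1, MPI_INT, topology[destination][nexthop], DATA, MPI_COMM_WORLD);
        MPI_Send(&destination, 1, MPI_INT, topology[destination][nexthop], DATA, MPI_COMM_WORLD);
    }
}
MPI_Waitall(nreq, sendReqs, MPI_STATUSES_IGNORE); // complete the initial non-blocking sends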

Anyway, since you don't know how long a message can take to work its way through your topology, you should be careful with the value you choose for the timeout. You could try to implement some other kind of signaling mechanism, such as having each node broadcast a message once it has received everything addressed to it. Granted, this increases the number of messages sent, but it makes sure that everyone got everything. You could also pack or serialize the data to be sent so that you need only one Send/Recv call per message, which would make your code easier to work with (in my opinion).

