
MPI_Recv and timeout

I have a question. Assume I have np processes. For each process, I calculate from an input file how many messages I need to send to every other process (from 0 to...), and I want to send each of them this number. The catch is that I can only send through a topology I created, between directly connected nodes. So basically I want each process to send an int to all the others. I have the following algorithm (pseudocode):

MPI_Status status;
for (i = 0; i < np; i++) {   // MPI ranks run from 0 to np-1
    if (i != rankID) {
        MPI_Send(&nr, 1, MPI_INT, topology[i][nexthop], DATA, MPI_COMM_WORLD);
        MPI_Send(&i, 1, MPI_INT, topology[i][nexthop], DATA, MPI_COMM_WORLD); // I send the destination along with the int
    }
}
while (1) {
    MPI_Recv(&recvInt, 1, MPI_INT, MPI_ANY_SOURCE, DATA, MPI_COMM_WORLD, &status);
    // read the destination from the same sender so the two values stay paired
    MPI_Recv(&destination, 1, MPI_INT, status.MPI_SOURCE, DATA, MPI_COMM_WORLD, &status);
    if (destination == rankID) {
        ireceive += recvInt;
        receivedFrom++;
        // normally I would break once I have received all np-1 messages, but what if someone sends a message through me for another process?
    }
    else {
        MPI_Send(&recvInt, 1, MPI_INT, topology[destination][nexthop], DATA, MPI_COMM_WORLD);
        MPI_Send(&destination, 1, MPI_INT, topology[destination][nexthop], DATA, MPI_COMM_WORLD);
    }
}

Now to explain this a bit more. At the end of this little algorithm I want each of my processes to know how many messages it will receive in the next step.

To send these messages from each node to every other node I use a routing table I created earlier. Basically each node has a matrix with all the nodes, and topology[node][1] = next hop (that's why I typed nexthop in the code above).

Each node knows that there are np processes, so each node will have to receive np-1 messages for which it is the destination.

The problem I am having is that after I receive the np-1 messages I can't break, because I may be the next hop for another process and its message would then never be forwarded. So I want to do something like this: use MPI_Test or another call to check whether my Recv is actually receiving something or is just sitting there. If the program blocks for 1-2 seconds, it is clear that it is not going to receive any more (my topology is not big, 20-30 processes at most).

The problem is that I have never worked with MPI_Test or similar calls and I'm not sure how to do this. Can someone help me create a timeout for a Recv, or suggest another solution? Thank you, and sorry for the long wall of text.

Probably not the most efficient piece of code, but it should work (I did not have a chance to test it):

MPI_Request request;
MPI_Status status;
int dests[np];                 // one stable buffer per pending destination send
MPI_Request sendReqs[2 * np];
int nsend = 0, flag, cancelled, over = 0;
for (i = 0; i < np; i++) {     // MPI ranks run from 0 to np-1
    if (i != rankID) {
        dests[i] = i;          // &i would change while the non-blocking send is still pending
        MPI_Isend(&nr, 1, MPI_INT, topology[i][nexthop], DATA, MPI_COMM_WORLD, &sendReqs[nsend++]);
        MPI_Isend(&dests[i], 1, MPI_INT, topology[i][nexthop], DATA, MPI_COMM_WORLD, &sendReqs[nsend++]); // I send the destination along with the int
    }
}
while (!over) {
    if (receivedFrom < np - 1) {
        // still waiting for messages addressed to me, so a blocking receive is fine
        MPI_Recv(&recvInt, 1, MPI_INT, MPI_ANY_SOURCE, DATA, MPI_COMM_WORLD, &status);
    }
    else {
        // non-blocking receive call after you finished receiving everything addressed to you
        MPI_Irecv(&recvInt, 1, MPI_INT, MPI_ANY_SOURCE, DATA, MPI_COMM_WORLD, &request);
        time_t now = time(NULL);
        flag = 0;
        while (!flag && time(NULL) < now + time_you_set_until_timeout)
            MPI_Test(&request, &flag, &status); // flag becomes nonzero once something was received
        if (!flag) {
            // timed out: cancel the pending receive, then check whether a message slipped in anyway
            MPI_Cancel(&request);
            MPI_Wait(&request, &status);
            MPI_Test_cancelled(&status, &cancelled);
            if (cancelled) {
                over = 1;
                continue;
            }
        }
    }
    // read the destination from the same sender so the two values stay paired
    MPI_Recv(&destination, 1, MPI_INT, status.MPI_SOURCE, DATA, MPI_COMM_WORLD, &status);
    if (destination == rankID) {
        ireceive += recvInt;
        receivedFrom++;
    }
    else {
        // route the message and continue
        MPI_Send(&recvInt, 1, MPI_INT, topology[destination][nexthop], DATA, MPI_COMM_WORLD);
        MPI_Send(&destination, 1, MPI_INT, topology[destination][nexthop], DATA, MPI_COMM_WORLD);
    }
}
MPI_Waitall(nsend, sendReqs, MPI_STATUSES_IGNORE); // complete the initial non-blocking sends
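
The timeout loop above can also be factored into a small helper. The following is only a sketch in plain C with no MPI calls, so it can be tried without an MPI installation. The names poll_fn, poll_with_timeout, ready_after_n_polls and never_ready are hypothetical and not part of MPI. In a real program the callback would presumably wrap MPI_Test or MPI_Iprobe, and MPI_Wtime could replace the clock function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: returns true once a message is available.
 * In an MPI program it could wrap MPI_Test or MPI_Iprobe (assumption). */
typedef bool (*poll_fn)(void *ctx);

/* Wall-clock time in seconds, using the C11 timespec_get function. */
static double now_seconds(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Polls until ready(ctx) reports true or timeout_s seconds have elapsed.
 * Always polls at least once. Returns true if ready fired, false on timeout. */
static bool poll_with_timeout(poll_fn ready, void *ctx, double timeout_s) {
    assert(ready != NULL && timeout_s >= 0.0);
    double deadline = now_seconds() + timeout_s;
    do {
        if (ready(ctx))
            return true;
    } while (now_seconds() < deadline);
    return false;
}

/* Stand-in for a real probe: reports ready after *ctx unsuccessful polls. */
static bool ready_after_n_polls(void *ctx) {
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return false;
    }
    return true;
}

/* Stand-in for a node that never receives anything more. */
static bool never_ready(void *ctx) {
    (void)ctx;
    return false;
}
```

For example, poll_with_timeout(ready_after_n_polls, &remaining, 5.0) with remaining = 3 returns true on the fourth poll, while never_ready with a timeout of 0 gives up after a single poll.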

Anyway, since you don't know how much time can pass until a message works its way through your topology, you should be careful with the time you choose for the timeout. You could try to implement some other kind of signaling mechanism, like having each node broadcast a message saying that it has received all messages addressed to it. Granted, it will increase the number of messages sent, but it will make sure that everyone got everything. You could also try packing or serializing your data so that you need only one Send/Recv call per message, which would make your code easier to work with (in my opinion).
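
To illustrate the packing idea, here is a rough single-process simulation in plain C (no MPI) of packets that carry the destination and the value together while being forwarded hop by hop. Everything in it is an assumption made for the example: the one-directional ring topology, and the names Packet, Stats, next_hop and simulate_ring. In the MPI version the same pair could be sent as a two-element int array with a count of 2 and MPI_INT, so a single receive always gets a matching value and destination.

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES 32

/* Both fields travel together, so they cannot be mixed up in transit. */
typedef struct {
    int destination;
    int value;
} Packet;

/* Per-node counters collected by the simulation. */
typedef struct {
    int received_sum[MAX_NODES];   /* sum of values addressed to the node */
    int received_count[MAX_NODES]; /* packets addressed to the node */
    int forwarded[MAX_NODES];      /* packets the node relayed for others */
} Stats;

/* Assumed routing rule: on a one-directional ring of n nodes the next hop
 * is always the right-hand neighbour, whatever the destination is. */
static int next_hop(int node, int destination, int n) {
    (void)destination;
    return (node + 1) % n;
}

/* Every node src sends values[src] to every other node; each packet is
 * walked hop by hop until it reaches its destination. */
static void simulate_ring(int n, const int values[], Stats *st) {
    assert(n >= 2 && n <= MAX_NODES);
    memset(st, 0, sizeof *st);
    for (int src = 0; src < n; src++) {
        for (int dst = 0; dst < n; dst++) {
            if (dst == src)
                continue;
            Packet p = { dst, values[src] };
            int at = next_hop(src, p.destination, n);
            while (at != p.destination) {   /* intermediate node: forward */
                st->forwarded[at]++;
                at = next_hop(at, p.destination, n);
            }
            st->received_sum[at] += p.value; /* destination: deliver */
            st->received_count[at]++;
        }
    }
}
```

With 4 nodes on this ring, every node receives 3 packets of its own and also relays 3 packets for others, which is exactly the traffic a node would miss if it stopped right after its own np-1 messages.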
