
MPI: MPI_Recv with MPI_ANY_SOURCE cannot receive messages from some processes

I want to implement a system with one receiver and multiple senders. Each sender keeps sending data to the receiver, which waits for incoming data and processes it. Here is a toy example:

#include <iostream>
#include <cstdlib>
#include <mpi.h>
using namespace std;

int main(int argc, char *argv[]) {
    int _mpi_numWorkers, _mpi_rank;

    // Initialize openMPI
    MPI_Init(&argc, &argv);

    MPI_Comm_size(MPI_COMM_WORLD, &_mpi_numWorkers);
    MPI_Comm_rank(MPI_COMM_WORLD, &_mpi_rank);

    MPI_Barrier(MPI_COMM_WORLD);

    float *send_data = (float *)malloc(360 * 5000 * sizeof(float));
    MPI_Status receive_status;

    if (_mpi_rank == 0) {
        while (1) {
            MPI_Recv(send_data, 360 * 5000, MPI_FLOAT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &receive_status);

            cout << "Receive from " << receive_status.MPI_SOURCE << endl;
        }
    } else {

        while (1){
            MPI_Send(send_data, 360 * 5000, MPI_FLOAT, 0, 0, MPI_COMM_WORLD);

            //sleep(1);
        }
    }

    // Terminate
    MPI_Finalize();
    return 0;
}

The issue is that MPI_Recv only receives messages from at most two processes, no matter how many processes I launch (when the senders do not sleep). I have tested this code on a single machine and on multiple machines:

Single Machine Case

I run the code with the following command:

mpiexec -n 5 ./test_mpi

The receiver then only receives from the senders with rank 1 and rank 2.

Multiple Machine Case

I run 4 senders and 1 receiver on 5 homogeneous physical machines, all connected to a 100 Mbps switch. In this case, the receiver also only receives data from a subset of the senders. Checking the packets with tcpdump, I observe that some senders do not even send the message. (Those senders are blocked in MPI_Send, but their TCP sequence numbers do not increase and there are no retransmissions.)

In both cases, if I make each sender sleep for some time (decreasing the sending rate), the receiver can receive data from more senders.

Can somebody help me understand why this happens?

Environment

Debian testing with openmpi-1.6

Edit 2/4/16

I included <cstdlib> in the code to prevent any compilation issues.

MPI has no fairness guarantee in that respect; see e.g.

http://www.mcs.anl.gov/research/projects/mpi/tutorial/gropp/node92.html#Node92

That means what you see is perfectly "legal" from MPI's point of view. One page further on from the link I gave, there is a snippet that supposedly helps with this issue. In short, you have to manually issue (asynchronous) receives for each possible sender, and then handle them in a manner that looks "fair" to you.

http://www.mcs.anl.gov/research/projects/mpi/tutorial/gropp/node93.html#Node93
