
In MPI_Send / MPI_Recv pairs, can data be lost if it isn't synchronised correctly?

Let me explain. Consider 4 slave nodes 1, 2, 3, 4 and a master node 0. Now 1, 2, 3 and 4 need to send data to 0, and 0 receives this data as follows:

for (int proc = 1; proc < procCount; proc++) // for each slave processor (procCount = 5)
{
    for (int p = 0; p < 50; p++)
    {
        std::cout << proc << "\tA\t" << p << std::endl;

        // read in binary data
        int chunkP;
        int realP;
        real fitnessVal;
        real fitnessValB;
        real fitnessValC;
        int conCount;
        real subConCount;
        real networkEnergyLoss;
        real movementEnergyLoss;
        long spikeCount;

        MPI_Recv(reinterpret_cast<char *>(&chunkP),
                 sizeof(chunkP),
                 MPI_CHAR, proc, MPI_ANY_TAG, MPI_COMM_WORLD, &stat);
        MPI_Recv(reinterpret_cast<char *>(&realP),
                 sizeof(realP),
                 ...
    }
}

Clearly, the order in which 1, 2, 3 and 4 send their data to 0 cannot be assumed, since they all operate independently of each other: 2 might send data before 1. So assuming 2 does send its data before 1 (for example), the receiving loop in 0 shown above won't proceed until the source rank 'proc' in the MPI_Recv call matches processor 1, because the outer for loop forces this ordering.

So the loop 'waits' for data from 1 before it can do anything else, even if data is already arriving from 2, 3 and 4. What happens to the data arriving from 2, 3 and 4 if it arrives before 1's? Can it be 'forgotten', in the sense that once data from 1 does start arriving and proc then increments to 2, the data originally sent by 2 is simply no longer there? If it is 'forgotten', the whole distributed simulation will just hang, because it never ends up being able to process the data of a particular slave process correctly.

Thanks, Ben.

Firstly, do you really mean to receive MPI_CHAR into chunkP, which is an int? Shouldn't you receive MPI_INT?
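For example, the type-correct form of that receive would look like this (a minimal sketch only, reusing the question's `proc` loop variable and assuming `stat` is the MPI_Status already used in that loop):

int chunkP;
MPI_Recv(&chunkP,  // no reinterpret_cast needed: MPI_Recv takes a void*
         1,        // count is 1 element of the matching MPI datatype
         MPI_INT,  // matches the C type of chunkP
         proc, MPI_ANY_TAG, MPI_COMM_WORLD, &stat);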

The messages from ranks 1:4 will not get lost; they will be queued until rank 0 chooses to receive them. This behaviour is mandated by the MPI standard.

If the messages are large enough, ranks 1:4 may block until they can actually send their messages to rank 0 (most MPI implementations have limited buffering).
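One way a slave rank can avoid stalling on such a send is a non-blocking send; this is only a hedged sketch (MPI_Isend is not mentioned in the original answer), showing the general pattern:

int chunkP = 42;  // example payload value, for illustration only
MPI_Request req;
MPI_Isend(&chunkP, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
// ... the slave can do other work here, but chunkP must not be
//     modified until the send has completed ...
MPI_Wait(&req, MPI_STATUS_IGNORE);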

You might also consider having rank 0 do an MPI_ANY_SOURCE receive for the first receive, to see who's ready to send. You'll need to take care, though, to ensure that the subsequent receives are posted for the corresponding source: look in the MPI_Status struct to see where the message was actually sent from.
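Put together, a minimal sketch of that pattern on rank 0 (not from the original post) might look like this:

MPI_Status stat;
int chunkP;
// accept the first field from whichever slave is ready
MPI_Recv(&chunkP, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &stat);
int src = stat.MPI_SOURCE;  // the rank that actually sent the message
// every subsequent field of this record must come from that same rank
int realP;
MPI_Recv(&realP, 1, MPI_INT, src, MPI_ANY_TAG, MPI_COMM_WORLD, &stat);
// ... remaining fields for this record are received from `src` in the same way ...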
