mpirun is not working with two nodes

I am working on a cluster where each node has 16 processors. My version of Open MPI is 1.5.3. I have written the following simple code in Fortran:

      program MAIN
      implicit none
      include 'mpif.h'
      integer status(MPI_STATUS_SIZE)
      integer ierr,my_rank,size

      integer irep, nrep, iex
      character*1 task

      !Initialize MPI
      call mpi_init(ierr)
      call mpi_comm_rank(MPI_COMM_WORLD,my_rank,ierr)
      call mpi_comm_size(MPI_COMM_WORLD,size,ierr)

      do iex=1,2

         if(my_rank.eq.0) then
            !Task for the master
            nrep = size

            do irep=1,nrep-1
              task='q'
              print *, 'master',iex,task
              call mpi_send(task,1,MPI_BYTE,irep,irep+1,
     &                      MPI_COMM_WORLD,ierr)
            enddo

         else
            !Here are the tasks for the slaves

            !Receive the task sent by the master node
            call mpi_recv(task,1,MPI_BYTE,0,my_rank+1,
     &                    MPI_COMM_WORLD,status,ierr)

            print *, 'slaves', my_rank,task

         endif

      enddo

      call mpi_finalize(ierr)

      end

Then I compile the code with:

/usr/lib64/openmpi/bin/mpif77 -o test2 test2.f

and run it with:

/usr/lib64/openmpi/bin/mpirun  -np 32 -hostfile nodefile test2

My nodefile looks like this:

node1
node1
...
node2
node2
... 

with node1 and node2 repeated 16 times each.

I can compile successfully. When I run it with -np 16 (so on just one node), it works fine: each slave finishes its task and I get the prompt back in the terminal. But when I try -np 32, not all the slaves finish their work; only 16 of them do.

Actually, with 32 processes the program never gives me the prompt back, so I think it is stuck somewhere, waiting for some task to be performed.
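To narrow down which ranks are left waiting, the program's own output can be checked: each slave prints one `slaves` line per iteration of the `iex` loop, so with `iex=1,2` every rank from 1 to np-1 should print exactly two. The POSIX shell function below is a minimal sketch under stated assumptions: the function name and the captured log file are hypothetical, and a missing line may also mean the output was still buffered when the job was interrupted, not that the receive never completed.

```shell
# Hypothetical helper: report slave ranks that printed fewer "slaves" lines
# than expected. Arguments: file holding the captured mpirun output, and the
# value that was passed to mpirun -np.
check_slave_output() {
  log=$1
  np=$2
  per_rank=2                  # the Fortran program loops iex=1,2

  rank=1                      # rank 0 is the master; slaves are 1..np-1
  while [ "$rank" -lt "$np" ]; do
    # list-directed "print *" output looks like " slaves   17 q",
    # so field 1 is the label and field 2 is the rank
    count=$(awk -v r="$rank" '$1 == "slaves" && $2 == r { n++ }
                              END { print n + 0 }' "$log")
    if [ "$count" -ne "$per_rank" ]; then
      echo "rank $rank printed $count of $per_rank \"slaves\" lines"
    fi
    rank=$((rank + 1))
  done
}
```

For example, after redirecting the stdout of the `-np 32` run to a file such as `run.log` and interrupting it, `check_slave_output run.log 32` lists the silent ranks, which makes it easier to see whether they all sit on the same node.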

I would appreciate any comments, since I have already spent quite some time on this trivial problem.

Thanks.

Have you tried using mpiexec instead of mpirun?

I'm not sure that your nodefile is correct. I'd expect to see lines like this:

node1 slots=16
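The `slots=16` form refers to Open MPI's hostfile syntax, where each line names a host and can optionally give the number of process slots on it. The POSIX shell function below is a minimal sketch (its name is hypothetical) that converts a one-line-per-process nodefile like the one in the question into that form, so both layouts can be tried with the same `mpirun` command.

```shell
# Hypothetical helper: rewrite a hostfile that repeats each hostname once per
# process into the "hostname slots=N" form, printed on stdout.
to_slots_hostfile() {
  # 1) keep only the hostname field, skipping blank lines and '#' comments
  # 2) sort so identical hostnames are adjacent, 3) count repeats with uniq -c
  # 4) turn each "COUNT HOST" line into "HOST slots=COUNT"
  awk 'NF && $1 !~ /^#/ { print $1 }' "$1" | sort | uniq -c |
    awk '{ print $2 " slots=" $1 }'
}
```

With the nodefile from the question, `to_slots_hostfile nodefile > nodefile.slots` should produce one `slots=16` line per node, after which the job can be launched with `/usr/lib64/openmpi/bin/mpirun -np 32 -hostfile nodefile.slots test2` (the output file name is only an example).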

Open MPI is pretty well documented; have you checked out their FAQ?
