mpirun is not working with two nodes
I am working on a cluster where each node has 16 processors. My version of Open MPI is 1.5.3. I have written the following simple code in Fortran:
      program MAIN
      implicit none
      include 'mpif.h'
      integer status(MPI_STATUS_SIZE)
      integer ierr,my_rank,size
      integer irep, nrep, iex
      character*1 task
      !Initialize MPI
      call mpi_init(ierr)
      call mpi_comm_rank(MPI_COMM_WORLD,my_rank,ierr)
      call mpi_comm_size(MPI_COMM_WORLD,size,ierr)
      do iex=1,2
         if(my_rank.eq.0) then
            !Task for the master
            nrep = size
            do irep=1,nrep-1
               task='q'
               print *, 'master',iex,task
               call mpi_send(task,1,MPI_BYTE,irep,irep+1,
     &                       MPI_COMM_WORLD,ierr)
            enddo
         else
            !Here are the tasks for the slaves
            !Receive the task sent by the master node
            call mpi_recv(task,1,MPI_BYTE,0,my_rank+1,
     &                    MPI_COMM_WORLD,status,ierr)
            print *, 'slaves', my_rank,task
         endif
      enddo
      call mpi_finalize(ierr)
      end
Then I compile the code with:
/usr/lib64/openmpi/bin/mpif77 -o test2 test2.f
and run it with:
/usr/lib64/openmpi/bin/mpirun -np 32 -hostfile nodefile test2
My nodefile looks like this:
node1
node1
...
node2
node2
...
with node1 and node2 repeated 16 times each.
The code compiles successfully. When I run it with -np 16 (so just one node) it works fine: each slave finishes its task and I get the prompt back in the terminal. But when I try -np 32, only 16 of the slaves finish their work.
In fact, with 32 processes the program never gives me the prompt back, so I think it is stuck somewhere, waiting for some task to be performed.
I would appreciate any comments, since I have already spent some time on this seemingly trivial problem.
Thanks.
Have you tried using mpiexec instead of mpirun?
I'm not sure that your nodefile is correct. I'd expect to see lines like this:
node1 slots=16
Open MPI is pretty well documented; have you checked out their FAQ?
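To make the suggested format concrete, here is a sketch of what the whole nodefile might look like in the slots form. It assumes the two hostnames from the question (node1 and node2) and 16 ranks per node. It is only an illustration of the syntax, not a confirmed fix for the hang:

```text
node1 slots=16
node2 slots=16
```

The run command from the question would stay the same (-np 32 -hostfile nodefile). As a quick check that processes actually start on both nodes, the same mpirun command can launch a non-MPI program such as hostname in place of test2; the output should then list both node1 and node2.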