
OpenMPI / High-Performance Linpack Newbie Question

I have a small cluster of 4 nodes, each with 4 cores. I can happily run HP Linpack on one node, but I'm struggling to get it to run on multiple nodes.

I compiled HPL-2.3 from source with OpenMPI and OpenBLAS. All seems to work well with single node tests.

My 'nodes' file is:

192.168.0.1 slots=4
192.168.0.2 slots=4
192.168.0.3 slots=4
192.168.0.4 slots=4

If I run mpirun -np 16 -hostfile nodes uptime, I get the following:

19:10:49 up  8:46,  1 user,  load average: 0.05, 0.53, 0.34
19:10:49 up  8:46,  1 user,  load average: 0.05, 0.53, 0.34
19:10:49 up  8:46,  1 user,  load average: 0.05, 0.53, 0.34
19:10:49 up 9 min,  0 users,  load average: 0.08, 0.06, 0.03
19:10:49 up 9 min,  0 users,  load average: 0.08, 0.06, 0.03
19:10:49 up 9 min,  0 users,  load average: 0.08, 0.06, 0.03
19:10:49 up  8:46,  1 user,  load average: 0.05, 0.53, 0.34
19:10:49 up 37 min,  0 users,  load average: 0.08, 0.02, 0.01
19:10:49 up 37 min,  0 users,  load average: 0.08, 0.02, 0.01
19:10:49 up 37 min,  0 users,  load average: 0.08, 0.02, 0.01
19:10:49 up 20 min,  0 users,  load average: 0.00, 0.02, 0.00
19:10:49 up 9 min,  0 users,  load average: 0.08, 0.06, 0.03
19:10:49 up 20 min,  0 users,  load average: 0.00, 0.02, 0.00
19:10:49 up 20 min,  0 users,  load average: 0.00, 0.02, 0.00
19:10:49 up 37 min,  0 users,  load average: 0.08, 0.02, 0.01
19:10:49 up 20 min,  0 users,  load average: 0.00, 0.02, 0.00

which suggests to me that OpenMPI is working and distributing uptime across the 4 nodes, 16 cores.

However, when I run mpirun -np 16 -hostfile nodes xhpl, I get the following:

mpirun was unable to find the specified executable file, and therefore
did not launch the job.  This error was first reported for process
rank 8; it may have occurred for other processes as well.

NOTE: A common cause for this error is misspelling a mpirun command
      line parameter option (remember that mpirun interprets the first
      unrecognized command line token as the executable).

Node:       192.168.0.3
Executable: /home/ucapjbj/phas0077/projects/hpl-2.3/bin/arch/xhpl

This suggests to me that xhpl cannot be found on node 192.168.0.3, which seems reasonable, since it is only present on 192.168.0.1, my development node. But conceptually, I was under the impression that I could develop on one node and have OpenMPI distribute the executable to the other nodes for execution, without having to copy it to each node beforehand. Have I fundamentally misunderstood this?
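For anyone hitting the same error: by default mpirun only launches the executable, it does not ship it, so the binary must exist at the same path on every node. One workaround is to copy it there yourself. A minimal sketch, assuming passwordless SSH is already configured (mpirun needs that anyway) and using the path from the error message above:

```shell
# Copy xhpl to the same absolute path on each compute node.
# Assumes passwordless SSH to every node; creates the target
# directory first in case it does not exist there yet.
BIN=/home/ucapjbj/phas0077/projects/hpl-2.3/bin/arch/xhpl
for node in 192.168.0.2 192.168.0.3 192.168.0.4; do
    ssh "$node" mkdir -p "$(dirname "$BIN")"
    scp "$BIN" "$node:$BIN"
done
```

Note that xhpl also expects its HPL.dat input file in the working directory, so that may need copying too. Many small clusters sidestep all of this by exporting the home directory over NFS, so every node sees the same files at the same path.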

Any guidance would be much appreciated.

Kind regards

John

It appears I have to copy the 'xhpl' executable to the same location on each node.

I've looked at the mpirun --preload-binary option, which would appear to be exactly what I want, but I can't get this to work. Any advice would be very welcome.
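In case it helps others, the invocation I was attempting looks like this (a sketch only; --preload-binary exists in Open MPI's mpirun, but its behaviour varies between versions and it generally wants the absolute path to the executable):

```shell
# Hedged sketch: ask mpirun to stage the binary on the remote nodes
# before launching. Requires passwordless SSH in both directions on
# some Open MPI versions, which may be why it fails silently.
mpirun --preload-binary -np 16 -hostfile nodes \
    /home/ucapjbj/phas0077/projects/hpl-2.3/bin/arch/xhpl
```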

Best wishes

John
