The code and objectives
I have a Fortran MPI code called elast3d_mpi.f that must be compiled on both Windows and Linux systems.
The expected behavior
On Linux, the code is compiled with:
mpif90 -o elast3d_mpi elast3d_mpi.f
Then the program can be executed in parallel with the mpirun command:
mpirun -n 2 elast3d_mpi
The terminal output shows that 2 processors are running it, as expected:
There are 2 processors running this job.
Rank# 0 d1 = 1 d2 = 64
Rank# 1 d1 = 65 d2 = 128
...
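The d1/d2 values above look like a contiguous block split of 128 indices across the ranks. The decomposition inside elast3d_mpi.f is not shown here, so the following is only a sketch of that idea, assuming an even split where any remainder goes to the lowest-numbered ranks. The helper name `block_range` is hypothetical and does not come from the program.

```shell
#!/bin/sh
# Hypothetical helper (not taken from elast3d_mpi.f): print the 1-based
# inclusive index range owned by one rank when n items are split into
# contiguous blocks across nprocs ranks.
# Assumption: leftover items go to the lowest-numbered ranks.
block_range() {
    rank=$1; nprocs=$2; n=$3
    base=$((n / nprocs))   # minimum block size
    rem=$((n % nprocs))    # number of ranks that get one extra item
    if [ "$rank" -lt "$rem" ]; then
        d1=$((rank * (base + 1) + 1))
        d2=$((d1 + base))
    else
        d1=$((rem * (base + 1) + (rank - rem) * base + 1))
        d2=$((d1 + base - 1))
    fi
    echo "$d1 $d2"
}

# Reproduce the two-rank case from the output above.
for r in 0 1; do
    set -- $(block_range "$r" 2 128)
    echo "Rank# $r d1 = $1 d2 = $2"
done
```

With 2 ranks and 128 indices this prints the same ranges as the program output: 1 to 64 for rank 0 and 65 to 128 for rank 1.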
If the program is run without mpirun on Linux, it works without errors but without parallel processing.
The problem
To compile it on Windows, the Cygwin environment is used, with the following packages installed:
Package Version Status
_autorebase 001007-1 OK
alternatives 1.3.30c-10 OK
base-cygwin 3.8-1 OK
base-files 4.3-2 OK
bash 4.4.12-3 OK
binutils 2.29-1 OK
bzip2 1.0.8-1 OK
ca-certificates 2.32-1 OK
coreutils 8.26-2 OK
crypto-policies 20190218-1 OK
cygutils 1.4.16-2 OK
cygwin 3.0.7-1 OK
cygwin-debuginfo 3.0.7-1 OK
cygwin-devel 3.0.7-1 OK
dash 0.5.9.1-1 OK
diffutils 3.5-2 OK
editrights 1.03-1 OK
file 5.32-1 OK
findutils 4.6.0-1 OK
gawk 5.0.1-1 OK
gcc-core 7.4.0-1 OK
gcc-fortran 7.4.0-1 OK
getent 2.18.90-4 OK
grep 3.0-2 OK
groff 1.22.4-1 OK
gzip 1.8-1 OK
hostname 3.13-1 OK
info 6.7-1 OK
ipc-utils 1.0-2 OK
less 530-1 OK
libargp 20110921-3 OK
libatomic1 7.4.0-1 OK
libattr1 2.4.48-2 OK
libblkid1 2.33.1-1 OK
libbz2_1 1.0.8-1 OK
libcrypt0 2.1-1 OK
libfdisk1 2.33.1-1 OK
libffi6 3.2.1-2 OK
libgc1 8.0.4-1 OK
libgcc1 7.4.0-1 OK
libgdbm4 1.13-1 OK
libgfortran3 6.4.0-5 OK
libgfortran4 7.4.0-1 OK
libgmp10 6.1.2-1 OK
libgomp1 7.4.0-1 OK
libguile17 1.8.8-3 OK
libguile2.0_22 2.0.14-3 OK
libiconv 1.14-3 OK
libiconv2 1.14-3 OK
libintl8 0.19.8.1-2 OK
libisl15 0.16.1-1 OK
libltdl7 2.4.6-7 OK
liblzma5 5.2.4-1 OK
libmpc3 1.1.0-1 OK
libmpfr6 4.0.2-1 OK
libncursesw10 6.1-1.20190727 OK
libopenmpi-devel 3.1.3-1 OK
libopenmpi12 1.10.7-1 OK
libopenmpi40 3.1.3-1 OK
libopenmpicxx1 1.10.4-1 OK
libopenmpifh12 1.10.7-1 OK
libopenmpifh40 3.1.3-1 OK
libopenmpiusef08_40 3.1.3-1 OK
libopenmpiusetkr40 3.1.3-1 OK
libp11-kit0 0.23.15-1 OK
libpcre1 8.43-1 OK
libpipeline1 1.5.1-1 OK
libpkgconf3 1.6.0-1 OK
libpopt-common 1.16-2 OK
libpopt0 1.16-2 OK
libquadmath0 7.4.0-1 OK
libreadline7 7.0.3-3 OK
libsigsegv2 2.10-2 OK
libsmartcols1 2.33.1-1 OK
libssl1.1 1.1.1d-1 OK
libstdc++6 7.4.0-1 OK
libtasn1_6 4.14-1 OK
libunistring2 0.9.10-1 OK
libuuid1 2.33.1-1 OK
login 1.13-1 OK
make 4.2.1-2 OK
man-db 2.7.6.1-1 OK
mintty 3.0.6-1 OK
ncurses 6.1-1.20190727 OK
openmpi 3.1.3-1 OK
openmpi-debuginfo 3.1.1-2 OK
openssl 1.1.1d-1 OK
p11-kit 0.23.15-1 OK
p11-kit-trust 0.23.15-1 OK
pkg-config 1.6.0-1 OK
pkgconf 1.6.0-1 OK
rebase 4.4.4-1 OK
run 1.3.4-2 OK
sed 4.4-1 OK
tar 1.29-1 OK
terminfo 6.1-1.20190727 OK
terminfo-extra 6.1-1.20190727 OK
tzcode 2019c-1 OK
tzdata 2019c-1 OK
util-linux 2.33.1-1 OK
vim-minimal 8.1.1772-1 OK
w32api-headers 5.0.4-1 OK
w32api-runtime 5.0.4-1 OK
which 2.20-2 OK
windows-default-manifest 6.4-1 OK
xz 5.2.4-1 OK
zlib0 1.2.11-1 OK
On Windows (7), the program is compiled in a similar way, but from a Cygwin terminal:
mpif90 -o elast3d_mpi.exe elast3d_mpi.f
1 - When I try to run it with mpirun in the Cygwin terminal, I get the following error:
$ mpirun -n 2 elast3d_mpi.exe
-----------------------------------------------------------------
Sorry! You were supposed to get help about:
agent-not-found
from the file:
help-plm-rsh.txt
But I couldn't find that topic in the file. Sorry!
-----------------------------------------------------------------
[gauss:00824] [[INVALID],INVALID] FORCE-TERMINATE AT Not found:-13 - error /cygdrive/d/cyg_pub/devel/openmpi/v3.1/openmpi-3.1.3-1.x86_64/src/openmpi-3.1.3/orte/mca/plm/rsh/plm_rsh_component.c(327)
[gauss:00824] *** Process received signal ***
[gauss:00824] Signal: Segmentation fault (11)
[gauss:00824] Signal code: Address not mapped (23)
[gauss:00824] Failing at address: 0x0
Unable to print stack trace!
[gauss:00824] *** End of error message ***
2 - When I run it using Cygwin's orterun in a cmd terminal, I get this error:
C:\Users\io\Documents\elast-mpi>orterun.exe -np 2 elast3d_mpi
------------------------------------------------------------
Sorry! You were supposed to get help about:
agent-not-found
from the file:
help-plm-rsh.txt
But I couldn't find that topic in the file. Sorry!
------------------------------------------------------------------
[gauss:00827] [[INVALID],INVALID] FORCE-TERMINATE AT Not found:-13 -
error /cygd
rive/d/cyg_pub/devel/openmpi/v3.1/openmpi-3.1.3-1.x86_64/src/openmpi-3.1.3/orte/
mca/plm/rsh/plm_rsh_component.c(327)
[gauss:00827] *** Process received signal ***
[gauss:00827] Signal: Segmentation fault (11)
[gauss:00827] Signal code: Address not mapped (23)
[gauss:00827] Failing at address: 0x0
Unable to print stack trace!
[gauss:00827] *** End of error message ***
1 [main] orterun 827 cygwin_exception::open_stackdumpfile:
Dumping stack t
race to orterun.exe.stackdump
3 - When I run the program on Windows without orterun.exe, it outputs the following error:
C:\Users\io\Documents\elast-mpi>elast3d_mpi
---------------------------------------------------------------------
Sorry! You were supposed to get help about:
agent-not-found
from the file:
help-plm-rsh.txt
But I couldn't find that topic in the file. Sorry!
---------------------------------------------------------------------
[gauss:00833] [[INVALID],INVALID] FORCE-TERMINATE AT Not found:-13 - error /cygd
rive/d/cyg_pub/devel/openmpi/v3.1/openmpi-3.1.3-1.x86_64/src/openmpi-3.1.3/orte/
mca/plm/rsh/plm_rsh_component.c(327)
[gauss:00833] *** Process received signal ***
[gauss:00833] Signal: Segmentation fault (11)
[gauss:00833] Signal code: Address not mapped (23)
[gauss:00833] Failing at address: 0x0
Unable to print stack trace!
[gauss:00833] *** End of error message ***
[gauss:00832] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on th
e local node in file /cygdrive/d/cyg_pub/devel/openmpi/v3.1/openmpi-3.1.3-1.x86_
64/src/openmpi-3.1.3/orte/mca/ess/singleton/ess_singleton_module.c at line 532
[gauss:00832] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on th
e local node in file /cygdrive/d/cyg_pub/devel/openmpi/v3.1/openmpi-3.1.3-1.x86_
64/src/openmpi-3.1.3/orte/mca/ess/singleton/ess_singleton_module.c at line 166
--------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
orte_ess_init failed
--> Returned value Unable to start a daemon on the local node (-127) instead o
f ORTE_SUCCESS
---------------------------------------------------------------------
---------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_mpi_init: ompi_rte_init failed
Returned "Unable to start a daemon on the local node" (-127) instead of "Success" (0)
--------------------------------------------------------------------
An error occurred in MPI_Init
on a NULL communicator
MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, and potentially your MPI job)
[gauss:00832] Local abort before MPI_INIT completed completed successfully, but
am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
Observations and questions
If the program is run without mpirun on Linux, it works without errors but without parallel processing.
If the program is run without orterun.exe on Windows, it outputs errors.
As pointed out by @payam_sbr, installing the openssh package allows the program to run in parallel on Windows.
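The agent-not-found message is raised from Open MPI's rsh launcher component (plm_rsh_component.c in the traces above), which is consistent with openssh being the missing piece. A quick pre-flight check on a machine is to test whether an ssh or rsh executable is on PATH. This is only a sketch: `find_launch_agent` is a hypothetical helper, and the ssh-then-rsh search order is an assumption based on Open MPI's usual default for its `plm_rsh_agent` setting, not something stated in this post.

```shell
#!/bin/sh
# Hypothetical helper: print the first candidate launch agent found on PATH.
# Returns non-zero if none of the candidates is available.
find_launch_agent() {
    for agent in "$@"; do
        if command -v "$agent" >/dev/null 2>&1; then
            echo "$agent"
            return 0
        fi
    done
    return 1
}

# Assumption: Open MPI looks for ssh first, then rsh.
if agent=$(find_launch_agent ssh rsh); then
    echo "launch agent available: $agent"
else
    echo "no ssh or rsh on PATH; in Cygwin, install the openssh package"
fi
```

If the check reports no agent, mpirun and orterun can be expected to fail the same way as in cases 1) and 2).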
Test
After installing the openssh package, the program can be run on Windows:
1 - Using the cygwin terminal
mpirun -n 2 ./elast3d_mpi.exe
2 - Or by using the cmd terminal
orterun -np 2 ./elast3d_mpi.exe
In both cases the result is the same as on Linux:
There are 2 processors running this job.
Rank# 0 d1 = 1 d2 = 64
Rank# 1 d1 = 65 d2 = 128
...
Observation
Installing the openssh package solved the problem for cases 1) and 2), so the original problem (running the code in parallel on Windows with Cygwin) is solved.
On the other hand, installing all the Cygwin packages on every other machine that should run the program is not very practical for distributing it.
Is there a way to build my code so that it runs on Windows without manually installing Cygwin on each machine where it will run? What are the options?