
MPI Fortran code compiled on Linux and on Windows with Cygwin

The code and objectives

I have a Fortran MPI code, elast3d_mpi.f, that must be compiled on both Windows and Linux.

The expected behavior

On Linux, the code is compiled with

mpif90 -o elast3d_mpi elast3d_mpi.f

The program can then be run in parallel with the mpirun command:

mpirun -n 2 elast3d_mpi

The terminal output shows that 2 processes are running it, as expected:

 There are 2  processors running this job.
 Rank# 0 d1 = 1   d2 = 64
 Rank# 1 d1 = 65  d2 = 128
 ...
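The ranges in this output look like a contiguous block decomposition: with p ranks and N indices, rank r gets indices r*N/p + 1 through (r+1)*N/p (integer division). The source of elast3d_mpi.f is not shown, so this is only a hypothetical sketch of that arithmetic; N = 128 is inferred from the output, and the function name `block_range` is illustrative. It is written in shell so it can be checked without an MPI installation; in the Fortran code, the rank and the number of ranks would come from MPI_Comm_rank and MPI_Comm_size.

```sh
#!/bin/sh
# Hypothetical sketch of the d1/d2 arithmetic suggested by the output above:
# N indices are split into NP contiguous, 1-based blocks, one per rank.
# In an MPI code, rank and NP would come from MPI_Comm_rank/MPI_Comm_size.

# block_range RANK NP N -> prints "d1 d2" (inclusive range for that rank)
block_range() {
  r=$1 p=$2 total=$3
  # integer division spreads any remainder when N is not a multiple of NP
  echo "$(( r * total / p + 1 )) $(( (r + 1) * total / p ))"
}

NP=${NP:-2}
N=${N:-128}   # assumed total size, inferred from "d2 = 128" on the last rank
rank=0
while [ "$rank" -lt "$NP" ]; do
  set -- $(block_range "$rank" "$NP" "$N")
  echo " Rank# $rank d1 = $1 d2 = $2"
  rank=$(( rank + 1 ))
done
```

With NP=3 and N=10, the same formula gives the ranges 1-3, 4-6 and 7-10, so no index is lost when N is not divisible by the number of ranks.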

If the program is run on Linux without mpirun, it works without errors, but serially (as a single process).

The problem

To compile it on Windows, I use the Cygwin environment with the following packages installed:

Package                  Version            Status
_autorebase              001007-1           OK
alternatives             1.3.30c-10         OK
base-cygwin              3.8-1              OK
base-files               4.3-2              OK
bash                     4.4.12-3           OK
binutils                 2.29-1             OK
bzip2                    1.0.8-1            OK
ca-certificates          2.32-1             OK
coreutils                8.26-2             OK
crypto-policies          20190218-1         OK
cygutils                 1.4.16-2           OK
cygwin                   3.0.7-1            OK
cygwin-debuginfo         3.0.7-1            OK
cygwin-devel             3.0.7-1            OK
dash                     0.5.9.1-1          OK
diffutils                3.5-2              OK
editrights               1.03-1             OK
file                     5.32-1             OK
findutils                4.6.0-1            OK
gawk                     5.0.1-1            OK
gcc-core                 7.4.0-1            OK
gcc-fortran              7.4.0-1            OK
getent                   2.18.90-4          OK
grep                     3.0-2              OK
groff                    1.22.4-1           OK
gzip                     1.8-1              OK
hostname                 3.13-1             OK
info                     6.7-1              OK
ipc-utils                1.0-2              OK
less                     530-1              OK
libargp                  20110921-3         OK
libatomic1               7.4.0-1            OK
libattr1                 2.4.48-2           OK
libblkid1                2.33.1-1           OK
libbz2_1                 1.0.8-1            OK
libcrypt0                2.1-1              OK
libfdisk1                2.33.1-1           OK
libffi6                  3.2.1-2            OK
libgc1                   8.0.4-1            OK
libgcc1                  7.4.0-1            OK
libgdbm4                 1.13-1             OK
libgfortran3             6.4.0-5            OK
libgfortran4             7.4.0-1            OK
libgmp10                 6.1.2-1            OK
libgomp1                 7.4.0-1            OK
libguile17               1.8.8-3            OK
libguile2.0_22           2.0.14-3           OK
libiconv                 1.14-3             OK
libiconv2                1.14-3             OK
libintl8                 0.19.8.1-2         OK
libisl15                 0.16.1-1           OK
libltdl7                 2.4.6-7            OK
liblzma5                 5.2.4-1            OK
libmpc3                  1.1.0-1            OK
libmpfr6                 4.0.2-1            OK
libncursesw10            6.1-1.20190727     OK
libopenmpi-devel         3.1.3-1            OK
libopenmpi12             1.10.7-1           OK
libopenmpi40             3.1.3-1            OK
libopenmpicxx1           1.10.4-1           OK
libopenmpifh12           1.10.7-1           OK
libopenmpifh40           3.1.3-1            OK
libopenmpiusef08_40      3.1.3-1            OK
libopenmpiusetkr40       3.1.3-1            OK
libp11-kit0              0.23.15-1          OK
libpcre1                 8.43-1             OK
libpipeline1             1.5.1-1            OK
libpkgconf3              1.6.0-1            OK
libpopt-common           1.16-2             OK
libpopt0                 1.16-2             OK
libquadmath0             7.4.0-1            OK
libreadline7             7.0.3-3            OK
libsigsegv2              2.10-2             OK
libsmartcols1            2.33.1-1           OK
libssl1.1                1.1.1d-1           OK
libstdc++6               7.4.0-1            OK
libtasn1_6               4.14-1             OK
libunistring2            0.9.10-1           OK
libuuid1                 2.33.1-1           OK
login                    1.13-1             OK
make                     4.2.1-2            OK
man-db                   2.7.6.1-1          OK
mintty                   3.0.6-1            OK
ncurses                  6.1-1.20190727     OK
openmpi                  3.1.3-1            OK
openmpi-debuginfo        3.1.1-2            OK
openssl                  1.1.1d-1           OK
p11-kit                  0.23.15-1          OK
p11-kit-trust            0.23.15-1          OK
pkg-config               1.6.0-1            OK
pkgconf                  1.6.0-1            OK
rebase                   4.4.4-1            OK
run                      1.3.4-2            OK
sed                      4.4-1              OK
tar                      1.29-1             OK
terminfo                 6.1-1.20190727     OK
terminfo-extra           6.1-1.20190727     OK
tzcode                   2019c-1            OK
tzdata                   2019c-1            OK
util-linux               2.33.1-1           OK
vim-minimal              8.1.1772-1         OK
w32api-headers           5.0.4-1            OK
w32api-runtime           5.0.4-1            OK
which                    2.20-2             OK
windows-default-manifest 6.4-1              OK
xz                       5.2.4-1            OK
zlib0                    1.2.11-1           OK

On Windows 7, the program is compiled the same way, from a Cygwin terminal:

mpif90 -o elast3d_mpi.exe elast3d_mpi.f

1 - When I try to run it with mpirun in the Cygwin terminal, I get the following error:

$ mpirun -n 2 elast3d_mpi.exe
-----------------------------------------------------------------
Sorry!  You were supposed to get help about:
    agent-not-found
from the file:
    help-plm-rsh.txt
But I couldn't find that topic in the file.  Sorry!
-----------------------------------------------------------------
[gauss:00824] [[INVALID],INVALID] FORCE-TERMINATE AT Not found:-13 - error /cygdrive/d/cyg_pub/devel/openmpi/v3.1/openmpi-3.1.3-1.x86_64/src/openmpi-3.1.3/orte/mca/plm/rsh/plm_rsh_component.c(327)
[gauss:00824] *** Process received signal ***
[gauss:00824] Signal: Segmentation fault (11)
[gauss:00824] Signal code: Address not mapped (23)
[gauss:00824] Failing at address: 0x0
Unable to print stack trace!
[gauss:00824] *** End of error message ***

2 - When I run it with Cygwin's orterun from a cmd terminal, I get this error:

C:\Users\io\Documents\elast-mpi>orterun.exe -np 2 elast3d_mpi
------------------------------------------------------------
Sorry!  You were supposed to get help about:
agent-not-found
from the file:
help-plm-rsh.txt
But I couldn't find that topic in the file.  Sorry!
------------------------------------------------------------------
[gauss:00827] [[INVALID],INVALID] FORCE-TERMINATE AT Not found:-13 - error /cygdrive/d/cyg_pub/devel/openmpi/v3.1/openmpi-3.1.3-1.x86_64/src/openmpi-3.1.3/orte/mca/plm/rsh/plm_rsh_component.c(327)
[gauss:00827] *** Process received signal ***
[gauss:00827] Signal: Segmentation fault (11)
[gauss:00827] Signal code: Address not mapped (23)
[gauss:00827] Failing at address: 0x0
Unable to print stack trace!
[gauss:00827] *** End of error message ***
1 [main] orterun 827 cygwin_exception::open_stackdumpfile: Dumping stack trace to orterun.exe.stackdump

3 - Running the program on Windows without orterun.exe, I get the following error:

C:\Users\io\Documents\elast-mpi>elast3d_mpi
---------------------------------------------------------------------
Sorry!  You were supposed to get help about:
agent-not-found
from the file:
help-plm-rsh.txt
But I couldn't find that topic in the file.  Sorry!
---------------------------------------------------------------------
[gauss:00833] [[INVALID],INVALID] FORCE-TERMINATE AT Not found:-13 - error /cygdrive/d/cyg_pub/devel/openmpi/v3.1/openmpi-3.1.3-1.x86_64/src/openmpi-3.1.3/orte/mca/plm/rsh/plm_rsh_component.c(327)
[gauss:00833] *** Process received signal ***
[gauss:00833] Signal: Segmentation fault (11)
[gauss:00833] Signal code: Address not mapped (23)
[gauss:00833] Failing at address: 0x0
Unable to print stack trace!
[gauss:00833] *** End of error message ***
[gauss:00832] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file /cygdrive/d/cyg_pub/devel/openmpi/v3.1/openmpi-3.1.3-1.x86_64/src/openmpi-3.1.3/orte/mca/ess/singleton/ess_singleton_module.c at line 532
[gauss:00832] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file /cygdrive/d/cyg_pub/devel/openmpi/v3.1/openmpi-3.1.3-1.x86_64/src/openmpi-3.1.3/orte/mca/ess/singleton/ess_singleton_module.c at line 166
--------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
orte_ess_init failed
--> Returned value Unable to start a daemon on the local node (-127) instead of ORTE_SUCCESS
---------------------------------------------------------------------
---------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_mpi_init: ompi_rte_init failed
Returned "Unable to start a daemon on the local node" (-127) instead of "Success" (0)
--------------------------------------------------------------------
An error occurred in MPI_Init
on a NULL communicator
MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, and potentially your MPI job)
[gauss:00832] Local abort before MPI_INIT completed completed successfully, but
am not able to aggregate error messages, and not able to guarantee that all other processes were killed!

Observations and questions

  • On Linux, the program runs without mpirun and without errors, but serially (no parallel processing).
  • On Windows, running the program without orterun.exe produces errors.
  • It looks like a runtime (environment) problem.
  • Is this the best way to compile this program on Windows?
  • Can I compile the same MPI Fortran code on both Windows and Linux?
  • What can I try when compiling the program to get it running on Windows?

As pointed out by @payam_sbr, installing the openssh package makes it possible to run the program in parallel on Windows.
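Judging from the source path in the errors (orte/mca/plm/rsh/plm_rsh_component.c), the agent-not-found failure comes from Open MPI's rsh launcher component, which looks for a remote-shell program named by the MCA parameter plm_rsh_agent (default "ssh : rsh"). In this Open MPI 3.1.3 build it evidently needs that program even though both ranks run on the local machine; in case 3 the same message appears when the singleton start-up tries to launch a local daemon. That is consistent with openssh fixing cases 1 and 2. The sketch below only mimics the PATH lookup so the condition can be checked before calling mpirun; the function name `find_rsh_agent` is hypothetical, and the script does not call Open MPI itself.

```sh
#!/bin/sh
# Sketch: mimic how an agent list like "ssh : rsh" would be resolved on PATH.
# This does not call Open MPI; it only checks whether a launcher agent exists.

# find_rsh_agent -> prints the path of the first agent found, or returns 1
find_rsh_agent() {
  for agent in ssh rsh; do
    if path=$(command -v "$agent" 2>/dev/null); then
      echo "$path"
      return 0
    fi
  done
  return 1
}

if agent=$(find_rsh_agent); then
  echo "launcher agent found: $agent"
else
  # On Cygwin, installing the openssh package provides an ssh program.
  echo "no ssh or rsh on PATH; mpirun is likely to fail with agent-not-found" >&2
fi
```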

Test

After installing the openssh package, the program can be run on Windows:

1 - From the Cygwin terminal:

mpirun -n 2 ./elast3d_mpi.exe

2 - Or from a cmd terminal:

orterun -np 2 ./elast3d_mpi.exe

In both cases, the result is the same as on Linux:

There are 2  processors running this job.
 Rank# 0 d1 = 1   d2 = 64
 Rank# 1 d1 = 65  d2 = 128
 ...
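Since the same two commands now work on both systems, one script can build and launch the program unchanged on Linux and in a Cygwin terminal: on Cygwin, gfortran appends .exe to an output name without an extension, and Cygwin resolves ./elast3d_mpi to elast3d_mpi.exe. A minimal sketch, assuming the file names used above and 2 processes by default; the script name build_run.sh, the NP variable and the DRY_RUN switch are illustrative additions, not part of the original workflow.

```sh
#!/bin/sh
# Sketch: one build-and-run script for both Linux and a Cygwin terminal.
# NP (number of processes) and DRY_RUN (print commands instead of running
# them) are illustrative additions, not part of the original workflow.

SRC=elast3d_mpi.f
EXE=elast3d_mpi   # on Cygwin, gfortran writes elast3d_mpi.exe for this name
NP=${NP:-2}

# run CMD ARGS... : execute the command, or only print it if DRY_RUN is set
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# main : compile, then launch NP ranks; ./elast3d_mpi also resolves to
# elast3d_mpi.exe on Cygwin, so no platform-specific branch is needed
main() {
  echo "building on $(uname -s)" >&2   # e.g. Linux or CYGWIN_NT-6.1
  run mpif90 -o "$EXE" "$SRC" && run mpirun -n "$NP" "./$EXE"
}

# Do the real work only when called as: sh build_run.sh run
if [ "${1:-}" = run ]; then
  main
fi
```

Running `DRY_RUN=1 sh build_run.sh run` prints the two commands without executing them, which is a quick way to confirm what will be run on each machine.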

Observation

Installing openssh solved cases 1 and 2, so the original problem (running the code in parallel on Windows with Cygwin) is solved.

On the other hand, having to install all these Cygwin packages on every machine that should run the program is not practical for distributing it.

Is there a way to compile everything my code needs so that it runs on Windows without manually installing the Cygwin packages on each machine where it will run? What are the options?
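One possible starting point, offered only as a partial answer: Cygwin's cygcheck utility prints the DLLs an executable depends on, recursively, which shows what would have to be shipped next to elast3d_mpi.exe (at least cygwin1.dll plus the gfortran and Open MPI runtime DLLs). A sketch that filters that listing down to the Cygwin DLL names; the function name `cygwin_dlls` is hypothetical.

```sh
#!/bin/sh
# Sketch: list the Cygwin DLLs an executable needs, from cygcheck's output.
# cygcheck prints a tree of Windows paths such as C:\cygwin64\bin\cygwin1.dll;
# system DLLs (KERNEL32.dll, ...) are dropped because they lack the cyg prefix.

# cygwin_dlls -> reads cygcheck output on stdin, prints unique cyg*.dll names
cygwin_dlls() {
  grep -o 'cyg[A-Za-z0-9_+.-]*\.dll' | sort -u
}

# Run the real listing only where cygcheck exists (i.e. inside Cygwin).
if command -v cygcheck >/dev/null 2>&1 && [ -f ./elast3d_mpi.exe ]; then
  cygcheck ./elast3d_mpi.exe | cygwin_dlls
fi
```

This covers only what the executable needs in order to load. As case 3 shows, even a start without mpirun tries to launch an Open MPI daemon, so parallel runs on another machine would presumably still need the Open MPI launcher programs (mpirun/orterun and orted), their support files, and an ssh agent there; the DLL list is a starting point for deciding what to bundle, not a complete recipe.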

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM