

How to run a parallel program from within a Fortran code already launched with srun on SLURM?

I think my question is fairly specific and niche, and I couldn't find an answer anywhere else.

I have a parallel code in Fortran (using MPI), and I would like a subroutine on each individual processor to call another (in principle serial) program at runtime. I do this with EXECUTE_COMMAND_LINE. It now turns out that the other code I'm calling is also parallelized, with no possibility of building a purely serial version without MPI. The cluster is set up such that I have to use srun in my SLURM file, so srun ./mycode < input.in > output.out launches my code. For the 3rd party code, however, the easiest way to specify the number of cores is to use the provided launcher, which itself uses mpirun to start the right number of processes.

In principle, it is possible to run the 3rd party code without mpirun, in which case it should start a "serial" run (the parallel build, but on a single core). However, because my code is already running under srun, this appears to trigger the 3rd party software to run in parallel on multiple processors, which defeats what I'm trying to do. If I instead use the normal launcher, which calls mpirun to invoke the 3rd party code, everything hangs because mpirun is waiting for the first instance of srun to complete, which it never will.

Is there any way I can tell the 3rd party code (which has no flag to specify this explicitly without invoking mpirun) to run on a single processor? Perhaps an environment variable I can set, or a way of using EXECUTE_COMMAND_LINE that specifies the number of cores to run the command on? Or even a way to make multiple mpirun commands coexist without preventing each other from running?

I use the Intel compilers and Intel MPI for everything.

A colleague found one way to do this, for anyone else struggling with it:

call execute_command_line("bash -lc 'env -i PATH=/usr/bin/:/bin mpirun -n 2 ./bin/slave &> slave.out &'", wait=.false.)

This is executed from within the calling Fortran code.
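The key piece of that command is probably env -i, which starts the child with an empty environment apart from the PATH given on the command line. A plausible reading (an assumption, not something confirmed in this thread) is that the inner mpirun then no longer sees the variables exported by the outer srun and so behaves like a fresh launch. The sketch below only demonstrates the environment scrubbing itself; the SLURM_JOB_ID and PMI_RANK values are made up, and no MPI program is involved.

```shell
#!/usr/bin/env bash
# Simulate variables that a launcher such as srun exports into each task.
# The values are invented for this demonstration.
export SLURM_JOB_ID=12345
export PMI_RANK=0

# Count launcher-related variables visible to a normally spawned child.
inherited=$(bash -c 'env | grep -c -E "^(SLURM_|PMI_)"')

# Same count for a child started through env -i, which begins with an
# empty environment and then sets only PATH.
scrubbed=$(env -i PATH=/usr/bin:/bin bash -c 'env | grep -c -E "^(SLURM_|PMI_)"')

echo "inherited=$inherited scrubbed=$scrubbed"
```

The second count should be zero, since nothing from the parent environment survives env -i. Note that this also drops everything else the child might need (for example LD_LIBRARY_PATH or module settings), which may be why the original command runs through bash -l to pick up a login environment first.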
