When using mpirun, is it possible to catch signals (for example, the SIGINT generated by ^C) in the code being run?
For example, I'm running parallelized Python code. I can catch KeyboardInterrupt when running python blah.py by itself, but not when running mpirun -np 1 python blah.py.
Does anyone have a suggestion? Even knowing how to catch signals in a compiled C or C++ program would be a helpful start.
If I send a signal directly to the spawned Python processes, they handle it properly; however, signals sent to the parent orterun process (i.e. from exceeding the wall time on a cluster, or pressing Ctrl-C in a terminal) kill everything immediately.
I think this is really implementation-dependent.
In SLURM, I tried using sbatch --signal USR1@30, which sends SIGUSR1 (whose number is 30, 10, or 16 depending on the architecture) to the programs launched by srun 30 seconds before the job's time limit. The process received SIGUSR1 as signal 10.
For IBM Platform MPI, according to https://www.ibm.com/support/knowledgecenter/en/SSF4ZA_9.1.4/pmpi_guide/signal_propagation.html, SIGINT, SIGUSR1 and SIGUSR2 are passed through to the processes.
In MPICH, SIGUSR1 is used by the process manager for internal notification of abnormal failures (ref: http://lists.mpich.org/pipermail/discuss/2014-October/003242.html).
Open MPI, on the other hand, will forward SIGUSR1 and SIGUSR2 from mpiexec to the other processes (ref: http://www.open-mpi.org/doc/v1.6/man1/mpirun.1.php#sect14).
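For Intel MPI, according to https://software.intel.com/en-us/mpi-developer-reference-linux-hydra-environment-variables, the environment variables I_MPI_JOB_SIGNAL_PROPAGATION and I_MPI_JOB_TIMEOUT_SIGNAL can be set to enable signal propagation to the processes and to choose which signal is sent when a job times out.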
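As a sketch, assuming Intel MPI's Hydra launcher and the syntax described on that reference page (I_MPI_JOB_SIGNAL_PROPAGATION=enable, and I_MPI_JOB_TIMEOUT together with I_MPI_JOB_TIMEOUT_SIGNAL=<signal number>), these variables could be set from a Python driver before starting mpirun. The helper names intel_mpi_signal_env and launch are hypothetical; check the reference for your Intel MPI version, since accepted values may differ.

```python
import os
import signal
import subprocess


def intel_mpi_signal_env(timeout_s=None, timeout_signal=signal.SIGUSR1, base=None):
    """Return a copy of the environment with Intel MPI signal variables set.

    Hypothetical helper: it only builds a dict and never mutates `base`.
    """
    env = dict(os.environ if base is None else base)
    # Ask the Hydra process manager to forward signals it receives to the ranks.
    env["I_MPI_JOB_SIGNAL_PROPAGATION"] = "enable"
    if timeout_s is not None:
        # Job timeout in seconds, and the signal number to send when it expires.
        env["I_MPI_JOB_TIMEOUT"] = str(int(timeout_s))
        env["I_MPI_JOB_TIMEOUT_SIGNAL"] = str(int(timeout_signal))
    return env


def launch(nprocs, script, **kwargs):
    """Run `mpirun -np N python script` with the environment built above."""
    cmd = ["mpirun", "-np", str(nprocs), "python", script]
    return subprocess.run(cmd, env=intel_mpi_signal_env(**kwargs))
```

For example, launch(4, "blah.py", timeout_s=3600) would ask Hydra to send SIGUSR1 (rather than an uncatchable signal) to the ranks after an hour, so a handler like the SLURM one could still run.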
Another thing worth noting: many Python scripts call other libraries or code through Cython, and if SIGUSR1 is also caught by such a sub-process, something unwanted might happen.
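One way to reduce that risk is to look at the existing handler before replacing it and call it from your own. This is a sketch that assumes the other code installed its handler through Python's signal module: handlers installed purely at the C level are not visible here, and signal.getsignal returns None for them. The helper name install_chained is hypothetical.

```python
import signal


def install_chained(signum, handler):
    """Install `handler` for `signum` while still invoking whatever Python-level
    handler was registered before it. Returns the previous handler."""
    previous = signal.getsignal(signum)

    def chained(sig, frame):
        handler(sig, frame)
        # Only call the old handler if it is a Python callable; SIG_DFL,
        # SIG_IGN and None (handler set outside Python) are skipped.
        if callable(previous):
            previous(sig, frame)

    signal.signal(signum, chained)
    return previous
```

This relies on signal.signal and signal.getsignal both reporting the previously installed handler, as the signal module documentation describes.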
If you use mpirun --nw, then mpirun itself should terminate as soon as it has started the subprocesses, instead of waiting for them to finish; if that's acceptable, I believe your processes would then be able to catch their own signals.
The signal module supports setting signal handlers using signal.signal:
Set the handler for signal signalnum to the function handler. handler can be a callable Python object taking two arguments (see below), or one of the special values signal.SIG_IGN or signal.SIG_DFL. The previous signal handler will be returned...
import signal
import time

def ignore(sig, frame):
    print("I'm ignoring signal %d" % (sig,))

signal.signal(signal.SIGINT, ignore)
while True:
    time.sleep(1)  # sleep instead of spinning; the handler still runs on SIGINT
If you send a SIGINT to a Python interpreter running this script (via kill -INT <pid>), it will print a message and simply continue to run.
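Tying this back to the original question: the Open MPI mpirun man page linked above says that when mpirun receives SIGINT or SIGTERM, it tries to kill the job by sending SIGTERM to all processes and then SIGKILL a few seconds later. Assuming your launcher behaves that way, a handler that turns SIGTERM into KeyboardInterrupt lets existing except KeyboardInterrupt cleanup code run in each rank during that short grace period. This is a workaround sketch, not something mpirun provides; term_as_interrupt and run are hypothetical names.

```python
import signal


def term_as_interrupt(signum, frame):
    """Re-raise SIGTERM as KeyboardInterrupt so ^C-style cleanup runs."""
    raise KeyboardInterrupt(f"received signal {signum}")


def run(work):
    """Call work() and report whether it finished or was interrupted."""
    signal.signal(signal.SIGTERM, term_as_interrupt)
    try:
        work()
        return "finished"
    except KeyboardInterrupt:
        # Flush files, write checkpoints, etc. here; the launcher may send
        # SIGKILL shortly afterwards, so keep this short.
        return "interrupted"
```

Python only runs signal handlers in the main thread, between bytecode instructions, so a rank blocked inside a long C or MPI call will not react until that call returns.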