
Running multiple instances of a python file with two input files using GNU Parallel on an HPC system with SLURM

I am trying to run a single Python file 240 times in parallel on an HPC system (each individual run takes about 9 minutes). Ideally each run should use a single core; there are 24 cores per node. The Python file takes two input files, one from each set:

  • CN_ONLY0.pdb up to CN_ONLY239.pdb
  • I_ONLY0.pdb up to I_ONLY239.pdb.

When I run the code posted below:

parallel="parallel --delay .2         \
                    -j $SLURM_NTASKS   \
                   --joblog runtask.log \
                   --resume              \
                   --max-args=2"

srun="srun --exclusive -N1 -n1 --cpus-per-task=1 --cpu-bind=cores"

find . -type f \( -name "CN_ONLY*.pdb" -o -name "I_ONLY*.pdb" \) |
        sort -t Y -k 2 -g     |
        TMPDIR=$SLURM_SCRATCH \
        $parallel python python_test.py

The Python program runs correctly, but the jobs are not distributed across all of the requested CPUs.

Does anyone know how to fix this problem?
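For reference, in the usual GNU Parallel + SLURM pattern this script appears to be based on, the $srun launcher defined above is interpolated into the parallel command line so that SLURM places each job on its own core; as posted, the pipeline never uses $srun. A minimal sketch of that composition, assuming python_test.py accepts the two .pdb files as positional arguments:

# Sketch only: feed each pair of files through the srun launcher so that
# SLURM binds every python run to a single core.
find . -type f \( -name "CN_ONLY*.pdb" -o -name "I_ONLY*.pdb" \) |
        sort -t Y -k 2 -g     |
        TMPDIR=$SLURM_SCRATCH \
        $parallel "$srun python python_test.py {1} {2}"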

There is no need to use GNU Parallel here, since SLURM itself provides that functionality via job arrays. Simply add --array=1-240 to the sbatch command (or as an #SBATCH directive in the script) and then submit the following script:

#!/bin/sh

id=$(expr ${SLURM_ARRAY_TASK_ID} - 1)
python python_test.py CN_ONLY${id}.pdb I_ONLY${id}.pdb

What happens is that SLURM launches this script 240 times, setting SLURM_ARRAY_TASK_ID to a different value in each instance, ranging from 1 to 240. It is then trivial to subtract one from this value and use it to generate the names of the two input files.
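For completeness, the same setup is often written with the array request and per-task resources as #SBATCH directives inside the script itself; a minimal sketch (the job name is a placeholder):

#!/bin/sh
#SBATCH --job-name=pdb_pairs   # placeholder name
#SBATCH --array=1-240          # SLURM_ARRAY_TASK_ID takes values 1..240
#SBATCH --ntasks=1             # one task per array element
#SBATCH --cpus-per-task=1      # pinned to a single core

# Map the 1-based array index to the 0-based file numbering.
id=$((SLURM_ARRAY_TASK_ID - 1))
python python_test.py CN_ONLY${id}.pdb I_ONLY${id}.pdb

Submitting this with sbatch queues all 240 tasks at once; writing the range as --array=1-240%24 additionally caps the number of array tasks running simultaneously at 24, i.e. one node's worth of cores.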
