
Running multiple instances of a python file with two input files using GNU Parallel on an HPC system with SLURM

I am trying to run a single Python file 240 times in parallel on an HPC system (each individual run takes about 9 minutes). Ideally each run should use a single core; there are 24 cores per node. The Python file takes two input files, one from each set:

  • CN_ONLY0.pdb up to CN_ONLY239.pdb
  • I_ONLY0.pdb up to I_ONLY239.pdb.

When I run the code posted below:

parallel="parallel --delay .2         \
                    -j $SLURM_NTASKS   \
                   --joblog runtask.log \
                   --resume              \
                   --max-args=2"

srun="srun --exclusive -N1 -n1 --cpus-per-task=1 --cpu-bind=cores"

find . -type f \( -name "CN_ONLY*.pdb" -o -name "I_ONLY*.pdb" \) |
        sort -t Y -k 2 -g     |
        TMPDIR=$SLURM_SCRATCH \
        $parallel python python_test.py

The Python program runs correctly, but the jobs are not distributed across all of the requested CPUs.

Does anyone know how to fix this problem?
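For reference, in the usual GNU Parallel + SLURM pattern this script appears to be based on, the $srun launcher defined above is interpolated into the parallel command line so that SLURM places each job on its own core; as posted, the pipeline never uses $srun. A minimal sketch of that composition, assuming python_test.py accepts the two .pdb files as positional arguments:

# Sketch only: feed each pair of files through the srun launcher so that
# SLURM binds every python run to a single core.
find . -type f \( -name "CN_ONLY*.pdb" -o -name "I_ONLY*.pdb" \) |
        sort -t Y -k 2 -g     |
        TMPDIR=$SLURM_SCRATCH \
        $parallel "$srun python python_test.py {1} {2}"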

There is no need to use GNU Parallel here, since SLURM itself provides that functionality via job arrays. Simply add --array=1-240 to the sbatch command (or as an #SBATCH directive in the script) and then submit the following script:

#!/bin/sh

id=$(expr ${SLURM_ARRAY_TASK_ID} - 1)
python python_test.py CN_ONLY${id}.pdb I_ONLY${id}.pdb

What happens is that SLURM launches this script 240 times, setting SLURM_ARRAY_TASK_ID to a different value in each instance, ranging from 1 to 240. It is then trivial to subtract one from this value and use it to generate the names of the two input files.
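For completeness, the same setup is often written with the array request and per-task resources as #SBATCH directives inside the script itself; a minimal sketch (the job name is a placeholder):

#!/bin/sh
#SBATCH --job-name=pdb_pairs   # placeholder name
#SBATCH --array=1-240          # SLURM_ARRAY_TASK_ID takes values 1..240
#SBATCH --ntasks=1             # one task per array element
#SBATCH --cpus-per-task=1      # pinned to a single core

# Map the 1-based array index to the 0-based file numbering.
id=$((SLURM_ARRAY_TASK_ID - 1))
python python_test.py CN_ONLY${id}.pdb I_ONLY${id}.pdb

Submitting this with sbatch queues all 240 tasks at once; writing the range as --array=1-240%24 additionally caps the number of array tasks running simultaneously at 24, i.e. one node's worth of cores.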
