How do I specify the number of CPU cores to use within a python script running via bash script?

I want to align some FASTA files to a genome. To do this quickly, I have a Python script that loops through a directory and picks out only certain files. I call the aligner through subprocess, passing everything I would otherwise put in a plain bash script if I weren't running it in a loop. The problem is that I want to run it in parallel, but using $NSLOTS within subprocess doesn't work.

My bash script is this:

#!/bin/bash --login
#$ -cwd
#$ -pe smp.pe 8 #specifies to use 8 cores

module load apps/binapps/anaconda3/2019.03  #Python 3.7.3
module load apps/bioinf
module load apps/star/2.5.3a/gcc-4.8.5

python STAR_aligner.py

My python script is this:

import subprocess
import os

directory = '/mnt/fls01-home01/mfbx3sgh/scratch/Trimming'
genome_idx = '/mnt/fls01-home01/mfbx3sgh/scratch/Genome_Index'
genome_dir = '/mnt/fls01-home01/mfbx3sgh/scratch/Genome/'

file_list = os.listdir(directory)

for read_1 in file_list:

    if 'paired' in read_1:

        if '_1_' in read_1:

            read_2 = read_1.replace('_1_', '_2_')

            subprocess.run(['STAR', '--runThreadN', '$NSLOTS', '--genomeDir', genome_idx, '--readFilesIn', read_1,
                            read_2, '--readFilesCommand', 'gunzip', '-c', '--outFileNamePrefix', genome_dir])

As you can see, I request 8 cores in the bash script and then pass $NSLOTS in the subprocess call. Outside of a Python script, $NSLOTS expands to the number of cores I requested.

The error I'm getting is

EXITING: fatal input ERROR: runThreadN must be >0, user-defined runThreadN=0

Sep 30 11:52:24 ...... FATAL ERROR, exiting

The Python script doesn't seem to be adding any value at all; you should just write it all as a Bash script.

#!/bin/bash
root="/mnt/fls01-home01/mfbx3sgh/scratch"
for read_1 in "$root/Trimming/"*; do
    case $read_1 in
     *paired*_1_*|*_1_*paired*)
        STAR --runThreadN "$NSLOTS" --genomeDir "$root/Genome_Index/" \
          --readFilesIn "$read_1" "${read_1//_1_/_2_}" \
          --readFilesCommand gunzip -c --outFileNamePrefix "$root/Genome/" ;;
    esac
done

If your question is how to properly use $NSLOTS from Python, that would be

os.environ['NSLOTS']

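(In Python, the string '$NSLOTS' is just the literal characters $, N, S, and so on. subprocess.run with an argument list does not go through a shell, so nothing expands it.)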
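If you do want to keep the Python script, here is a minimal sketch of how the lookup could be wired into the STAR call. The helper name star_command, its environ parameter, and the fallback to 1 thread when NSLOTS is unset are assumptions made for illustration; they are not part of the original script.

```python
import os


def star_command(read_1, genome_idx, out_prefix, environ=os.environ):
    """Build the STAR argument list, taking the thread count from NSLOTS.

    star_command and its environ parameter are hypothetical names used
    only for this sketch.
    """
    # subprocess.run() with a list does not invoke a shell, so the literal
    # string '$NSLOTS' would reach STAR unexpanded. Read the value in Python.
    # Assumption: fall back to a single thread when NSLOTS is not set,
    # for example when running outside the scheduler.
    threads = environ.get('NSLOTS', '1')

    # Same pairing rule as the original loop: swap '_1_' for '_2_'.
    read_2 = read_1.replace('_1_', '_2_')

    return ['STAR', '--runThreadN', threads,
            '--genomeDir', genome_idx,
            '--readFilesIn', read_1, read_2,
            '--readFilesCommand', 'gunzip', '-c',
            '--outFileNamePrefix', out_prefix]


# Inside the original loop, the call might then look like this:
#   import subprocess
#   subprocess.run(star_command(os.path.join(directory, read_1),
#                               genome_idx, genome_dir), check=True)
```

Note that os.listdir returns bare file names, so the sketch assumes you join each name with the directory before passing it in, as the commented usage does. Environment values are already strings, which is what the argument list needs.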
