
Python multiprocessing running faster locally than on cluster (slurm)

I have the following code

import multiprocessing as mp
import os

def funct(name):
    # Placeholder worker: validate the name, do the work, save a file.
    if name_is_valid(name):
        do_some_stuff_and_save_a_file(name)
        return 1
    else:
        return 0

if __name__ == '__main__':
    num_proc = 20  # or a call to slurm/mp for the number of processors
    pool = mp.Pool(processes=num_proc)
    results = pool.map_async(funct, list(nameindex))
    pool.close()
    pool.join()

I have run this on my desktop with a 6-core processor, with num_proc=mp.cpu_count(), and it works fine and fast. But when I run the script from an sbatch script on our processing cluster with -N 1 -n 20 (our nodes each have 24 processors), or with any other number of processors, it runs incredibly slowly and only appears to utilize 10 to 15 processors. Is there some way to optimize multiprocessing for working with slurm?
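For the "call to slurm" mentioned in the num_proc comment, one option is to read the CPU allocation from slurm's environment variables instead of hard-coding 20. This is only a sketch, assuming the job exports the usual SLURM_CPUS_PER_TASK or SLURM_NTASKS variables; allocated_cpus is a hypothetical helper name:

import multiprocessing as mp
import os

def allocated_cpus():
    # Prefer the slurm allocation when running under sbatch/srun.
    # SLURM_CPUS_PER_TASK is only set when --cpus-per-task is given,
    # so also check SLURM_NTASKS (set for -n) before falling back
    # to the local machine's CPU count.
    for var in ('SLURM_CPUS_PER_TASK', 'SLURM_NTASKS'):
        value = os.environ.get(var)
        if value:
            return int(value)
    return mp.cpu_count()

num_proc = allocated_cpus()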

funct checked the disk for a specific file, then loaded a file, then did the work, then saved a file. This left my individual processes waiting on input/output operations instead of working. So I loaded all of the initial data before passing it to the pool, and added a dedicated multiprocessing Process that saves files from a Queue the pooled processes put their output into, so only one process ever tries to write to disk.
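A minimal sketch of that arrangement, reusing num_proc and nameindex from the question. load_data, compute, and save_to_disk are hypothetical placeholders for my actual loading, processing, and saving code, and a Manager queue is one assumed way to share a queue with Pool workers:

import multiprocessing as mp

def worker(args):
    name, data, queue = args
    result = compute(data)          # compute() stands in for the real work
    queue.put((name, result))       # hand the result to the single writer
    return 1

def writer(queue):
    # The only process that writes output files to disk.
    while True:
        item = queue.get()
        if item is None:            # sentinel: all workers are done
            break
        name, result = item
        save_to_disk(name, result)  # save_to_disk() is a placeholder

if __name__ == '__main__':
    # Read all input data up front so the pooled workers never touch the disk.
    inputs = [(n, load_data(n)) for n in nameindex]   # load_data() is a placeholder

    manager = mp.Manager()
    queue = manager.Queue()         # a Manager queue can be passed to Pool workers
    saver = mp.Process(target=writer, args=(queue,))
    saver.start()

    with mp.Pool(processes=num_proc) as pool:
        pool.map(worker, [(name, data, queue) for name, data in inputs])

    queue.put(None)                 # tell the writer to finish
    saver.join()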


 