
Python: How to multiprocess my function with multiple arguments?

My function sound_proc.windowing() cuts the sound files in a directory into pieces and saves the chunks to another directory. To cut all files in the directory, I iterate over them with a for loop:

from os import listdir

# emodb_path_src = source folder with all the sound files in it
# 512 = fixed integer window size
# emodb_path_train = destination folder where all the cut files go

files = listdir(emodb_path_src)

for index, file in enumerate(files):
    print(f'File: {index+1}/{len(files)}')
    sound_proc.windowing(f'{emodb_path_src}{file}', 512, emodb_path_train)

Unfortunately, this process is very slow since only one processor core is used. I have already tried multiprocessing and Pool, but I can't get it to work. It would be nice if someone could give me some hints on getting it to run on multiple cores.

Thank you in advance and have a nice day!

Yes, you can use the multiprocessing Pool for this in combination with starmap. The trick is to create an iterable of tuples, one holding the arguments for each function call, like so:

import multiprocessing as mp
from os import listdir

# Precreate the args: one tuple per call to sound_proc.windowing
args = [(f'{emodb_path_src}{file}', 512, emodb_path_train) for file in listdir(emodb_path_src)]

with mp.Pool(mp.cpu_count()) as pool:
    print(pool.starmap(sound_proc.windowing, args))
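One caveat worth adding: on platforms that spawn worker processes instead of forking (Windows, and macOS by default since Python 3.8), the Pool must be created behind an if __name__ == '__main__': guard, or the workers will re-import the script and fail. A minimal sketch of the same call with the guard, assuming emodb_path_src, emodb_path_train and sound_proc are defined as in the question:

import multiprocessing as mp
from os import listdir

def main():
    # same argument tuples as above
    args = [(f'{emodb_path_src}{file}', 512, emodb_path_train)
            for file in listdir(emodb_path_src)]
    with mp.Pool(mp.cpu_count()) as pool:
        pool.starmap(sound_proc.windowing, args)

if __name__ == '__main__':
    main()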

Depending on how I/O-bound your problem is, you might also want to try ThreadPool. It's simpler to use and on some occasions even faster than Pool because it has less setup overhead. Another advantage is that the threads run in the same process and share the same variable space; note that they also share the GIL, which is why this mainly pays off for I/O-bound work. Here's a snippet; notice the only difference is a new import and the with statement:

import multiprocessing as mp
from multiprocessing.pool import ThreadPool  # thread-based Pool
from os import listdir

# Precreate the args
args = [(f'{emodb_path_src}{file}', 512, emodb_path_train) for file in listdir(emodb_path_src)]

# If it's very I/O bound it might even be worthwhile creating more threads than CPU count!
with ThreadPool(mp.cpu_count()) as pool:
    print(pool.starmap(sound_proc.windowing, args)) 
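If you also want the per-file progress output from the original loop, pool.imap_unordered yields results as they finish, so completed files can be counted as they come in. A sketch under the same assumptions; window_one is a hypothetical module-level wrapper (a process Pool requires the mapped function to be picklable, hence defined at top level):

import multiprocessing as mp
from os import listdir

def window_one(file):
    # hypothetical wrapper: fixes the extra arguments so the pool can pass
    # a single filename per task; emodb_path_src, emodb_path_train and
    # sound_proc are assumed to exist at module level as in the question
    return sound_proc.windowing(f'{emodb_path_src}{file}', 512, emodb_path_train)

if __name__ == '__main__':
    files = listdir(emodb_path_src)
    with mp.Pool(mp.cpu_count()) as pool:
        for done, _ in enumerate(pool.imap_unordered(window_one, files), start=1):
            print(f'File: {done}/{len(files)}')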

In the meantime I have found a good solution to the problem. The code from Gerard also works like a charm. I would like to briefly show how I solved it and what time savings you get with the different methods.

Plain for-loop from the original post:

Computation duration: ~ 457.7 seconds

from os import listdir

files = listdir(emodb_path_src)

for index, file in enumerate(files):
    print(f'File: {index+1}/{len(files)}')
    sound_proc.windowing(f'{emodb_path_src}{file}', 512, emodb_path_train)

Solution provided by Gerard:

Computation duration: ~ 75.2 seconds

import multiprocessing as mp
from os import listdir

args = [(f'{emodb_path_src}{file}', 512, emodb_path_train) for file in listdir(emodb_path_src)]
    
with mp.Pool(mp.cpu_count()) as pool:
    pool.starmap(sound_proc.windowing, args)

Using joblib.Parallel:

Computation duration: ~ 68.8 seconds

from joblib import Parallel, delayed
from os import listdir
import multiprocessing as mp

files = listdir(emodb_path_src)

Parallel(n_jobs=mp.cpu_count(), backend='multiprocessing')(
    delayed(sound_proc.windowing)(
        sound_file=f'{emodb_path_src}{file}',
        window_size=512,
        dest=emodb_path_train
    )
    for file in files
)
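As an aside, joblib can also report progress by itself: passing a verbose level to Parallel makes it log completed batches to stdout, which replaces the manual counter from the original loop. A minimal variant under the same assumptions:

# imports and `files` as in the previous snippet
Parallel(n_jobs=mp.cpu_count(), backend='multiprocessing', verbose=10)(
    delayed(sound_proc.windowing)(
        sound_file=f'{emodb_path_src}{file}',
        window_size=512,
        dest=emodb_path_train
    )
    for file in files
)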
