
Python multiprocessing pool number of jobs not correct

I wrote a Python program that uses Pool to launch 16 parallel processes to process some files. At the beginning of the run, all 16 processes stay busy until almost all of the files have been processed. Then, for reasons I don't understand, when only a few files are left, just one process runs at a time, which makes processing take much longer than necessary. Could you help with this?

Force map() to use a chunksize of 1 instead of letting it guess the best value by itself, e.g.:

from multiprocessing import Pool

pool = Pool(16)
pool.map(func, iterable, chunksize=1)

This should (in theory) guarantee the best distribution of load among workers until the end of the input data.
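For reference, here is a minimal self-contained sketch of the same idea; process_file, the file names, and the counts are illustrative placeholders, not from the original post:

from multiprocessing import Pool

def process_file(path):
    # placeholder for the real per-file work
    return path.upper()

if __name__ == "__main__":
    files = [f"file_{i}.txt" for i in range(100)]
    with Pool(16) as pool:
        # chunksize=1 hands out one item at a time, so a worker
        # that finishes early can always pick up the next file
        results = pool.map(process_file, files, chunksize=1)
    print(len(results))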


Before Pool starts executing the function you pass to apply_async/map_async (or map), Python assigns each worker a piece of the work up front.

For example, let's say you have 8 files to process and you start a Pool with 4 workers.

Before the file processing starts, two specific files are assigned to each worker. This means that if a worker finishes its share earlier than the others, it simply "takes a break" and does not start helping the others; the sketch below illustrates the effect.
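A sketch of that effect under made-up assumptions: sleep times stand in for files of different sizes, and the two 5-second items model the few slow files left at the end. With chunksize=2 one worker is pre-assigned both slow items; with chunksize=1 any idle worker can grab the next one:

import time
from multiprocessing import Pool

def work(seconds):
    # stand-in for processing one file; duration varies per "file"
    time.sleep(seconds)
    return seconds

if __name__ == "__main__":
    # 8 "files": the last two are much slower than the rest
    durations = [1, 1, 1, 1, 1, 1, 5, 5]

    with Pool(4) as pool:
        start = time.time()
        # chunksize=2 pre-assigns consecutive pairs, so one worker
        # gets both slow items and finishes long after the rest idle
        pool.map(work, durations, chunksize=2)
        print("chunksize=2:", round(time.time() - start, 1), "s")

    with Pool(4) as pool:
        start = time.time()
        # chunksize=1 lets each idle worker pull the next item,
        # so the two slow items run in parallel
        pool.map(work, durations, chunksize=1)
        print("chunksize=1:", round(time.time() - start, 1), "s")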
