简体   繁体   中英

Python Multiprocessing does not wait

I am currently using multiprocessing functions to analyze roughly 10 files.

However, I only want to run 5 processes at each time.

When I try to implement this, it doesn't work. More processes are created then the number I specified. Is there a way that easily limits the number of processes to 5? (Windows 7 / Python 2.7)

EDIT: I'm afraid your solutions still don't work. I will try to post some more details here;

Main python file;

import python1
import python2 
import multiprocessing

# parallel = [fname1, fname2, fname3, fname4, fname5, fname6, fname7, fname8, fname9, fname10]
if name == '__main__':
   pool = multiprocessing.Pool(processes=max(len(parallel), 5))
   print pool.map(python1.worker, parallel)   

Python1 file;

import os
import time
import subprocess

def worker(sample):
    command = 'perl '+sample[1].split('data_')[0]+'methods_FastQC\\fastqc '+sample[1]+'\\'+sample[0]+'\\'+sample[0]+' --outdir='+sample[1]+'\\_IlluminaResults\\_fastqcAnalysis'
    subprocess.call(command)
    return sample

The return statement of 12 files come back befóre all the opened perl modules have closed. Also 12 perl shells are opened instead of only the max of 5. (Image; You can clearly see that the return statements come back before the perl commands even finish, and there are more than 5 processes http://oi57.tinypic.com/126a8ht.jpg )

I don't know why it is a secret what exactly doesn't happen and what happens instead.

And providing a SSCCE means a program that actually runs. (Have a look at the worker() function, for example. It gets a file parameter which is never used, and uses a command variable which is nowhere defined.)

But I think it is the point that your fileX are just file names and they are tried to be executed.

Change your function to

def worker(filename):
    command = "echo X " + filename + " Y"
    os.system(command)

and it should work fine. (Note that I changed file to filename in order not to hide a built-in name.)

BTW, instead of os.system() you should use the subprocess module.

In this case, you can do

import subprocess

def worker(filename):
    command = ["echo", "X", filename, "Y"]
    subprocess.call(command)

which should do the same.

Just as a stylistic remark:

pool = multiprocessing.Pool(processes=max(len(parallel), 5))

is simpler and does the same.


Your edit makes the problem much clearer now.

It seems that due to unknown reasons your perl programs exit earlier than they are really finished. I don't know why that happens - maybe they fork another process by themselves and exit immediately. Or it is due to windows and its weirdnesses.

As soon as the multiprocessing pool notices that a subprocess claims to be finished, it is ready to start another one.

So the right way would be to find out why the perl programs don't work as expected.

I tried with the following code under Linux with python-2.7 and it doesn't assert. Only 5 processes are created at a time.

import os
import multiprocessing
import psutil
from functools import partial

def worker(pid, filename):
#    assert len(psutil.Process(pid).children(recursive=True)) == 5  # for psutil-2.x
    assert len(psutil.Process(pid).get_children(recursive=True)) == 5
    print(filename)

parallel = range(0, 15)

if __name__ == '__main__':
#    with multiprocessing.Pool(processes=5) as pool:  # if you use python-3
    pool = multiprocessing.Pool(processes=min(len(parallel), 5))
    pool.map(partial(worker, os.getpid()), parallel)

Of course, if you use os.system() inside the worker function, it will create extra processes and the process tree will look like (using os.system('sleep 1') here)

\_ python2.7 ./test02.py
    \_ python2.7 ./test02.py
    |   \_ sh -c sleep 1
    |       \_ sleep 1
    \_ python2.7 ./test02.py
    |   \_ sh -c sleep 1
    |       \_ sleep 1
    \_ python2.7 ./test02.py
    |   \_ sh -c sleep 1
    |       \_ sleep 1
    \_ python2.7 ./test02.py
    |   \_ sh -c sleep 1
    |       \_ sleep 1
    \_ python2.7 ./test02.py
        \_ sh -c sleep 1
            \_ sleep 1

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM