
Python Multiprocessing does not wait

I am currently using multiprocessing functions to analyze roughly 10 files.

However, I only want to run 5 processes at a time.

When I try to implement this, it doesn't work. More processes are created than the number I specified. Is there a way to easily limit the number of processes to 5? (Windows 7 / Python 2.7)

EDIT: I'm afraid your solutions still don't work. I will try to post some more details here:

Main Python file:

import python1
import python2 
import multiprocessing

# parallel = [fname1, fname2, fname3, fname4, fname5, fname6, fname7, fname8, fname9, fname10]
if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=min(len(parallel), 5))
    print pool.map(python1.worker, parallel)

Python1 file:

import os
import time
import subprocess

def worker(sample):
    command = 'perl '+sample[1].split('data_')[0]+'methods_FastQC\\fastqc '+sample[1]+'\\'+sample[0]+'\\'+sample[0]+' --outdir='+sample[1]+'\\_IlluminaResults\\_fastqcAnalysis'
    subprocess.call(command)
    return sample

The return statements of all 12 files come back before all the opened perl processes have closed. Also, 12 perl shells are opened instead of only the maximum of 5. (Image: you can clearly see that the return statements come back before the perl commands even finish, and that there are more than 5 processes: http://oi57.tinypic.com/126a8ht.jpg )

I don't know why it is a secret what exactly doesn't happen, and what happens instead.

And providing an SSCCE means a program that actually runs. (Have a look at the worker() function, for example. It gets a file parameter which is never used, and uses a command variable which is nowhere defined.)

But I think the point is that your fileX entries are just file names, and the code tries to execute them.

Change your function to

import os

def worker(filename):
    command = "echo X " + filename + " Y"
    os.system(command)

and it should work fine. (Note that I changed file to filename in order not to hide a built-in name.)

BTW, instead of os.system() you should use the subprocess module.

In this case, you can do

import subprocess

def worker(filename):
    command = ["echo", "X", filename, "Y"]
    subprocess.call(command)

which should do the same.
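The reason subprocess.call() fits here is that it blocks until the child process really exits, which is exactly what keeps a pool worker busy for the child's whole lifetime. A quick way to see this, using a short Python child as a stand-in for the perl command:

```python
import subprocess
import sys
import time

start = time.time()
# subprocess.call() blocks until the child process exits, so the
# elapsed time includes the child's full one-second run.
rc = subprocess.call([sys.executable, "-c", "import time; time.sleep(1)"])
elapsed = time.time() - start
print(rc, round(elapsed, 1))
```

If the command being called forks and exits immediately, call() returns right away instead, which is the behavior described further below.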

Just as a stylistic remark:

pool = multiprocessing.Pool(processes=min(len(parallel), 5))

is simpler and does the same.


Your edit makes the problem much clearer now.

It seems that, for unknown reasons, your perl programs exit earlier than when they are really finished. I don't know why that happens - maybe they fork another process themselves and exit immediately. Or it is due to Windows and its weirdness.

As soon as the multiprocessing pool notices that a subprocess claims to be finished, it is ready to start another one.

So the right way would be to find out why the perl programs don't work as expected.

I tried the following code under Linux with python-2.7, and the assertion never fails. Only 5 processes are created at a time.

import os
import multiprocessing
import psutil
from functools import partial

def worker(pid, filename):
#    assert len(psutil.Process(pid).children(recursive=True)) == 5  # for psutil-2.x
    assert len(psutil.Process(pid).get_children(recursive=True)) == 5
    print(filename)

parallel = range(0, 15)

if __name__ == '__main__':
#    with multiprocessing.Pool(processes=5) as pool:  # if you use python-3
    pool = multiprocessing.Pool(processes=min(len(parallel), 5))
    pool.map(partial(worker, os.getpid()), parallel)

Of course, if you use os.system() inside the worker function, it will create extra processes, and the process tree will look like this (using os.system('sleep 1') here):

\_ python2.7 ./test02.py
    \_ python2.7 ./test02.py
    |   \_ sh -c sleep 1
    |       \_ sleep 1
    \_ python2.7 ./test02.py
    |   \_ sh -c sleep 1
    |       \_ sleep 1
    \_ python2.7 ./test02.py
    |   \_ sh -c sleep 1
    |       \_ sleep 1
    \_ python2.7 ./test02.py
    |   \_ sh -c sleep 1
    |       \_ sleep 1
    \_ python2.7 ./test02.py
        \_ sh -c sleep 1
            \_ sleep 1
