
How to run parallel programs in python

I have a python script that runs a few external commands using the subprocess module. But one of these steps takes a huge amount of time, so I would like to run it separately. I need to launch the parallel jobs, check that they are finished, and then execute the next command, which is not parallel. My code is something like this:

nproc = 24 
for i in xrange(nproc):
    #Run program in parallel

#Combine files generated by the parallel step
handle = open('Niben_%s_structures' % (zfile_name), 'w')
for i in xrange(nproc):
    for zline in open('Niben_%s_file%d_structures' % (zfile_name, i)):
        handle.write(zline)
handle.close()

#Run next step
cmd = 'bowtie-build -f Niben_%s_precursors.fa bowtie-index/Niben_%s_precursors' % (zfile_name,zfile_name)

For your example, you just want to shell out in parallel - you don't need threads for that.

Use the Popen constructor in the subprocess module: http://docs.python.org/library/subprocess.htm

Collect the Popen instances for each process you spawned and then wait() for them to finish:

import subprocess

procs = []
for i in xrange(nproc):
    procs.append(subprocess.Popen(ARGS_GO_HERE)) #Run program in parallel
for p in procs:
    p.wait()
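If you also need to detect failures, wait() returns each process's exit status (also available afterwards as p.returncode), so you could check that every job returned 0 before moving on to the combine step.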

You can get away with this (as opposed to using the multiprocessing or threading modules), since you aren't really interested in having these interoperate - you just want the os to run them in parallel and be sure they are all finished when you go to combine the results...
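To tie this back to the code in the question, a rough sketch could look like the following; the command run in each parallel step ('my_program') and the value of zfile_name are made-up placeholders, since the question doesn't show them:

import subprocess

nproc = 24
zfile_name = 'example'  # placeholder; the question does not show how this is set

#Run program in parallel: launch all jobs without waiting
procs = []
for i in xrange(nproc):
    args = ['my_program', 'Niben_%s_file%d' % (zfile_name, i)]  # hypothetical command
    procs.append(subprocess.Popen(args))

#Block until every job has finished
for p in procs:
    p.wait()

#Combine files generated by the parallel step
handle = open('Niben_%s_structures' % (zfile_name), 'w')
for i in xrange(nproc):
    for zline in open('Niben_%s_file%d_structures' % (zfile_name, i)):
        handle.write(zline)
handle.close()

#Run next step
cmd = 'bowtie-build -f Niben_%s_precursors.fa bowtie-index/Niben_%s_precursors' % (zfile_name, zfile_name)
subprocess.call(cmd.split())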

Running things in parallel can also be implemented using multiple processes in Python. I wrote a blog post on this topic a while ago; you can find it here:

http://multicodecjukebox.blogspot.de/2010/11/parallelizing-multiprocessing-commands.html

Basically, the idea is to use "worker processes" which independently retrieve jobs from a queue and then complete these jobs.

Works quite well in my experience.
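As a rough illustration of that idea (this is not the code from the blog post), multiprocessing.Pool hands jobs from an internal task queue to a fixed number of worker processes; the commands below are made up:

import subprocess
from multiprocessing import Pool

def run_command(args):
    # Each worker process picks one command off the queue and runs it to completion
    return subprocess.call(args)

if __name__ == '__main__':
    # Hypothetical list of jobs, one external command per parallel step
    commands = [['my_program', 'input_%d' % i] for i in xrange(24)]
    pool = Pool(processes=4)  # four worker processes
    exit_codes = pool.map(run_command, commands)  # blocks until every job is done
    pool.close()
    pool.join()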

You can do it using threads. This is a very short (and untested) example with a very ugly if-else deciding what each thread actually does, but you can write your own worker classes.

import threading

class Worker(threading.Thread):
    def __init__(self, i):
        super(Worker, self).__init__()
        self._i = i

    def run(self):
        # Dispatch on the worker index; replace this if-else with real work
        if self._i == 1:
            self.result = do_this()
        elif self._i == 2:
            self.result = do_that()

threads = []
nproc = 24
for i in xrange(nproc):
    #Run program in parallel: start every worker first
    w = Worker(i)
    threads.append(w)
    w.start()

#Join only after all workers have been started, otherwise the work is serialized
for w in threads:
    w.join()

# ...now all threads are done

#Combine files generated by the parallel step
handle = open('Niben_%s_structures' % (zfile_name), 'w')
for i in xrange(nproc):
    ...etc...
