
How to run parallel programs in Python

I have a Python script that runs a few external commands using the subprocess module. One of these steps takes a very long time, so I would like to run it as separate jobs in parallel. I need to launch them, check that they have all finished, and then execute the next command, which is not parallel. My code is something like this:

nproc = 24 
for i in xrange(nproc):
    #Run program in parallel

#Combine files generated by the parallel step
handle = open('Niben_%s_structures' % (zfile_name), 'w')
for i in xrange(nproc):
    for zline in open('Niben_%s_file%d_structures' % (zfile_name,i)):
        handle.write(zline)
handle.close()

#Run next step
cmd = 'bowtie-build -f Niben_%s_precursors.fa bowtie-index/Niben_%s_precursors' % (zfile_name,zfile_name)

For your example, you just want to shell out in parallel - you don't need threads for that.

Use the Popen constructor in the subprocess module: http://docs.python.org/library/subprocess.html

Collect the Popen instances for each process you spawned and then wait() for them to finish:

import subprocess

procs = []
for i in xrange(nproc):
    procs.append(subprocess.Popen(ARGS_GO_HERE)) #Run program in parallel
for p in procs:
    p.wait()

You can get away with this (as opposed to using the multiprocessing or threading modules) because you don't need these processes to interoperate: you just want the OS to run them in parallel and to be sure they have all finished before you combine the results.
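To make the Popen approach concrete, here is a minimal sketch written for Python 3 (so range instead of xrange). The helper name run_in_parallel and the stand-in commands are assumptions for illustration only; substitute the real command lines for your program. It also checks the exit codes, which the short snippet above leaves out.

```python
import subprocess
import sys


def run_in_parallel(commands):
    """Start every command at once, then block until all have exited.

    Returns the exit codes in the same order as `commands`.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]


# Hypothetical stand-in jobs: each one starts a Python interpreter
# that exits immediately with code 0.
nproc = 4
commands = [[sys.executable, "-c", "import sys; sys.exit(0)"]
            for _ in range(nproc)]

exit_codes = run_in_parallel(commands)

# Refuse to continue if any job reported a failure.
failed = [i for i, code in enumerate(exit_codes) if code != 0]
if failed:
    raise RuntimeError("jobs failed: %s" % failed)
# At this point every job has finished, so the output files can be
# combined and the next (non-parallel) step can run.
```

Passing each command as a list of arguments avoids going through the shell, so no quoting is needed for file names.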

Running things in parallel can also be implemented using multiple processes in Python. I wrote a blog post on this topic a while ago; you can find it here:

http://multicodecjukebox.blogspot.de/2010/11/parallelizing-multiprocessing-commands.html

Basically, the idea is to use "worker processes" which independently retrieve jobs from a queue and then complete these jobs.

Works quite well in my experience.
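As a rough illustration of the worker-process idea (not the code from the blog post), the standard multiprocessing.Pool keeps an internal job queue and a fixed number of worker processes. This sketch is written for Python 3; run_command, run_jobs, and the stand-in commands are hypothetical names chosen for the example.

```python
import multiprocessing
import subprocess
import sys


def run_command(cmd):
    """Job body: run one external command and return its exit code."""
    return subprocess.call(cmd)


def run_jobs(commands, workers):
    """Run the commands on a pool of `workers` worker processes."""
    pool = multiprocessing.Pool(processes=workers)
    try:
        # chunksize=1 hands out one job at a time, so a free worker
        # always picks up the next pending job. map() blocks until
        # every job is done and keeps the results in input order.
        return pool.map(run_command, commands, chunksize=1)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    # Hypothetical stand-in jobs; replace with the real command lines.
    commands = [[sys.executable, "-c", "pass"] for _ in range(8)]
    print(run_jobs(commands, workers=4))
```

Compared with launching everything at once, a pool caps how many jobs run simultaneously, which helps when there are more jobs than cores. The `if __name__ == "__main__":` guard is needed on platforms where multiprocessing starts workers by re-importing the script.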

You can do it using threads. This is a very short (and untested) example with a very ugly if-else deciding what each thread actually does, but you can write your own worker classes.

import threading

class Worker(threading.Thread):
    def __init__(self, i):
        super(Worker, self).__init__()
        self._i = i

    def run(self):
        if self._i == 1:
            self.result = do_this()
        elif self._i == 2:
            self.result = do_that()

threads = []
nproc = 24
for i in xrange(nproc):
    #Run program in parallel
    w = Worker(i)
    threads.append(w)
    w.start()

#Wait for the threads only after all of them have been started
for w in threads:
    w.join()

# ...now all threads are done

#Combine files generated by the parallel step
handle = open('Niben_%s_structures' % (zfile_name), 'w')
...etc...
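For the case in the question, where each job is an external command, one possible worker class is sketched below (Python 3). CommandWorker, run_all, and the stand-in commands are hypothetical names for illustration. Note that all threads are started before any is joined; calling join() inside the start loop would run the jobs one after another.

```python
import subprocess
import sys
import threading


class CommandWorker(threading.Thread):
    """Thread that runs one external command and records its exit code."""

    def __init__(self, cmd):
        super().__init__()
        self.cmd = cmd
        self.returncode = None

    def run(self):
        # The thread just waits on the child process, so the GIL is
        # not a bottleneck here.
        self.returncode = subprocess.call(self.cmd)


def run_all(commands):
    workers = [CommandWorker(cmd) for cmd in commands]
    for w in workers:
        w.start()   # start every thread first
    for w in workers:
        w.join()    # then wait for all of them
    return [w.returncode for w in workers]


# Hypothetical stand-in jobs; replace with the real command lines.
codes = run_all([[sys.executable, "-c", "pass"] for _ in range(4)])
```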
