
How to parallelize a “**for-loop**” that runs many executables in order in Python?

I have a Python script that calls many executables written and compiled in C. There is no issue with the executables themselves. However, when I have to run these executables in a for loop, I would like to parallelize the loop.

Note: prog1, prog2, and prog3 must run in order.
This is a simplified example; in my real code
prog2 depends on the output of prog1, prog3
depends on the output of prog2, and so on.
I have seven executables inside a for loop of 20 iterations,
and it takes more than 2 hours to complete the process.
If I could parallelize the code, it would save a lot of time.
Help would be greatly appreciated!

In my code, example 1 runs fine but example 2 does not. The full code is presented below:

#!/usr/bin/python

from multiprocessing import Pool
import os, sys, subprocess, math, re, shutil,copy

#function to run a program and write output to the shell
################################################################################
def run_process(name, args):
    print "--------------------------------------------------------------------"
    print "Running: %s" % name
    print "Command:"
    for arg in args:
        print arg,
    print ""
    print "--------------------------------------------------------------------"
    process = subprocess.Popen(args)

    process.communicate()
    if process.returncode != 0:
        print "Error: %s did not terminate correctly. Return code: %i." % (name, process.returncode)
        sys.exit(1)  # this will exit the script in case of error
###########################       
# example 1
#run_process("prog1.c", ['./prog1'])
#run_process("prog2.c", ['./prog2'])        
#run_process("prog3.c", ['./prog3', 'first argument'])


# example 2 (parallizing)
commands = []
for x in range(0,20):
    commands.extend(("prog1.c",['./prog1']))
    commands.extend(("prog2.c",['./prog2']))
    commands.extend(("prog3.c",['./prog3', 'first argument']))


p = Pool()
p.map(run_process, commands)

Here, if I run example 1 it runs flawlessly. But when I try to run example 2, it gives the following error:

    TypeError: run_process() takes exactly 2 arguments (1 given)

Further note:
To create the executables prog1, prog2, and prog3 I wrote C code, which looks like this:

// to compile: gcc -o prog1 prog1.c
// to run: ./prog1
#include <stdio.h>

int main() {
    printf("This is program 1\n");
    return 0;
}

prog2 looks exactly the same, and prog3 looks like this:

// to compile: gcc -o prog3 prog3.c
// to run: ./prog3 'argument1'
#include <stdio.h>

int main(int argc, char **argv) {
    printf("This is program 3\n");
    printf("The argument is = %s\n", argv[1]);
    return 0;
}

Now, there are 21 iterations inside the for loop.
In the first iteration it runs the executables prog1, prog2, ..., prog7
and finally produces output1.fits.
In the second iteration it again runs the seven executables in order and produces output2.fits.
Finally it creates 21 fits files. What I can do is make four functions:
func1 for loop 0 to 5
func2 for loop 5 to 10
func3 for loop 11 to 15
func4 for loop 16 to 21
Then I want to run these four functions as parallel processes.
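Roughly what I have in mind (untested sketch, reusing the run_process function above; the chunk boundaries are just a guess):

from multiprocessing import Process

def run_chunk(start, end):
    for x in range(start, end):
        run_process("prog1.c", ['./prog1'])
        run_process("prog2.c", ['./prog2'])
        # ... prog3 to prog7 here, in order, for this iteration

if __name__ == '__main__':
    chunks = [(0, 5), (5, 10), (10, 15), (15, 21)]  # func1 .. func4
    workers = [Process(target=run_chunk, args=chunk) for chunk in chunks]
    for w in workers:
        w.start()
    for w in workers:
        w.join()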
My question is: how can I run example 2 without any error?

Python has a Pool of processes built exactly for this purpose.

Given that you need to run the same sequence of commands X times, and supposing the sequences can run in parallel, the Nth run can happen together with the (N+1)th without any interference.

from multiprocessing import Pool
import subprocess

commands = (("prog1.c", ['./prog1']), ...)

def run_processes(execution_index):
    print("Running sequence for the %d time." % execution_index)

    for name, args in commands:
        process = subprocess.Popen(args)
        ...

p = Pool()
p.map(run_processes, range(20))

On Python 3 you can use concurrent.futures.ProcessPoolExecutor.
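A rough equivalent with concurrent.futures, reusing the run_processes function sketched above:

from concurrent.futures import ProcessPoolExecutor

with ProcessPoolExecutor() as executor:
    # schedules run_processes(0) .. run_processes(19) across worker processes
    list(executor.map(run_processes, range(20)))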

Whenever you want to run something concurrently you need to understand the execution boundaries first. If two lines of execution are interdependent, you either set up a communication between the two (using for example a pipe) or avoid running them concurrently.
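For illustration, a minimal sketch of the pipe idea (hypothetical step names, not the poster's actual programs):

from multiprocessing import Process, Pipe

def first_step(conn):
    conn.send("output of step 1")  # whatever the dependent step needs
    conn.close()

parent_conn, child_conn = Pipe()
p = Process(target=first_step, args=(child_conn,))
p.start()
print(parent_conn.recv())  # the dependent step reads it here
p.join()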

In your case, the commands within one iteration are interdependent, so it is problematic to run them concurrently. But if the whole sequences (the iterations) do not depend on each other, then you can run those in parallel.

import multiprocessing

for x in range(0, 20):
    p = multiprocessing.Process(target=run_process,
                                args=("colour.c", ['./cl', "color.txt", str(x)]))
    p.start()
    ...
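Note that this starts all 20 processes at once with nothing limiting concurrency or waiting for completion; if you need either, keep the Process objects and join() them, or fall back to the Pool approach from the answer above.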

not really sure what else I could add ...

Have a look at what the group functions of Celery's canvas do. They allow you to call functions at the same time, with different sets of arguments. Say you want to process a total of 1000 elements in your for loop; doing that sequentially is highly unoptimized. A simple solution is to call the same function with two sets of arguments. Even this simple hack will bring your processing time down by half. That is what Canvas and Celery are about.
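A hedged sketch of what that could look like here with Celery (the broker URL and the task name run_sequence are placeholders, not part of the original question):

from celery import Celery, group

app = Celery('tasks', broker='redis://localhost:6379/0')  # assumed local Redis broker

@app.task
def run_sequence(i):
    # run prog1 .. prog7 in order for iteration i here
    return i

# fan the 20 iterations out to however many Celery workers are running
job = group(run_sequence.s(i) for i in range(20))
result = job.apply_async()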
