
Append output of multiple subprocesses to array in Python

I'm working on a Python script which opens multiple subprocesses like this:

for file in os.listdir(FOLDER):
    subprocess.Popen([myprocess])

There could be 10-20 of these processes running in parallel, and each of them prints a single line of output to the console. What I want to do is append these outputs (no matter in which order) to an array and, when all the processes are done, continue with the script doing other stuff.

I have no idea how to append each output to the array. To check whether all the subprocesses are done, I was thinking of something like this:

outputs = []
k = len(os.listdir(FOLDER))
if len(outputs) == k:
    print "All processes are done!"

UPDATE! This code seems to work now:

pids = set()
outputs = []
for file in os.listdir(FOLDER):
    p = subprocess.Popen([args], stdout=subprocess.PIPE)
    pids.add(p.pid)
while pids:
    pid, retval = os.wait()
    output = p.stdout.read()
    outputs.append(output)
    print('{p} finished'.format(p=pid))
    pids.remove(pid)

print "Done!"
print outputs

The problem is that outputs looks like this:

>> Done!
>> ['OUTPUT1', '', '', '', '', '', '', '', '', '']

Only the first value is filled in; the others are left empty. Why?

What I want to do is append these outputs (no matter in which order) to an array and, when all the processes are done, continue with the script doing other stuff.

In your updated code only the first element is filled because p is rebound on every iteration of the for loop, so inside the while loop it always refers to the last process that was started: the first p.stdout.read() returns that process's output, and every later call returns an empty string. Keep a reference to each Popen object instead and read from each one:

#!/usr/bin/env python
import os
from subprocess import Popen, PIPE

# start processes (run in parallel)
processes = [Popen(['command', os.path.join(FOLDER, filename)], stdout=PIPE)
             for filename in os.listdir(FOLDER)]
# collect output
lines = [p.communicate()[0] for p in processes]

To limit the number of concurrent processes, you could use a thread pool:

#!/usr/bin/env python
import os
from multiprocessing.dummy import Pool, Lock
from subprocess import Popen, PIPE

def run(filename, lock=Lock()):
    with lock: # avoid various multithreading bugs related to subprocess
        p = Popen(['command', os.path.join(FOLDER, filename)], stdout=PIPE)
    return p.communicate()[0]

# no more than 20 concurrent calls
lines = Pool(20).map(run, os.listdir(FOLDER))

The latter code example can also read from several child processes concurrently, while the former essentially serializes the execution once the corresponding stdout OS pipe buffers are full.

You could wait until all of them have finished their job, and then aggregate their standard output. To see how it's done, see this answer, which covers the implementation in depth.

If you need to do it asynchronously, you should spawn a new thread for this job, and do the waiting in that thread.
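
For illustration, a minimal sketch of that approach might look like the following. The folder path and the echo command are placeholders I'm assuming here, not part of the original question; substitute your real folder and command.

#!/usr/bin/env python
import os
import threading
from subprocess import Popen, PIPE

FOLDER = '.'  # placeholder -- use your real folder here
outputs = []

def collect(processes):
    # runs in a background thread: wait for every child and store its stdout
    for p in processes:
        outputs.append(p.communicate()[0])

# 'echo' stands in for the real command from the question
processes = [Popen(['echo', name], stdout=PIPE)
             for name in os.listdir(FOLDER)]
worker = threading.Thread(target=collect, args=(processes,))
worker.start()

# ... the main script is free to do other work here ...

worker.join()   # block only at the point where the results are actually needed
print(outputs)

The main script keeps running while the worker thread blocks in communicate(); join() is only called once the collected outputs are actually used.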

If you need to get notified about the results in real time, you could spawn a thread for each of the process separately, wait for them in each of these threads, then after they're done update your list.
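
A rough sketch of that thread-per-process variant, again with placeholder folder and command names, could be:

#!/usr/bin/env python
import os
import threading
from subprocess import Popen, PIPE

FOLDER = '.'  # placeholder -- use your real folder here
outputs = []
lock = threading.Lock()

def watch(filename):
    # one thread per child: start it, wait for it, then record its output
    p = Popen(['echo', filename], stdout=PIPE)  # 'echo' stands in for the real command
    out = p.communicate()[0]
    with lock:  # the outputs list is shared between all watcher threads
        outputs.append(out)
        print('{0} finished'.format(p.pid))

threads = [threading.Thread(target=watch, args=(name,))
           for name in os.listdir(FOLDER)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(outputs)

Because each thread appends to the shared list as soon as its own child exits, you get the "finished" notifications in real time rather than in start order.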

To read the output from the process, you can use subprocess.PIPE, as shown in this answer.

Edit: here is a full sample that worked for me:

#!/usr/bin/python2
import random
import subprocess
import sys

outputs = []
processes = []
# start 4 children; each sleeps a random time, then prints a single line
for i in range(4):
    args = ['bash', '-c', 'sleep ' + str(random.randint(0, 3)) + '; whoami']
    p = subprocess.Popen(args, stdout=subprocess.PIPE)
    processes.append(p)
# collect the output of each child, in the order the processes were started
while processes:
    p = processes[0]
    p.wait()                   # block until this child exits
    output = p.stdout.read()   # read the line it printed
    outputs.append(output)
    print('{p} finished'.format(p=p.pid))
    sys.stdout.flush()
    processes.remove(p)
print outputs
