python subprocess module: looping over stdout of child process

Question

I have some commands that I run using the subprocess module, and I then want to loop over the lines of their output. The documentation says not to use data_stream.stdout.read, which I am not doing, but I may be doing something that calls it. I am looping over the output like this:

for line in data_stream.stdout:
    # do stuff here
    ...

Can this cause deadlocks the way calling data_stream.stdout.read() can, or are Popen objects set up for this kind of looping, so that it uses the communicate() code but handles all the calls to it for you?

Answer 1

You have to worry about deadlocks if you're communicating with your subprocess, i.e. if you're writing to stdin as well as reading from stdout. Because these pipes are buffered, doing this kind of two-way communication is very much a no-no:

from subprocess import Popen, PIPE

data_stream = Popen(mycmd, stdin=PIPE, stdout=PIPE, text=True)
data_stream.stdin.write("do something\n")
for line in data_stream.stdout:
    ...  # BAD!

However, if you haven't passed stdin=PIPE (or stderr=PIPE) when constructing data_stream, you should be fine:

data_stream = Popen(mycmd, stdout=PIPE)
for line in data_stream.stdout:
   ...  # Fine
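For reference, a complete version of this one-way case might look like the sketch below. It assumes Python 3.7+ (for text=True), and read_lines is a hypothetical helper name; the child is the current Python interpreter printing a few lines, standing in for your own command. Using Popen as a context manager closes the pipe and waits for the child on exit, so the process is reaped once the loop finishes.

```python
import subprocess
import sys


def read_lines(cmd):
    """Collect the child's stdout line by line; only stdout is piped."""
    lines = []
    # stdin and stderr are inherited, so there is no second pipe that can fill up
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            lines.append(line.rstrip("\n"))
    # leaving the with-block closed stdout and waited for the child
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return lines


if __name__ == "__main__":
    # stand-in child: the current interpreter printing three lines
    child = [sys.executable, "-c", "for i in range(3): print('line', i)"]
    print(read_lines(child))
```

Raising CalledProcessError on a non-zero exit is a design choice of this sketch, not something the loop itself requires.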

If you need two-way communication, use communicate().
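A minimal sketch of the communicate() route, assuming Python 3.7+. The helper name run_with_input is hypothetical, and the child is the current interpreter acting as a stand-in filter that upper-cases whatever it reads from stdin. communicate() writes all of input, closes stdin, and reads stdout and stderr until EOF without letting one pipe block the other; the cost is that the whole output is held in memory.

```python
import subprocess
import sys


def run_with_input(cmd, data):
    """Send data to the child's stdin; return (returncode, stdout, stderr)."""
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    # communicate() feeds stdin, closes it, drains both output pipes and waits
    out, err = proc.communicate(input=data)
    return proc.returncode, out, err


if __name__ == "__main__":
    upper = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    code, out, err = run_with_input(upper, "do something\n")
    print(code, repr(out))
```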

Answer 2

The two answers have caught the gist of the issue pretty well: don't mix writing something to the subprocess, reading something from it, writing again, etc.; the pipe's buffering means you're at risk of a deadlock. If you can, write everything you need to write to the subprocess FIRST, close that pipe, and only THEN read everything the subprocess has to say; communicate() is nice for the purpose, IF the amount of data is not too large to fit in memory (if it is, you can still achieve the same effect "manually").
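One way to get the same effect "manually" when the data is too large for communicate() is sketched below, under these assumptions: Python 3.7+, a child that reads stdin and writes stdout (here a hypothetical upper-casing filter run by the current interpreter), stderr left unpiped, and a hypothetical helper name stream_through. A separate thread streams the input and then closes stdin, while the main thread handles stdout lines as they arrive. The thread matters because a filter can start writing output before it has read all of its input; if the parent wrote everything from one thread before reading anything, both sides could end up blocked on full pipes.

```python
import subprocess
import sys
import threading


def stream_through(cmd, chunks, handle_line):
    """Feed chunks to the child's stdin while handling its stdout lines as they arrive."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True)

    def feed():
        try:
            for chunk in chunks:
                proc.stdin.write(chunk)
        finally:
            # closing stdin sends EOF, telling the child the input is complete
            proc.stdin.close()

    writer = threading.Thread(target=feed)
    writer.start()
    for line in proc.stdout:  # main thread drains stdout concurrently
        handle_line(line)
    writer.join()
    proc.stdout.close()
    return proc.wait()


if __name__ == "__main__":
    upper = [sys.executable, "-c",
             "import sys; sys.stdout.writelines(l.upper() for l in sys.stdin)"]
    seen = []
    rc = stream_through(upper, ("line %d\n" % i for i in range(10000)), seen.append)
    print(rc, len(seen))
```

The 10,000 input lines are deliberately larger than a typical OS pipe buffer, which is the situation where a single-threaded write-then-read could stall.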

If you need finer-grained interaction, look instead at pexpect or, if you're on Windows, wexpect.

Answer 3

SilentGhost's/chrispy's answers are OK if you have a small to moderate amount of output from your subprocess. Sometimes, though, there may be a lot of output, too much to comfortably buffer in memory. In such a case, the thing to do is start the process (constructing the Popen object starts it; there is no separate start() call) and spawn a couple of threads: one to read child.stdout and one to read child.stderr, where child is the subprocess. You then need to wait() for the subprocess to terminate.

This is essentially what communicate() does (on Windows it uses reader threads; on POSIX it waits on the pipes with select/poll instead); the advantage of using your own threads is that you can process the output from the subprocess as it is generated. For example, in my project python-gnupg I use this technique to read status output from the GnuPG executable as it is generated, rather than waiting for all of it by calling communicate(). You are welcome to inspect the source of this project; the relevant code is in the module gnupg.py.
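A rough sketch of the two-reader-thread pattern, assuming Python 3.7+. It is not code taken from python-gnupg; run_streaming, _pump and the callbacks on_stdout/on_stderr are hypothetical names standing in for whatever per-line processing you need, and the child is the current interpreter writing to both streams. Each pipe gets its own thread, so neither can fill up while the parent is busy with the other, and wait() is called once both readers have hit EOF.

```python
import subprocess
import sys
import threading


def _pump(stream, callback):
    """Read one pipe to EOF, passing each line to callback as soon as it arrives."""
    with stream:
        for line in stream:
            callback(line)


def run_streaming(cmd, on_stdout, on_stderr):
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, text=True)
    readers = [
        threading.Thread(target=_pump, args=(proc.stdout, on_stdout)),
        threading.Thread(target=_pump, args=(proc.stderr, on_stderr)),
    ]
    for t in readers:
        t.start()
    for t in readers:
        t.join()  # both pipes have been drained
    return proc.wait()


if __name__ == "__main__":
    script = ("import sys\n"
              "for i in range(20000):\n"
              "    print('out', i)\n"
              "    print('err', i, file=sys.stderr)\n")
    out_lines, err_lines = [], []
    rc = run_streaming([sys.executable, "-c", script],
                       out_lines.append, err_lines.append)
    print(rc, len(out_lines), len(err_lines))
```

Because the callbacks run on the reader threads, any shared state they touch (beyond the simple list appends here) may need its own locking.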

Answer 4

data_stream.stdout is a standard output handle; you shouldn't be looping over it. communicate() returns a tuple of (stdoutdata, stderrdata), and it is this stdoutdata that you should use to do your stuff.
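The approach this answer suggests, collecting the output with communicate() and then looping over it, might look like the sketch below. It assumes Python 3.7+ and that the output is small enough to hold in memory; output_lines is a hypothetical helper name and the child command is a stand-in run with the current interpreter.

```python
import subprocess
import sys


def output_lines(cmd):
    """Run cmd to completion and return its stdout split into lines."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    stdoutdata, _ = proc.communicate()  # stderrdata is None: stderr was not piped
    return stdoutdata.splitlines()


if __name__ == "__main__":
    cmd = [sys.executable, "-c", "print('alpha'); print('beta')"]
    for line in output_lines(cmd):
        print(line)  # do stuff with each line here
```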


4 answers

solution1 6 ACCEPTED 2009-08-14 13:57:35
solution2 6 2009-08-14 14:38:51
solution3 3 2009-08-14 14:33:29
solution4 0 2009-08-14 13:54:15
