
Memory usage when reading lines from a piped subprocess stdout in Python

I just want to understand what happens in the "background" in terms of memory usage when dealing with a subprocess.Popen() result and reading line by line. Here's a simple example.

Given the following script, test.py, which prints "Hello", waits 10 s, and then prints "World":

import sys
import time

print("Hello")
sys.stdout.flush()  # push "Hello" through the pipe now instead of at exit
time.sleep(10)
print("World")

Then the following script, test_sub.py, runs test.py as a subprocess, redirects its stdout to a pipe, and reads it line by line:

import subprocess

cmd = ["python3", "test.py"]

p = subprocess.Popen(cmd,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT,
                     universal_newlines=True)

# readline() returns '' only at EOF, which ends the loop
for line in iter(p.stdout.readline, ''):
    print("---" + line.rstrip())

My question is this: when I run test_sub.py, it prints "Hello", waits 10 s until "World" arrives, and then prints that too. What happens to "Hello" during those 10 s of waiting? Does it get stored in memory until test_sub.py finishes, or does it get tossed away in the first iteration?

This may not matter too much for this example, but it does when dealing with really big files.

what happens to "Hello" during those 10s of waiting?

In the parent, "Hello" stays reachable through the name line until .readline() returns the second time, i.e. "Hello" lives at least until the output of print("World") has been read in the parent.
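To watch the timing on a real pipe, here is a minimal sketch. The CHILD string and the read_with_timestamps() helper are my own hypothetical stand-ins for test.py and test_sub.py, with the pause shortened to 0.5 s so it finishes quickly; sys.executable is used in place of "python3" only to make the sketch portable.

```python
import subprocess
import sys
import time

# Hypothetical stand-in for test.py, with a 0.5 s pause instead of 10 s.
CHILD = (
    "import sys, time\n"
    "print('Hello')\n"
    "sys.stdout.flush()\n"
    "time.sleep(0.5)\n"
    "print('World')\n"
)

def read_with_timestamps():
    """Return [(seconds_since_start, text), ...] for the child's output."""
    start = time.monotonic()
    seen = []
    with subprocess.Popen([sys.executable, "-c", CHILD],
                          stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT,
                          universal_newlines=True) as p:
        for line in iter(p.stdout.readline, ''):
            # `line` keeps referring to this text until readline()
            # returns again and the name is rebound.
            seen.append((time.monotonic() - start, line.rstrip()))
    return seen

if __name__ == "__main__":
    for elapsed, text in read_with_timestamps():
        print(f"{elapsed:.2f}s ---{text}")
```

The first timestamp should come roughly half a second before the second one, which is the window during which line is still bound to "Hello".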

If you mean what happens in the child process: after sys.stdout.flush() there is no reason for the "Hello" object to stay alive, but it may; see, e.g., "Does Python intern strings?"

Does it get stored in memory until test_sub.py finishes, or does it get tossed away in the first iteration?

After .readline() returns the second time, line refers to "World". What happens to "Hello" after that depends on garbage collection in the specific Python implementation, i.e. even once line is "World", the object "Hello" may continue to live for some time. See "Releasing memory in Python".
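As a rough illustration of the implementation-dependent part, the sketch below uses weak references to check whether an earlier line is still alive once line has been rebound. The Line class, fake_readline_source() and observe() are hypothetical names of mine: plain str objects cannot be weakly referenced, so a trivial subclass stands in for the text read from the pipe. The result I describe assumes CPython's reference counting; other implementations (PyPy, for instance) may free the object later.

```python
import weakref

class Line(str):
    """Plain str cannot be weakly referenced; a trivial subclass can."""

def fake_readline_source():
    # Hypothetical stand-in for p.stdout.readline: yields two "lines".
    yield Line("Hello\n")
    yield Line("World\n")

def observe():
    refs = []    # weak references to every line seen so far
    report = []  # per iteration: are the earlier lines still alive?
    for line in fake_readline_source():
        report.append([r() is not None for r in refs])
        refs.append(weakref.ref(line))
    # The loop has ended, but `line` is still bound to the last value.
    last_alive_after_loop = refs[-1]() is not None
    del line
    last_alive_after_del = refs[-1]() is not None
    return report, last_alive_after_loop, last_alive_after_del

if __name__ == "__main__":
    print(observe())
```

On CPython I would expect ([[], [False]], True, False): "Hello" is gone by the time the second iteration's body runs, while the last line survives the loop until line is deleted or goes out of scope.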

You could set the PYTHONDUMPREFS=1 environment variable and run your code with a debug Python build to see the objects that are alive when the Python process exits. For example, consider this code:

#!/usr/bin/env python3
import threading
import time
import sys

def strings():
    yield "hello"
    time.sleep(.5)
    yield "world"
    time.sleep(.5)

def print_line():
    while True:
        time.sleep(.1)
        print('+++', line, file=sys.stderr)

threading.Thread(target=print_line, daemon=True).start()
for line in strings():
    print('---', line)
time.sleep(1)

It demonstrates that line is not rebound until the second yield. The output of PYTHONDUMPREFS=1 ./python . |& grep "'hello'" shows that 'hello' is still alive when python exits.
