
How to merge python IO streams into a single iterator, but maintain which item comes from which stream?

The desired functionality is something along the following lines:

import subprocess as sp

p = sp.Popen(('program', 'arg1', ...), stdout=sp.PIPE, stderr=sp.PIPE)

for line in merge_streams((p.stdout, 'OUT'), (p.stderr, 'ERR')):
    print(line)

Which should output something like this, in real-ish time:

('OUT', b'output line 1')
('OUT', b'output line 2')
('ERR', b'err line 1')
('ERR', b'err line 2')
('OUT', b'output line 3')

Just to be clear, running the same program directly from CMD outputs:

output line 1
output line 2
err line 1
err line 2
output line 3

Using p = sp.Popen(('program', 'arg1', ...), stdout=sp.PIPE, stderr=sp.STDOUT) will merge the streams but there is no way to distinguish between them.

Using itertools.chain will obviously put all lines from one after all lines from the other.

My only semi-working solution involved two threads pushing to a collections.deque with the main program reading from it, but that approach seems to scramble the order of multi-line blocks written to a stream at the same time.

For example, an exception like:

asdf : The term 'asdf' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
At something.ps1:2 char:5
+     asdf
+     ~~~~
    + CategoryInfo          : ObjectNotFound: (asdf:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException

Might be printed like:

b'    + CategoryInfo          : ObjectNotFound: (asdf:String) [], CommandNotFoundException'
b'    + FullyQualifiedErrorId : CommandNotFoundException'
b'At someting.ps1:2 char:5'
b'+     asdf'
b'+     ~~~~'
b"asdf : The term 'asdf' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again."

To avoid the XY problem: the end goal is to send the output to a client in real time using fastapi.responses.StreamingResponse, and that client needs to know which lines are stderr and which are stdout. If using a WebSocket makes this easier, that is also fine.
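For the client-facing side, one possible wire format (an assumption, not something specified in the question) is to prefix each line with its stream tag followed by a tab; a generator like this could then be handed to StreamingResponse, and the client splits each chunk on the first tab to recover the stream name:

```python
from typing import Iterable, Iterator, Tuple

def frame_lines(tagged: Iterable[Tuple[str, bytes]]) -> Iterator[bytes]:
    # Turn (tag, line) tuples into tag + TAB + line byte chunks.
    # The tag-plus-tab framing is just one hypothetical convention.
    for tag, line in tagged:
        yield tag.encode() + b'\t' + line

# Example input: the kind of tuples merge_streams() would yield.
chunks = list(frame_lines([('OUT', b'output line 1\n'),
                           ('ERR', b'err line 1\n')]))
# chunks[0] == b'OUT\toutput line 1\n'
```

Any framing works as long as the delimiter cannot appear in the tag; a tab is safe here because the tags are fixed short strings.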

Not sure what went wrong with your threading solution, but this seems to be working well:

import queue
import threading
import subprocess as sp
from typing import IO, Tuple, Any

def enqueue_io(f: IO, q: queue.Queue, prefix: Any) -> None:
    # Read lines until EOF, tagging each one with its stream's prefix.
    for line in iter(f.readline, b''):
        q.put((prefix, line))

def merge_streams(*streams: Tuple[Any, IO]):
    q = queue.Queue()
    threads = [threading.Thread(target=enqueue_io, args=(f, q, prefix))
               for prefix, f in streams]
    for t in threads:
        t.start()

    # Drain the queue while any reader thread is alive; a short timeout
    # avoids busy-waiting on an empty queue.
    while any(t.is_alive() for t in threads):
        try:
            yield q.get(timeout=0.1)
        except queue.Empty:
            pass

    # Pick up anything enqueued between the last check and thread exit.
    while not q.empty():
        yield q.get_nowait()

    for t in threads:
        t.join()

with sp.Popen(...) as p:
    for line in merge_streams(('OUT', p.stdout), ('ERR', p.stderr)):
        print(line)
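As a self-contained sanity check (the child script below is made up purely for illustration), spawning a small Python subprocess that writes to both streams shows each line arriving with the correct tag. The merge logic is repeated here so the snippet runs on its own:

```python
import queue
import subprocess as sp
import sys
import threading

def enqueue_io(f, q, prefix):
    # Read lines until EOF, tagging each one with its stream's prefix.
    for line in iter(f.readline, b''):
        q.put((prefix, line))

def merge_streams(*streams):
    q = queue.Queue()
    threads = [threading.Thread(target=enqueue_io, args=(f, q, prefix))
               for prefix, f in streams]
    for t in threads:
        t.start()
    # Blocking get with a timeout instead of a busy spin.
    while any(t.is_alive() for t in threads):
        try:
            yield q.get(timeout=0.1)
        except queue.Empty:
            pass
    while not q.empty():
        yield q.get_nowait()
    for t in threads:
        t.join()

# Hypothetical child that interleaves writes to stdout and stderr.
child = ("import sys; print('out 1'); "
         "print('err 1', file=sys.stderr); print('out 2')")

with sp.Popen([sys.executable, '-c', child],
              stdout=sp.PIPE, stderr=sp.PIPE) as p:
    lines = list(merge_streams(('OUT', p.stdout), ('ERR', p.stderr)))

# Relative order across the two pipes is not guaranteed by the OS,
# but every line carries the right tag. Strip to ignore \r\n vs \n.
tagged = {(tag, line.strip()) for tag, line in lines}
assert ('OUT', b'out 1') in tagged
assert ('ERR', b'err 1') in tagged
```

Note that ordering between the two pipes depends on OS buffering, so the check only asserts that each line keeps its tag, not a global order.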
