
Getting latest lines of streaming stdout from a Python subprocess

My goal: to read the latest "chunk" (N lines) of a subprocess's streaming stdout every M seconds.

Current code:

  1. start the subprocess
  2. read stdout
  3. once I have a chunk of N lines, print it out (or save it as the current chunk)
  4. wait M seconds
  5. repeat

For now, I have also added code to terminate the subprocess, which is otherwise an endless stream until you hit Ctrl-C.

What I want is for the code to always read the latest N lines after each M-second wait, not the next N lines in stdout. The lines in between can be discarded, since I'm only interested in the latest ones.
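Sleeping inside a "for line in p.stdout" loop never skips anything. Lines the child writes during the sleep wait in the pipe and are read next. One way to get "keep the latest N, discard the rest" is a bounded collections.deque: once it holds maxlen items, each append drops the oldest one. Here is a minimal sketch in which a made-up list (hypothetical data) stands in for the subprocess's stdout:

```python
from collections import deque

N = 9  # number of lines to keep, as in the question

# Stand-in for lines arriving on stdout (hypothetical data for illustration)
incoming = ["line {}\n".format(i) for i in range(100)]

latest = deque(maxlen=N)
for line in incoming:
    latest.append(line)  # once full, each append discards the oldest line

print(list(latest))  # the last N lines: 'line 91\n' ... 'line 99\n'
```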

My end goal is to run the process in a separate thread that keeps saving the latest lines, and then to read them from the main thread whenever I need the latest results of the stream.
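The end goal above, a thread that keeps saving the latest lines while the main thread asks for them on demand, could look roughly like the sketch below. It is a sketch under assumptions, not a definitive implementation. The class name LatestLines and its methods are hypothetical. The lines are kept in a collections.deque(maxlen=n), and a lock guards it so that latest() always returns a consistent snapshot. The child must flush its output, or run unbuffered, for lines to reach the pipe promptly.

```python
import subprocess
import sys
import threading
import time
from collections import deque


class LatestLines:
    """Run a command and keep only the last n lines of its stdout (hypothetical helper).

    A background thread drains the pipe continuously, so the child never
    blocks on a full pipe buffer, and older lines fall out of the deque.
    """

    def __init__(self, cmd, n):
        self._lines = deque(maxlen=n)
        self._lock = threading.Lock()  # guards _lines between reader and callers
        self.proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
        self._thread = threading.Thread(target=self._reader, daemon=True)
        self._thread.start()

    def _reader(self):
        # Iterating the text-mode pipe yields one line at a time until EOF
        for line in self.proc.stdout:
            with self._lock:
                self._lines.append(line)

    def latest(self):
        """Return a snapshot (list copy) of the most recent lines."""
        with self._lock:
            return list(self._lines)

    def stop(self, timeout=1.0):
        """Terminate the child, killing it if it ignores SIGTERM, then join the reader."""
        self.proc.terminate()
        try:
            self.proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            self.proc.kill()
            self.proc.wait()
        self._thread.join()  # reader exits once stdout reaches EOF
        self.proc.stdout.close()


if __name__ == "__main__":
    # Hypothetical child for the demo: prints a numbered line every 0.1 s.
    # "-u" makes its stdout unbuffered so lines reach the pipe immediately.
    child = [sys.executable, "-u", "-c",
             "import time\nfor i in range(1000):\n    print('line', i)\n    time.sleep(0.1)"]
    watcher = LatestLines(child, n=9)  # N = 9
    try:
        for _ in range(3):
            time.sleep(1)  # M = 1 second
            print(watcher.latest())
    finally:
        watcher.stop()
```

Each call to latest() returns at most the last N lines. The reader thread keeps consuming output between calls, so nothing piles up in the pipe.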

Any help would be greatly appreciated!

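#!/usr/bin/env python3
import signal
import time
from subprocess import Popen, PIPE

sig = signal.SIGTERM

N = 9  # lines per chunk
M = 5  # seconds to wait between chunks

countlines = 0
# bufsize=1 together with universal_newlines=True gives a line-buffered text pipe
p = Popen(["myprogram"], stdout=PIPE, bufsize=1, universal_newlines=True)

chunk = []

for line in p.stdout:
    countlines += 1
    chunk.append(line)

    if len(chunk) == N:
        print(chunk)
        chunk = []
        # Output keeps queuing in the pipe while we sleep and is read next,
        # so this prints the *next* N lines, not the latest ones
        time.sleep(M)

    # Temporary: stop the endless stream after about 100 lines
    if countlines > 100:
        p.send_signal(sig)
        break

print("done")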

After much searching, I stumbled upon a solution here:

https://eli.thegreenplace.net/2017/interacting-with-a-long-running-child-process-in-python/

Eli's "Launch, interact, get output in real time, terminate" code section worked for me. So far it's the most elegant solution I've found.

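Adapted to my problem above and wrapped in a class (the class name ChunkReader is illustrative):

import subprocess
import threading
import time

N = 9  # lines per chunk


class ChunkReader:
    def __init__(self):
        self.current_chunk = []  # empty until the first N lines have arrived

    def output_reader(self, proc):
        # Runs in a background thread; readline() returns b'' at EOF
        chunk = []
        for line in iter(proc.stdout.readline, b''):
            chunk.append(line.decode("utf-8"))
            if len(chunk) == N:
                # Publish the completed chunk by rebinding the attribute
                self.current_chunk = chunk
                chunk = []

    def main(self):
        proc = subprocess.Popen(['myprocess'],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT)

        t = threading.Thread(target=self.output_reader, args=(proc,))
        t.start()

        try:
            time.sleep(0.2)
            for i in range(10):
                time.sleep(1)  # waits a while before getting latest lines
                print(self.current_chunk)
        finally:
            proc.terminate()
            try:
                proc.wait(timeout=0.2)
                print('== subprocess exited with rc =', proc.returncode)
            except subprocess.TimeoutExpired:
                print('subprocess did not terminate in time')
                proc.kill()  # otherwise t.join() can block until the child exits
                proc.wait()
        t.join()


if __name__ == '__main__':
    ChunkReader().main()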

Here is another possible solution. It is a program (saved as server.py in the example below) that you run as a separate process in the pipeline. It exposes a REST API that, when queried, returns the last N lines it has read on stdin. N and the port number are given as command-line arguments. It uses Flask's built-in development server (app.run), so it should not be used where the outside world can reach the local server port, though it could be adapted for that.

import sys
import time
import threading
import argparse
from flask import Flask, request
from flask_restful import Resource, Api


class Server:

    def __init__(self):
        self.data = {'at_eof': False,
                     'lines_read': 0,
                     'latest_lines': []}
        self.thread = None
        self.args = None
        self.stop = False


    def parse_args(self):
        parser = argparse.ArgumentParser()
        parser.add_argument("num_lines", type=int,
                            help="number of lines to cache")
        parser.add_argument("port", type=int,
                            help="port to serve on")
        self.args = parser.parse_args()


    def start_updater(self):
        def updater():
            lines = self.data['latest_lines']
            while True:
                if self.stop:
                    return
                line = sys.stdin.readline()
                if not line:
                    break
                self.data['lines_read'] += 1
                lines.append(line)
                while len(lines) > self.args.num_lines:
                    lines.pop(0)
            self.data['at_eof'] = True
        self.thread = threading.Thread(target=updater)
        self.thread.start()


    def get_data(self):
        return self.data


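    def shutdown(self):
        self.stop = True
        # Note: Werkzeug 2.1 removed 'werkzeug.server.shutdown' from the WSGI
        # environ, so on newer versions this returns 'shutdown failed'
        func = request.environ.get('werkzeug.server.shutdown')
        if func:
            func()
            return 'Shutting down'
        else:
            return 'shutdown failed'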


    def add_apis(self, app):

        class GetData(Resource):
            get = self.get_data

        class Shutdown(Resource):
            get = self.shutdown            

        api = Api(app)
        api.add_resource(GetData, "/getdata")
        api.add_resource(Shutdown, "/shutdown")


    def run(self):
        self.parse_args()
        self.start_updater()        
        app = Flask(__name__)
        self.add_apis(app)
        app.run(port=self.args.port)


server = Server()
server.run()

Example usage: here is a test program (writer.py) whose output we want to serve:

import sys
import time

for i in range(100):
    print("this is line {}".format(i))
    sys.stdout.flush()
    time.sleep(.1)

And here is a simple pipeline that launches it and serves the last 5 lines on port 8001. It runs from the Linux shell prompt here, though the same could be done via subprocess.Popen:

python ./writer.py  | python ./server.py 5 8001
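The pipeline can also be launched from Python with subprocess.Popen instead of the shell. Below is a minimal sketch that connects one process's stdout to the other's stdin, as the shell "|" does. The helper name launch_pipeline and its consumer_stdout parameter are hypothetical. The commented usage assumes writer.py and server.py sit in the current directory.

```python
import subprocess
import sys


def launch_pipeline(producer_cmd, consumer_cmd, consumer_stdout=None):
    """Start the equivalent of `producer_cmd | consumer_cmd`; return both Popen objects."""
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    # The consumer reads directly from the producer's pipe
    consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout,
                                stdout=consumer_stdout)
    # Close the parent's copy of the read end so the producer gets a broken
    # pipe (SIGPIPE) if the consumer exits first, as in a shell pipeline
    producer.stdout.close()
    return producer, consumer


# For the scripts above (assumed to be in the current directory):
# writer, server = launch_pipeline([sys.executable, "writer.py"],
#                                  [sys.executable, "server.py", "5", "8001"])
```

Closing producer.stdout in the parent follows the "replacing a shell pipeline" pattern from the subprocess documentation. Without it, the parent keeps the pipe open and the writer would not notice if the server exited.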

An example query, here using curl as the client (it could also be done with Python requests):

$ curl -s http://localhost:8001/getdata
{"at_eof": false, "lines_read": 30, "latest_lines": ["this is line 25\n", "this is line 26\n", "this is line 27\n", "this is line 28\n", "this is line 29\n"]}
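The same query can be made from Python. The text mentions requests. This sketch swaps in the standard library's urllib.request so it needs no third-party package. It assumes the server is listening on localhost port 8001 as in the example. The function name get_latest is hypothetical.

```python
import json
from urllib.request import urlopen


def get_latest(port=8001, host="localhost", timeout=5.0):
    """Fetch the server's /getdata endpoint and return the decoded JSON as a dict."""
    url = "http://{}:{}/getdata".format(host, port)
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage, with the pipeline above running:
# data = get_latest(8001)
# print(data["lines_read"], data["latest_lines"])
```

Polling get_latest() every M seconds and checking data["at_eof"] tells the caller when the writer has finished.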

The server also provides an http://localhost:<port>/shutdown URL to terminate it. If you call it before you first see "at_eof": true, expect the writer to die with a broken pipe.
