Looking for an efficient way to combine lines in Python

I'm writing a program to aggregate strace output lines on a Linux host. When strace runs with the "-f" option, it intermixes lines from different system calls, like so:

close(255 <unfinished ...>
<... rt_sigprocmask resumed> NULL, 8) = 0
<... close resumed> )       = 0
[pid 19199] close(255 <unfinished ...>
[pid 19198] <... rt_sigprocmask resumed> NULL, 8) = 0
[pid 19199] <... close resumed> )       = 0

I would like to iterate through the output and combine "unfinished" lines with "resumed" lines. So in the output above the following two lines:

close(255 <unfinished ...>
.....
<... close resumed> )       = 0

Would be combined into:

close(255) = 0

I was thinking about splitting the "unfinished" lines at "<" and putting the first part into a list. If a later line contained "resumed", I would iterate through this list to see if the system call and pid are present. If they are, I would split() the line at ">" and combine the two. Is there a better way to do this?
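A minimal sketch of that idea, using a dict keyed by (pid, system call) instead of a list so that no scan is needed. The two regular expressions and the names `combine` and `pending` are assumptions for illustration; the line shapes are inferred only from the sample output above, so real strace logs may need more cases.

```python
import re

# Assumed line shapes, taken only from the sample output above:
#   [pid N] name(args <unfinished ...>
#   [pid N] <... name resumed> rest
# The "[pid N] " prefix is treated as optional.
UNFINISHED = re.compile(
    r'^(?:\[pid\s+(\d+)\]\s+)?(\w+)\((.*?)\s*<unfinished \.\.\.>\s*$')
RESUMED = re.compile(
    r'^(?:\[pid\s+(\d+)\]\s+)?<\.\.\. (\w+) resumed>(.*)$')


def combine(lines):
    """Yield lines with unfinished/resumed pairs merged into one line."""
    pending = {}  # (pid, syscall name) -> start of the call, e.g. "close(255"
    for line in lines:
        line = line.rstrip('\n')
        m = UNFINISHED.match(line)
        if m:
            pid, name, args = m.groups()
            prefix = '[pid {}] '.format(pid) if pid else ''
            pending[(pid, name)] = '{}{}({}'.format(prefix, name, args)
            continue
        m = RESUMED.match(line)
        start = pending.pop((m.group(1), m.group(2)), None) if m else None
        if start is None:
            yield line  # ordinary line, or a "resumed" with no known start
            continue
        rest = m.group(3).strip()
        if rest.startswith(')'):
            # Only the return value follows, so drop the column padding.
            yield start + ') ' + rest[1:].lstrip()
        elif start.endswith('('):
            yield start + rest
        else:
            yield start + ' ' + rest
    # Calls that never resumed are passed through in their original form.
    for start in pending.values():
        yield start + ' <unfinished ...>'
```

Keying on both pid and call name keeps a "resumed" line from one process from being attached to an "unfinished" line from another.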

* Update *

Thanks for the awesome feedback! I came up with the following and would love to get your thoughts on the code:

import sys

holding_cell = list()

if len(sys.argv) > 1:
    strace_file = open(sys.argv[1], "r")
else:
    strace_file = sys.stdin

for line in strace_file.read().splitlines():
    if "clone" in line:
        print(line)
    if "unfinished" in line:
        holding_cell.append(line.split("<")[0])
    elif "resumed" in line:
        # Get the name of the system call / pid so we can try
        # to match this line w/ one in the buffer
        identifier = line.split()[1]
        for cell in holding_cell:
            if identifier in cell:
                print(cell + line.split(">")[1])
                holding_cell.remove(cell)
                # Stop here: the list was just modified mid-iteration
                break
    else:
        print(line)

Is there a more pythonic way to write this? Thanks again for the awesome feedback!

File objects are iterators, so loops over the same file object can be nested: the inner loop picks up reading where the outer loop left off. Assuming you are reading this from a file-like object, you could just create an inner loop to do the combining. I'm not sure what the formatting rules for strace logs are, but nominally, it could be something like:

def get_logs(filename):
    with open(filename) as log:
        for line in log:
            if "<unfinished " in line:
                # Keep everything before the marker, e.g. "close(255"
                preamble = line.split('<unfinished ', 1)[0].strip()
                # The inner loop keeps reading from the same file object
                for line in log:
                    if " resumed>" in line:
                        yield "{}) = {}\n".format(preamble,
                            line.split('=')[-1].strip())
                        break
            else:
                yield line
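One thing worth checking before relying on the nested loop: it stops at the first line containing " resumed>", whichever call that line belongs to, and any lines read in between are consumed by the inner loop. The sketch below reproduces just the pairing step on the first three sample lines from the question. `pair_up` is a hypothetical helper written for this illustration, and a plain list iterator stands in for the open file.

```python
def pair_up(lines):
    """Return the (unfinished, resumed) pairs the nested-loop approach finds."""
    it = iter(lines)  # one shared iterator, standing in for the open file
    pairs = []
    for line in it:
        if "<unfinished " in line:
            for other in it:  # resumes where the outer loop stopped
                if " resumed>" in other:
                    pairs.append((line, other))
                    break
    return pairs


sample = [
    "close(255 <unfinished ...>",
    "<... rt_sigprocmask resumed> NULL, 8) = 0",
    "<... close resumed> )       = 0",
]
pairs = pair_up(sample)
# The close() call is paired with the rt_sigprocmask line, not its own.
print(pairs)
```

If calls from several processes interleave like this, comparing the call name (and pid) in the inner loop before combining would avoid the mismatch.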
