
Python subprocess call of unix tee truncates stdin when writing to stdout and logfile

I am trying to run a chain of existing scripts in Python using subprocess. The chain works as expected when I use this code:

p1 = subprocess.Popen(samtoolsSortArguments, stdout=subprocess.PIPE)
p2 = subprocess.Popen(samtoolsViewArguments, stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close()
p3 = subprocess.Popen(htseqCountArguments, stdin=p2.stdout, stdout=file_out)
p2.stdout.close()
p3.communicate()
file_out.close()
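
Aside: a minimal sketch, assuming the p1, p2, p3 and file_out handles from the snippet above (and that the argument lists wrap samtools sort, samtools view and htseq-count), of how one might also confirm that every stage exited cleanly once the chain finishes:

import sys  # only needed for this optional check

# Wait for each stage and stop with an error message on any non-zero exit status.
for name, proc in (("samtools sort", p1), ("samtools view", p2), ("htseq-count", p3)):
    proc.wait()
    if proc.returncode != 0:
        sys.exit("%s exited with status %d" % (name, proc.returncode))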

The output of this chain looks like this:

100000 GFF lines processed.
[bam_sort_core] merging from 2 files...
200000 GFF lines processed.
300000 GFF lines processed.
400000 GFF lines processed.
500000 GFF lines processed.
600000 GFF lines processed.
700000 GFF lines processed.
800000 GFF lines processed.
900000 GFF lines processed.
1000000 GFF lines processed.
1100000 GFF lines processed.
1200000 GFF lines processed.
1300000 GFF lines processed.
1400000 GFF lines processed.
1500000 GFF lines processed.
1600000 GFF lines processed.
1700000 GFF lines processed.
1800000 GFF lines processed.
1900000 GFF lines processed.
2000000 GFF lines processed.
2100000 GFF lines processed.
2200000 GFF lines processed.
2300000 GFF lines processed.
2400000 GFF lines processed.
2500000 GFF lines processed.
2600000 GFF lines processed.
2700000 GFF lines processed.
2764635 GFF lines processed.
100000 SAM alignment records processed.
200000 SAM alignment records processed.
300000 SAM alignment records processed.
400000 SAM alignment records processed.
500000 SAM alignment records processed.
600000 SAM alignment records processed.
700000 SAM alignment records processed.
800000 SAM alignment records processed.
900000 SAM alignment records processed.
1000000 SAM alignment records processed.
1100000 SAM alignment records processed.
1200000 SAM alignment records processed.
1300000 SAM alignment records processed.
1400000 SAM alignment records processed.
1500000 SAM alignment records processed.
1600000 SAM alignment records processed.
1700000 SAM alignment records processed.
1800000 SAM alignment records processed.
1900000 SAM alignment records processed.
2000000 SAM alignment records processed.
2100000 SAM alignment records processed.
2200000 SAM alignment records processed.
2300000 SAM alignment records processed.
2400000 SAM alignment records processed.
2500000 SAM alignment records processed.
2600000 SAM alignment records processed.
2700000 SAM alignment records processed.
2800000 SAM alignment records processed.
2900000 SAM alignment records processed.

All of this output is coming from stderr and I'd like to be able to write it to both the terminal and a logfile. In order to accomplish this I am using the unix tee command as a subprocess in Python and passing it the stderr from the previous subprocess command. The code looks like this:

p1 = subprocess.Popen(samtoolsSortArguments, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
tee = subprocess.Popen(['tee', logfile], stdin=p1.stderr)
p1.stderr.close()

p2 = subprocess.Popen(samtoolsViewArguments, stdin=p1.stdout, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
p1.stdout.close()
tee = subprocess.Popen(['tee', logfile], stdin=p2.stderr)
p2.stderr.close()

p3 = subprocess.Popen(htseqCountArguments, stdin=p2.stdout, stdout=file_out, stderr=subprocess.PIPE)
p2.stdout.close()
tee = subprocess.Popen(['tee', logfile], stdin=p3.stderr)

p3.communicate()
p3.stderr.close()
tee.communicate()
file_out.close()

The stdout output from this code that is written to my file_out handle is correct. Even the stderr being printed to the screen and the logfile seems to be the correct information. However, the stderr output is truncated on some lines and I can't figure out why. Here's what my logfile and terminal look like (they match):

 GFF lines processed.
[bam_sort_core] merging from 2 files...
 GFF lines processed.
300000 GFF lines processed.
400000 GFF lines processed.
500000 GFF lines processed.
600000 GFF lines processed.
700000 GFF lines processed.
800000 GFF lines processed.
900000 GFF lines processed.
1000000 GFF lines processed.
1100000 GFF lines processed.
1200000 GFF lines processed.
1300000 GFF lines processed.
1400000 GFF lines processed.
1500000 GFF lines processed.
1600000 GFF lines processed.
1700000 GFF lines processed.
1800000 GFF lines processed.
1900000 GFF lines processed.
 GFF lines processed.
GFF lines processed.
FF lines processed.
F lines processed.
 lines processed.
ines processed.
700000 GFF lines processed.
2764635 GFF lines processed.
nt records processed.
 records processed.
300000 SAM alignment records processed.
cords processed.
ds processed.
processed.
essed.
d.
000000 SAM alignment records processed.
00 SAM alignment records processed.
 alignment records processed.
1500000 SAM alignment records processed.
1600000 SAM alignment records processed.
1800000 SAM alignment records processed.
1900000 SAM alignment records processed.
2000000 SAM alignment records processed.
2100000 SAM alignment records processed.
2200000 SAM alignment records processed.
2500000 SAM alignment records processed.
2600000 SAM alignment records processed.
2700000 SAM alignment records processed.
2900000 SAM alignment records processed.

Why is the output truncated when it is passed to tee? Is this just a column shift? Is there a way to fix this, or am I just trying to do too much with subprocess?

EDIT: Here's an SSCCE of @tdelaney's code. It reproduces the same error that I was having when using it in my broader context. This example should be run from a folder containing a file called test.txt. test.txt should read as follows (or anything similar, so long as some lines are "test"):

test
not
test

And here's the toy code (make sure to change the shebang to point to your Python):

#!/usr/local/bin/python2

import sys
import subprocess
import threading

logfile = "./testlog.txt"

arg1 = ["ls", "-l"]
arg2 = ["find", "-name", "test.txt"]
arg3 = ["xargs", "grep", "-i", "-n", "test"]

def log_writer(pipe, log_fp, lock):
    for line in pipe:
        with lock:
            log_fp.write(line)
            sys.stdout.write(line)

with open(logfile, 'w') as log_fp:
    lock = threading.Lock()
    threads = []
    p1 = subprocess.Popen(arg1, stdout=subprocess.PIPE)
    threads.append(threading.Thread(target=log_writer, args=(p1.stdout, log_fp, lock)))

    p2 = subprocess.Popen(arg2, stdin=p1.stdout, stdout=subprocess.PIPE)
    p1.stdout.close()
    threads.append(threading.Thread(target=log_writer, args=(p2.stdout, log_fp, lock)))

    p3 = subprocess.Popen(arg3, stdin=p2.stdout, stdout=subprocess.PIPE)
    p2.stdout.close()
    threads.append(threading.Thread(target=log_writer, args=(p3.stdout, log_fp, lock)))

    for t in threads:
        t.start()

    p3.communicate()

    for t in threads:
        t.join()

Note: If I comment out the close() and communicate() lines, the code runs. I'm a little concerned about doing that, though, since I expect to hit all kinds of other problems in my broader context.

The problem is that you have multiple tees writing to a single file. They each have their own file pointer and current offset into the file and will overwrite bits of each other's output. One solution is to implement the log file write using threads and a mutex in Python; a sketch of an alternative that uses a single shared tee is given after the code below.

#!/bin/env python

import sys
import subprocess
import threading

logfile = "./testlog.txt"
file_out = open("./test.output.txt", "w")

arg1 = ["ls", "-l"]
arg2 = ["find", "-name", "test.txt"]
arg3 = ["xargs", "grep", "-i", "-n", "test"]

def log_writer(pipe, log_fp, lock):
    for line in pipe:
        with lock:
            log_fp.write(line)
            sys.stdout.write(line)

with open(logfile, 'w') as log_fp:
    lock = threading.Lock()
    threads = []
    processes = []
    p1 = subprocess.Popen(arg1, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    threads.append(threading.Thread(target=log_writer, args=(p1.stderr, log_fp, lock)))
    processes.append(p1)

    p2 = subprocess.Popen(arg2, stdin=p1.stdout, stdout=subprocess.PIPE, stderr=subprocess.PIPE)  # chain stdout; p1.stderr is already being read by a log_writer thread
    p1.stdout.close()
    threads.append(threading.Thread(target=log_writer, args=(p2.stderr, log_fp, lock)))
    processes.append(p2)

    p3 = subprocess.Popen(arg3, stdin=p2.stdout, stdout=file_out, stderr=subprocess.PIPE)
    p2.stdout.close()
    threads.append(threading.Thread(target=log_writer, args=(p3.stderr, log_fp, lock)))
    processes.append(p3)

    file_out.close()

    for t in threads:
        t.start()

    for p in processes:
        p.wait()

    for t in threads:
        t.join()
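
Aside: if the only requirement is to merge every stage's stderr into one logfile plus the terminal, a minimal sketch of another approach, assuming the same samtoolsSortArguments, samtoolsViewArguments, htseqCountArguments, logfile and file_out from the question, is to start a single tee and hand its stdin to every stage as stderr, so that only one process (with one file offset) ever writes to the log:

# One shared tee collects stderr from all three stages, so there is a single
# writer to logfile instead of three competing ones.
tee = subprocess.Popen(['tee', logfile], stdin=subprocess.PIPE)

p1 = subprocess.Popen(samtoolsSortArguments, stdout=subprocess.PIPE, stderr=tee.stdin)
p2 = subprocess.Popen(samtoolsViewArguments, stdin=p1.stdout, stdout=subprocess.PIPE, stderr=tee.stdin)
p1.stdout.close()
p3 = subprocess.Popen(htseqCountArguments, stdin=p2.stdout, stdout=file_out, stderr=tee.stdin)
p2.stdout.close()

p3.communicate()
tee.stdin.close()  # drop the parent's copy so tee sees EOF once the children exit
tee.wait()
file_out.close()

Lines from different stages can still interleave in the log, but nothing is overwritten, because the single tee process is the only writer to the file.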
