Why is subprocess.run output different from shell output of the same command?
I am using subprocess.run() for some automated testing, mostly to automate:

dummy.exe < file.txt > foo.txt
diff file.txt foo.txt

If you execute the above redirection in a shell, the two files are always identical. But whenever file.txt is too long, the Python code below does not produce the correct result.

This is the Python code:
import subprocess
import sys

def main(argv):
    exe_path = r'dummy.exe'
    file_path = r'file.txt'
    with open(file_path, 'r') as test_file:
        stdin = test_file.read().strip()
    p = subprocess.run([exe_path], input=stdin, stdout=subprocess.PIPE, universal_newlines=True)
    out = p.stdout.strip()
    err = p.stderr
    if stdin == out:
        print('OK')
    else:
        print('failed: ' + out)

if __name__ == "__main__":
    main(sys.argv[1:])
Here is the C++ code in dummy.cc:
#include <iostream>

int main()
{
    int size, count, a, b;
    std::cin >> size;
    std::cin >> count;
    std::cout << size << " " << count << std::endl;
    for (int i = 0; i < count; ++i)
    {
        std::cin >> a >> b;
        std::cout << a << " " << b << std::endl;
    }
}
file.txt can be anything like this:
1 100000
0 417
0 842
0 919
...
The second integer on the first line is the number of lines that follow, hence here file.txt will be 100,001 lines long.
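A quick way to generate such an input file for testing — the file name, sizes, and value range here are just placeholders matching the example above:

```python
import random

def write_test_file(path, size=1, count=100000):
    """Write '<size> <count>' on the first line, then `count` data lines."""
    with open(path, 'w') as f:
        f.write('%d %d\n' % (size, count))
        for _ in range(count):
            f.write('0 %d\n' % random.randint(0, 999))

write_test_file('file.txt')  # yields a 100,001-line file
```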
Question: Am I misusing subprocess.run()?
Edit
My exact Python code, after taking the comments (newlines, rb) into account:
import subprocess
import sys
import os

def main(argv):
    base_dir = os.path.dirname(__file__)
    exe_path = os.path.join(base_dir, 'dummy.exe')
    file_path = os.path.join(base_dir, 'infile.txt')
    out_path = os.path.join(base_dir, 'outfile.txt')
    with open(file_path, 'rb') as test_file:
        stdin = test_file.read().strip()
    p = subprocess.run([exe_path], input=stdin, stdout=subprocess.PIPE)
    out = p.stdout.strip()
    if stdin == out:
        print('OK')
    else:
        with open(out_path, "wb") as text_file:
            text_file.write(out)

if __name__ == "__main__":
    main(sys.argv[1:])
Here is the first diff:
Here is the input file: https://drive.google.com/open?id=0B--mU_EsNUGTR3VKaktvQVNtLTQ
To reproduce, the shell command:

subprocess.run("dummy.exe < file.txt > foo.txt", shell=True, check=True)

without the shell in Python:

with open('file.txt', 'rb', 0) as input_file, \
     open('foo.txt', 'wb', 0) as output_file:
    subprocess.run(["dummy.exe"], stdin=input_file, stdout=output_file, check=True)

It works with arbitrarily large files.
You could use subprocess.check_call() in this case (available since Python 2), instead of subprocess.run(), which is available only in Python 3.5+.
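As a self-contained sketch of that same pattern with check_call — a Python one-liner stands in for dummy.exe here, since the real executable is specific to the question:

```python
import subprocess
import sys

# Hypothetical stand-in for dummy.exe: a child that copies stdin to stdout.
child = [sys.executable, '-c', 'import sys; sys.stdout.write(sys.stdin.read())']

with open('file.txt', 'w') as f:
    f.write('1 2\n0 417\n0 842\n')

# Same shape as the shell redirection: stdin and stdout are real files,
# so no pipe buffering is involved at all.
with open('file.txt', 'rb', 0) as input_file, \
     open('foo.txt', 'wb', 0) as output_file:
    subprocess.check_call(child, stdin=input_file, stdout=output_file)
```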
Works very well, thanks. But then why was the original failing? Pipe buffer size, as in Kevin's answer?
It has nothing to do with OS pipe buffers. The warning from the subprocess docs that @Kevin J. Chase cites is unrelated to subprocess.run(). You should care about OS pipe buffers only if you use process = Popen() and manually read()/write() via multiple pipe streams (process.stdin/.stdout/.stderr).
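To make the manual-Popen case concrete: the pattern below only stays deadlock-free because the payload is tiny. With enough data, stdin.write() would block once the pipe buffer fills, while the child in turn blocks writing to its stdout, which nobody is reading yet. (The echoing child is a stand-in, not from the question.)

```python
import subprocess
import sys

# A child process that echoes its stdin back on stdout.
child = [sys.executable, '-c', 'import sys; sys.stdout.write(sys.stdin.read())']

p = subprocess.Popen(child, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
p.stdin.write(b'hello\n')  # safe only while the data fits in the OS pipe buffer
p.stdin.close()            # EOF lets the child finish
data = p.stdout.read()
p.wait()
```

communicate() exists precisely so you don't have to sequence these reads and writes yourself.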
It turns out that the observed behavior is due to a Windows bug in the Universal CRT. Here's the same issue reproduced without Python: Why would redirection work where piping fails?

As said in the bug description, to work around it:

- use ReadFile() directly instead of std::cin
- if you compile with g++ on Windows, there is no issue

The bug affects only text pipes, i.e., the code that uses < > redirection should be fine (stdin=input_file, stdout=output_file should still work, or it is some other bug).
I'll start with a disclaimer: I don't have Python 3.5 (so I can't use the run function), and I wasn't able to reproduce your problem on Windows (Python 3.4.4) or Linux (3.1.6). That said...
The Problem with subprocess.PIPE and Family

The subprocess.run docs say that it's just a front end for the old subprocess.Popen-and-communicate() technique. The subprocess.Popen.communicate docs warn that:

The data read is buffered in memory, so do not use this method if the data size is large or unlimited.

This sure sounds like your problem. Unfortunately, the docs don't say how much data is "large", nor what will happen after "too much" data is read. Just "don't do that, then".
The docs for subprocess.call go into a little more detail (emphasis mine)...

Do not use stdout=PIPE or stderr=PIPE with this function. The child process will block if it generates enough output to a pipe to fill up the OS pipe buffer, as the pipes are not being read from.
...as do the docs for subprocess.Popen.wait:

This will deadlock when using stdout=PIPE or stderr=PIPE and the child process generates enough output to a pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use Popen.communicate() when using pipes to avoid that.
That sure sounds like Popen.communicate is the solution to this problem, but communicate's own docs say "do not use this method if the data size is large" --- exactly the situation where the wait docs tell you to use communicate. (Maybe it "avoid(s) that" by silently dropping data on the floor?)
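For what it's worth, communicate() does not drop data: CPython pumps both pipes concurrently (reader threads on Windows, select/poll on POSIX), so it won't deadlock even well past the pipe-buffer size — the warning is only about holding the whole payload in memory. A quick check, again with a hypothetical echoing child:

```python
import subprocess
import sys

# A child process that echoes its stdin back on stdout.
child = [sys.executable, '-c', 'import sys; sys.stdout.write(sys.stdin.read())']

big = b'x' * (8 * 1024 * 1024)  # far larger than any default OS pipe buffer
p = subprocess.Popen(child, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
out, _ = p.communicate(big)
# No deadlock, and every byte comes back intact.
```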
Frustratingly, I don't see any way to use a subprocess.PIPE safely, unless you're sure you can read from it faster than your child process writes to it.
On that note...
tempfile.TemporaryFile

You're holding all your data in memory... twice, in fact. That can't be efficient, especially if it's already in a file.
If you're allowed to use a temporary file, you can compare the two files very easily, one line at a time. This avoids all the subprocess.PIPE mess, and it's much faster, because it only uses a little bit of RAM at a time. (The IO from your subprocess might be faster, too, depending on how your operating system handles output redirection.)
Again, I can't test run, so here's a slightly older Popen-and-communicate solution (minus main and the rest of your setup):
import io
import subprocess
import tempfile

def are_text_files_equal(file0, file1):
    '''
    Both files must be opened in "update" mode ('+' character), so
    they can be rewound to their beginnings.  Both files will be read
    until just past the first differing line, or to the end of the
    files if no differences were encountered.
    '''
    file0.seek(io.SEEK_SET)
    file1.seek(io.SEEK_SET)
    for line0, line1 in zip(file0, file1):
        if line0 != line1:
            return False
    # Both files were identical to this point.  See if either file
    # has more data.
    next0 = next(file0, '')
    next1 = next(file1, '')
    if next0 or next1:
        return False
    return True

def compare_subprocess_output(exe_path, input_path):
    with tempfile.TemporaryFile(mode='w+t', encoding='utf8') as temp_file:
        with open(input_path, 'r+t') as input_file:
            p = subprocess.Popen(
                [exe_path],
                stdin=input_file,
                stdout=temp_file,  # No more PIPE.
                stderr=subprocess.PIPE,  # <sigh>
                universal_newlines=True,
            )
            err = p.communicate()[1]  # No need to store output.
            # Compare input and output files...  This must be inside
            # the `with` block, or the TemporaryFile will close before
            # we can use it.
            if are_text_files_equal(temp_file, input_file):
                print('OK')
            else:
                print('Failed: ' + str(err))
    return
Unfortunately, since I can't reproduce your problem, even with a million-line input, I can't tell if this works. If nothing else, it ought to give you wrong answers faster.
If you want to keep the output of your test run in foo.txt (from your command-line example), then you would direct your subprocess's output to a normal file instead of a TemporaryFile. This is the solution recommended in J.F. Sebastian's answer.
I can't tell from your question if you wanted foo.txt, or if it was just a side effect of the two-step test-then-diff --- your command-line example saves test output to a file, while your Python script doesn't. Saving the output would be handy if you ever want to investigate a test failure, but it requires coming up with a unique filename for each test you run, so they don't overwrite each other's output.