Why is calling grep with subprocess.Popen() so much faster than with subprocess.check_output()?

Question

I need to extract rows from a CSV-like file, and I'm using grep to do it from inside a Python script. I noticed that when I call grep using subprocess.check_output, it takes around 5.28 seconds to finish, but when I use subprocess.Popen, it takes only 0.002 seconds. That is a massive difference, and I'm wondering which one I should use. I should note that I intend to process each line as a string.

Here's part of my Python script.

myenv = os.environ.copy()
myenv['LC_ALL'] = 'C'
file = data_path+'/'+files[12]
start = time.time()
match = 'chr3' + "[[:space:]]"
matched_reads = subprocess.Popen(['grep', match, file], stdout=subprocess.PIPE, env=myenv)
mathced_reads = str(matched_reads).splitlines()
end = time.time()
runtime = end-start
print("Popen Grep: ", runtime)

start = time.time()
match = 'chr3' + "[[:space:]]"
matched_reads = subprocess.check_output(['grep', match, file],env=myenv)
mathced_reads = str(matched_reads).splitlines()
end = time.time()
runtime = end-start
print("Checkoutput Grep: ", runtime)

Answer 1

Calling Popen starts the program but does not wait for it to finish or return its output; it constructs an object that references the created process and returns immediately. In your case, you didn't call Popen.communicate, which "talks" to the process and captures its complete output, whereas check_output does all of that for you. You will find that the communicate method takes about as long as check_output, but it actually returns the desired output.

For an actual demonstration with Popen, replace
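The timing difference can be reproduced without grep or a data file. The following is a minimal sketch, assuming Python 3; the child command is a hypothetical stand-in for a slow grep (a Python one-liner that sleeps and then prints two lines), chosen only so the example runs anywhere. It shows that the Popen constructor returns as soon as the child is launched, that str() of the Popen object is just the object's repr, and that the waiting happens in communicate().

```python
import subprocess
import sys
import time

# Hypothetical stand-in for a slow grep: sleep, then print two matching lines.
cmd = [
    sys.executable,
    "-c",
    "import time; time.sleep(0.5); print('chr3\\t1'); print('chr3\\t2')",
]

start = time.time()
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
popen_elapsed = time.time() - start  # only the time needed to launch the child

# str() of a Popen object is its repr, not the child's output.
popen_repr = str(proc)

start = time.time()
output, _ = proc.communicate()  # blocks until the child exits, returns stdout
communicate_elapsed = time.time() - start

print("Popen returned after:", popen_elapsed)
print("communicate() returned after:", communicate_elapsed)
print("str(proc):", popen_repr)
print("output:", output)
```

In the question's script, the timer stops right after the Popen call, so the 0.002 seconds measures only process creation, not the grep run.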

matched_reads = subprocess.Popen(['grep', match, file], stdout=subprocess.PIPE, env=myenv)

with

process = subprocess.Popen(['grep', match, file], stdout=subprocess.PIPE, env=myenv)
matched_reads, stderr = process.communicate()

This should replicate the behaviour of check_output, so that matched_reads contains the output produced by grep.
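As a rough check of that equivalence, here is a small sketch, again assuming Python 3 and using a hypothetical Python one-liner in place of grep so that it is self-contained. It also covers the "process each line as a string" part of the question: in Python 3 both calls return bytes, and str() on bytes yields the "b'...'" repr rather than the text, so the sketch decodes the output before splitting it into lines.

```python
import subprocess
import sys

# Hypothetical stand-in for grep: a child that prints two tab-separated rows.
cmd = [sys.executable, "-c", "print('chr3\\t100'); print('chr3\\t200')"]

# Popen + communicate(): wait for the child and collect its stdout.
process = subprocess.Popen(cmd, stdout=subprocess.PIPE)
via_communicate, _ = process.communicate()

# check_output(): the same thing in one call.
via_check_output = subprocess.check_output(cmd)

# Both results are bytes; decode to str, then split into lines.
lines = via_check_output.decode().splitlines()

print(via_communicate == via_check_output)
print(lines)
```

One difference to keep in mind: check_output raises subprocess.CalledProcessError when the command exits with a non-zero status, which grep does when nothing matches, while communicate leaves the exit status in process.returncode for you to inspect.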

Solution 1
0 2019-03-30 00:30:47
