Why is calling grep using subprocess.Popen() significantly faster than using subprocess.check_output()?

I need to extract rows of entries from a CSV-like file, and I'm using grep to do it inside a Python script. I noticed that when I call grep using subprocess.check_output, it takes around 5.28 seconds to finish, but when I use subprocess.Popen, it takes only 0.002 seconds. That seems like a massive difference, and I'm wondering which one I should use. I should note that I intend to process each line as a string.

Here's part of my Python script:

import os
import subprocess
import time

# data_path and files are defined earlier in the full script
myenv = os.environ.copy()
myenv['LC_ALL'] = 'C'
file = os.path.join(data_path, files[12])

start = time.time()
match = 'chr3' + "[[:space:]]"
# Popen starts grep and returns a process handle, not grep's output
matched_reads = subprocess.Popen(['grep', match, file], stdout=subprocess.PIPE, env=myenv)
matched_reads = str(matched_reads).splitlines()
end = time.time()
runtime = end - start
print("Popen Grep: ", runtime)

start = time.time()
match = 'chr3' + "[[:space:]]"
matched_reads = subprocess.check_output(['grep', match, file], env=myenv)
# check_output returns bytes; decode before splitting into lines
matched_reads = matched_reads.decode().splitlines()
end = time.time()
runtime = end - start
print("Checkoutput Grep: ", runtime)

You will find that calling Popen starts the program but does not wait for it to finish or return its output; instead, it constructs an object that references the created process. In your case, you never called Popen.communicate, which "talks" to the process and collects its complete output, whereas check_output does all of that for you. Once you add communicate, you will find that it takes about as long as check_output, but it actually returns the desired output.
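To see this directly, here is a minimal sketch (not from the original answer) that uses a child Python process sleeping for half a second as a hypothetical stand-in for a slow grep. It shows that Popen returns while the child is still running (poll() returns None) and that communicate() is where the waiting actually happens.

```python
import subprocess
import sys
import time

# Hypothetical stand-in for a slow grep: a child Python process that sleeps ~0.5 s.
# Any long-running command behaves the same way with Popen.
slow_cmd = [sys.executable, "-c", "import time; time.sleep(0.5); print('done')"]

start = time.time()
proc = subprocess.Popen(slow_cmd, stdout=subprocess.PIPE)
spawn_time = time.time() - start      # Popen returns almost immediately
still_running = proc.poll() is None   # poll() returns None while the child is alive

out, _ = proc.communicate()           # blocks until the child exits, collecting stdout
total_time = time.time() - start

print("Popen returned after %.3f s, still running: %s" % (spawn_time, still_running))
print("communicate() returned after %.3f s with %r" % (total_time, out))
```

The first timing corresponds to what the question measured for Popen; the second corresponds to what check_output measures, because check_output also waits for the process to exit.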

To make the Popen version do the same work, replace

matched_reads = subprocess.Popen(['grep', match, file], stdout=subprocess.PIPE, env=myenv)

with

process = subprocess.Popen(['grep', match, file], stdout=subprocess.PIPE, env=myenv)
matched_reads, stderr = process.communicate()

This should replicate the behaviour of check_output, so that matched_reads contains the output produced by grep. (One difference: check_output also raises CalledProcessError if grep exits with a non-zero status, which grep does when no lines match.)
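Since the goal is to process each line as a string, the Popen/communicate approach can be wrapped in a small helper. This is a sketch under assumptions: grep_lines is a hypothetical name, the sample file contents are made up, and the output is decoded with Python's default encoding (UTF-8). The helper treats grep's exit status 1 (no matching lines) as an empty result rather than an error.

```python
import os
import subprocess
import tempfile

def grep_lines(pattern, path, env=None):
    """Hypothetical helper: run grep via Popen + communicate() and return matching lines.

    grep exits 0 when lines match, 1 when nothing matches, and >1 on error.
    """
    process = subprocess.Popen(['grep', pattern, path],
                               stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                               env=env)
    out, err = process.communicate()  # waits for grep and collects all output
    if process.returncode > 1:        # real error, e.g. file not found
        raise RuntimeError(err.decode(errors='replace'))
    return out.decode().splitlines()  # exit status 1 gives empty output -> []

# Hypothetical sample data standing in for the CSV-like file
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as f:
    f.write("chr1\t100\nchr3\t200\nchr3x\t300\nchr3\t400\n")
    path = f.name

env = os.environ.copy()
env['LC_ALL'] = 'C'
pattern = 'chr3' + '[[:space:]]'

lines = grep_lines(pattern, path, env)
print(lines)  # ['chr3\t200', 'chr3\t400']

# check_output returns the same lines, since it also waits for grep to finish
same = subprocess.check_output(['grep', pattern, path], env=env).decode().splitlines()
print(lines == same)  # True

missing = grep_lines('chr9' + '[[:space:]]', path, env)
print(missing)  # [] (grep exits 1, no exception raised)

os.remove(path)
```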
