Why is calling grep using subprocess.Popen() significantly faster than using subprocess.check_output()?
I need to extract rows of entries from a CSV-like file, and I'm using grep to do it inside a Python script. I noticed that when I call grep using subprocess.check_output, it takes around 5.28 seconds to finish, but when I use subprocess.Popen, it only takes 0.002 seconds. That seems like a massive difference, and I'm wondering which one I should use. I should note that I intend to process each line as a string.
Here's part of my Python script:
myenv = os.environ.copy()
myenv['LC_ALL'] = 'C'
file = data_path+'/'+files[12]
start = time.time()
match = 'chr3' + "[[:space:]]"
matched_reads = subprocess.Popen(['grep', match, file], stdout=subprocess.PIPE, env=myenv)
mathced_reads = str(matched_reads).splitlines()
end = time.time()
runtime = end-start
print("Popen Grep: ", runtime)
start = time.time()
match = 'chr3' + "[[:space:]]"
matched_reads = subprocess.check_output(['grep', match, file],env=myenv)
mathced_reads = str(matched_reads).splitlines()
end = time.time()
runtime = end-start
print("Checkoutput Grep: ", runtime)
Calling Popen starts the process but does not wait for it to finish or return its output; it only constructs an object that references the created process. That is why your Popen timing of 0.002 seconds covers little more than process creation. In your case, you never called Popen.communicate, which "talks" to the process and captures its complete output, whereas check_output does all of that for you. You will find that the communicate method takes about as long as check_output, but it actually returns the desired output.
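The difference can be seen without grep at all. The following is a minimal sketch, assuming Python 3; the child command is a hypothetical stand-in for grep that sleeps for half a second and then prints one line, so the timings are only illustrative.

```python
import subprocess
import sys
import time

# Hypothetical stand-in for grep: a child that sleeps, then prints one line.
cmd = [sys.executable, "-c", "import time; time.sleep(0.5); print('chr3\\t100')"]

start = time.perf_counter()
process = subprocess.Popen(cmd, stdout=subprocess.PIPE)
popen_elapsed = time.perf_counter() - start   # child has been started, not finished

output, _ = process.communicate()              # blocks until the child exits
total_elapsed = time.perf_counter() - start

print("Popen returned after:", popen_elapsed)
print("communicate() finished after:", total_elapsed)
# str() on the Popen object gives its repr, not the child's output.
print("str(process):", str(process))
print("captured output:", output)
```

The first timing should be much smaller than the second, because only the second one includes the time the child actually spends running.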
To demonstrate this with Popen, replace
matched_reads = subprocess.Popen(['grep', match, file], stdout=subprocess.PIPE, env=myenv)
with
process = subprocess.Popen(['grep', match, file], stdout=subprocess.PIPE, env=myenv)
matched_reads, stderr = process.communicate()
This should replicate the behaviour of check_output, so that matched_reads contains the output produced by grep.
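Since the goal is to process each line as a string, note that under Python 3 both check_output and communicate return bytes by default, and calling str() on bytes yields its repr rather than decoded text. The sketch below assumes Python 3 and UTF-8 compatible output; the child command is again a hypothetical stand-in for grep.

```python
import subprocess
import sys

# Hypothetical stand-in for grep output: two tab-separated lines.
cmd = [sys.executable, "-c", "print('chr3\\t100'); print('chr3\\t200')"]

raw = subprocess.check_output(cmd)        # bytes, not str

# str(raw) is the repr "b'...'" with escaped newlines, so this yields one element.
wrong = str(raw).splitlines()

# Decoding first gives real text that splits into one string per line.
lines = raw.decode().splitlines()

for line in lines:
    fields = line.split("\t")
    print(fields)
```

On Python 3.7 and later, passing text=True to check_output or Popen is another option, since it makes the returned output a str directly.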