Fast reading of gzip (text file) using io.BufferedReader in Python 3
I'm trying to efficiently read and parse a compressed text file using the gzip module. This link suggests wrapping the gzip file object with io.BufferedReader, like so:
import gzip
import io

gz = gzip.open(in_path, 'rb')
f = io.BufferedReader(gz)
for line in f.readlines():
    pass  # do stuff
gz.close()
To do this in Python 3, I think gzip.open must be called with mode='rb', so each line comes back as a bytes object (a binary string). However, I need line to be a text/ASCII string. Is there a more efficient way to read the file as text using BufferedReader, or will I have to decode line inside the for loop?
You can use io.TextIOWrapper to seamlessly wrap the binary stream in a text stream instead:
f = io.TextIOWrapper(gz)
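To make the wrapping step concrete, here is a minimal, self-contained sketch. The helper name read_lines_wrapped, the temporary sample file, and the choice of encoding="ascii" are assumptions made for illustration (the question only says the data is text/ASCII); adjust the encoding to match your data.

```python
import gzip
import io
import os
import tempfile


def read_lines_wrapped(path, encoding="ascii"):
    """Return decoded lines from a gzip file via io.TextIOWrapper.

    The function name and default encoding are illustrative assumptions.
    """
    # Open in binary mode, then let TextIOWrapper handle the decoding.
    # Closing the wrapper also closes the underlying gzip file object.
    with io.TextIOWrapper(gzip.open(path, "rb"), encoding=encoding) as f:
        return [line.rstrip("\n") for line in f]


# Hypothetical sample data written to a temporary file for the demo.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.txt.gz")
with gzip.open(sample_path, "wb") as out:
    out.write(b"alpha\nbeta\ngamma\n")

wrapped_lines = read_lines_wrapped(sample_path)
print(wrapped_lines)
```

Iterating over the wrapper yields str objects one line at a time, so no per-line decode call is needed inside the loop.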
Or, as @ShadowRanger pointed out, you can simply open the gzip file in text mode, so that the gzip module applies the io.TextIOWrapper wrapper for you:
for line in gzip.open(in_path, 'rt'):
    pass  # do stuff
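As a rough check that text mode gives the same result as decoding each line by hand, the sketch below reads one file both ways. The function names, the temporary demo file, and encoding="ascii" are assumptions for illustration only; gzip.open accepts an encoding argument in text mode.

```python
import gzip
import os
import tempfile


def read_lines_text_mode(path, encoding="ascii"):
    """Read lines as str by opening the gzip file in text mode ('rt')."""
    with gzip.open(path, "rt", encoding=encoding) as f:
        return [line for line in f]


def read_lines_manual_decode(path, encoding="ascii"):
    """Read lines as bytes ('rb') and decode each one inside the loop."""
    with gzip.open(path, "rb") as f:
        return [raw.decode(encoding) for raw in f]


# Hypothetical sample data written to a temporary file for the demo.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo.txt.gz")
with gzip.open(demo_path, "wb") as out:
    out.write(b"first line\nsecond line\n")

text_lines = read_lines_text_mode(demo_path)
decoded_lines = read_lines_manual_decode(demo_path)
print(text_lines == decoded_lines)
```

Using a with block here also closes the file when the loop finishes, which the bare for loop over gzip.open(...) leaves to the garbage collector.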