Reading a gzipped text file quickly with io.BufferedReader in Python 3

Question

I'm trying to efficiently read in and parse a compressed text file using the gzip module. This link suggests wrapping the gzip file object with io.BufferedReader, like so:

import gzip
import io

gz = gzip.open(in_path, 'rb')
f = io.BufferedReader(gz)
for line in f.readlines():
    pass  # do stuff with line (a bytes object here)
gz.close()

To do this in Python 3, I think gzip must be called with mode='rb'. As a result, line is a binary (bytes) string. However, I need line to be a text/ASCII string. Is there a more efficient way to read the file as text using BufferedReader, or will I have to decode line inside the for loop?
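The decode-inside-the-loop option the question mentions can be sketched as follows. It writes a small sample file to a temporary directory so it runs on its own; the file name `sample.txt.gz`, the hypothetical helper `read_decoded_lines`, and the choice of ASCII as the encoding are illustrative assumptions, not part of the original question.

```python
import gzip
import io
import os
import tempfile


def read_decoded_lines(path, encoding="ascii"):
    """Hypothetical helper: read a gzip file in binary mode and decode each line."""
    lines = []
    with gzip.open(path, "rb") as gz:
        f = io.BufferedReader(gz)  # wrapping as in the question
        for raw in f:              # each raw line is a bytes object, e.g. b"alpha\n"
            lines.append(raw.decode(encoding))  # explicit per-line decode to str
    return lines


# Create a tiny gzip file to read back (assumed sample data).
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "sample.txt.gz")
with gzip.open(path, "wb") as out:
    out.write(b"alpha\nbeta\n")

print(read_decoded_lines(path))
```

This works, but every line pays for a separate `decode` call in Python code, which is the overhead the question is hoping to avoid.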

Answer 1

You can use io.TextIOWrapper to wrap the binary stream in a text stream instead:

f = io.TextIOWrapper(gz)
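A self-contained sketch of wrapping the binary gzip stream in `io.TextIOWrapper`. Passing `encoding` explicitly is an addition not in the answer: without it, `TextIOWrapper` falls back to the locale's preferred encoding. The sample file and the `'utf-8'` choice are assumptions for illustration.

```python
import gzip
import io
import os
import tempfile

# Write a small sample file (hypothetical data) so the example runs standalone.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "sample.txt.gz")
with gzip.open(path, "wb") as out:
    out.write("first line\nsecond line\n".encode("utf-8"))

lines = []
gz = gzip.open(path, "rb")
f = io.TextIOWrapper(gz, encoding="utf-8")  # decodes bytes to str as it reads
for line in f:
    lines.append(line.rstrip("\n"))         # line is already a str, not bytes
f.close()                                   # closing the wrapper also closes gz

print(lines)
```

The decoding now happens inside the wrapper in larger chunks, so the loop body receives ready-made `str` lines.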

Or, as @ShadowRanger pointed out, you can simply open the gzip file in text mode, so that the gzip module applies the io.TextIOWrapper wrapper for you:

with gzip.open(in_path, 'rt') as f:
    for line in f:
        pass  # do stuff with line (a str here)
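A runnable version of the text-mode approach, using `with` so the file is closed when the loop ends. In text mode `gzip.open` also accepts `encoding`, `errors` and `newline` arguments; the explicit `encoding='utf-8'`, the sample file, and the hypothetical helper `count_words` are assumptions for illustration.

```python
import gzip
import os
import tempfile


def count_words(path):
    """Hypothetical helper: total whitespace-separated words in a gzipped text file."""
    total = 0
    with gzip.open(path, "rt", encoding="utf-8") as f:  # iteration yields str lines
        for line in f:
            total += len(line.split())
    return total


# Text mode also works for writing, which keeps the sample setup symmetric.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "sample.txt.gz")
with gzip.open(path, "wt", encoding="utf-8") as out:
    out.write("one two three\nfour five\n")

print(count_words(path))  # 5
```

Iterating the file object directly, rather than calling `readlines()`, also avoids loading the whole decompressed file into memory at once.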

Answer 1 (accepted, score 0), 2019-02-10 19:11:11
