
Fast reading of gzip (text file) using io.BufferedReader in Python 3

I'm trying to efficiently read in and parse a compressed text file using the gzip module. One suggestion I found is to wrap the gzip file object with io.BufferedReader, like so:

import gzip, io

gz = gzip.open(in_path, 'rb')
f = io.BufferedReader(gz)
for line in f:  # iterate lazily; f.readlines() would load every line into memory
    pass        # do stuff (line is a bytes object here)
gz.close()

To do this in Python 3, I think gzip.open must be called with mode='rb', so each line is a bytes object. However, I need line to be a text (ASCII) string. Is there a more efficient way to read the file as text using BufferedReader, or will I have to decode line inside the for loop?
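For comparison, here is a minimal sketch of the decode-inside-the-loop approach the question asks about, assuming the file content is ASCII. To stay self-contained it compresses sample data in memory with gzip.compress and reads it through gzip.GzipFile(fileobj=...) rather than opening in_path; read_lines_decoding_manually is a hypothetical helper name, not part of any library.

```python
import gzip
import io


def read_lines_decoding_manually(raw: bytes, encoding: str = "ascii") -> list[str]:
    """Hypothetical helper: read gzip data as bytes lines and decode each one by hand."""
    lines = []
    with gzip.GzipFile(fileobj=io.BytesIO(raw), mode="rb") as gz:
        # Wrap the binary gzip stream in a BufferedReader, as in the question.
        for line in io.BufferedReader(gz):
            # Each line arrives as bytes; .decode() turns it into str.
            lines.append(line.decode(encoding))
    return lines


data = gzip.compress(b"alpha\nbeta\n")
print(read_lines_decoding_manually(data))  # ['alpha\n', 'beta\n']
```

This works, but it makes a separate .decode() call per line in Python code, which is the per-line work the answer below avoids by letting a text wrapper decode the stream for you.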

You can use io.TextIOWrapper to wrap the binary stream in a text stream instead:

f = io.TextIOWrapper(gz, encoding='utf-8')  # decodes bytes to str; pass the file's actual encoding
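A self-contained sketch of the TextIOWrapper approach, assuming UTF-8 content. Passing an explicit encoding matters because, without one, TextIOWrapper falls back to a locale-dependent default. The sample data is built in memory instead of reading in_path, and read_text_lines is a hypothetical helper.

```python
import gzip
import io


def read_text_lines(raw: bytes, encoding: str = "utf-8") -> list[str]:
    """Hypothetical helper: wrap a binary gzip stream so iteration yields str."""
    gz = gzip.GzipFile(fileobj=io.BytesIO(raw), mode="rb")
    # TextIOWrapper decodes the bytes (and translates line endings) for us;
    # closing the wrapper also closes the underlying GzipFile.
    with io.TextIOWrapper(gz, encoding=encoding) as f:
        return list(f)


data = gzip.compress("alpha\nbeta\n".encode("utf-8"))
print(read_text_lines(data))  # ['alpha\n', 'beta\n']
```

With the default newline=None, TextIOWrapper also translates \r\n and \r line endings to \n, so lines look the same regardless of how the file was written.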

Or, as @ShadowRanger pointed out, you can simply open the gzip file in text mode, so that the gzip module applies the io.TextIOWrapper wrapper for you:

with gzip.open(in_path, 'rt', encoding='utf-8') as f:  # 'rt' = read as text
    for line in f:
        pass  # do stuff (line is a str here)
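A runnable sketch of the text-mode variant, assuming UTF-8 content. It first writes a small sample file (the file name and the parse_lines helper are hypothetical) and then reads it back with mode 'rt'; in text mode gzip.open also accepts encoding, errors and newline arguments.

```python
import gzip
import os
import tempfile


def parse_lines(path: str) -> list[str]:
    """Hypothetical helper: read a gzip-compressed text file line by line as str."""
    # 'rt' makes gzip.open return a text stream, so no manual .decode() is needed.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt.gz")  # hypothetical sample file
    # Write a small compressed file first so the example is self-contained.
    with gzip.open(path, "wt", encoding="utf-8") as out:
        out.write("alpha\nbeta\ngamma\n")
    lines = parse_lines(path)

print(lines)  # ['alpha', 'beta', 'gamma']
```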
