Python gzipped fileinput returns binary strings instead of text strings

Question

When I loop over the lines of a set of gzipped files with the fileinput module like this:

for line in fileinput.FileInput(files=gzipped_files, openhook=fileinput.hook_compressed):
    ...

the lines come back as byte strings, not text strings.

With the gzip module this can be avoided by opening the files with 'rt' instead of 'rb': http://bugs.python.org/issue13989
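For reference, the difference between the two gzip modes can be seen with a small sketch. The file name and contents here are made up for illustration; only `gzip.open` and its mode strings come from the standard library.

```python
import gzip
import os
import tempfile

# Write a tiny gzipped sample file (hypothetical name) to read back.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt.gz")
with gzip.open(sample_path, "wb") as fh:
    fh.write(b"hello\n")

# Binary mode yields bytes objects.
with gzip.open(sample_path, "rb") as fh:
    binary_line = fh.readline()

# Text mode decodes each line with the given encoding and yields str.
with gzip.open(sample_path, "rt", encoding="utf-8") as fh:
    text_line = fh.readline()

print(type(binary_line).__name__, type(text_line).__name__)
```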

Is there a similar fix for the fileinput module, so that it returns text strings instead of byte strings? I tried adding mode='rt', but then I get this error:

ValueError: FileInput opening mode must be one of 'r', 'rU', 'U' and 'rb'

Answer 1

You'd have to implement your own openhook function to open the files with a codec:

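import os

def hook_compressed_text(filename, mode, encoding='utf8'):
    # fileinput calls the hook with its own mode ('r' by default);
    # appending 't' asks gzip/bz2 for decoded text instead of bytes.
    ext = os.path.splitext(filename)[1]
    if ext == '.gz':
        import gzip
        return gzip.open(filename, mode + 't', encoding=encoding)
    elif ext == '.bz2':
        import bz2
        return bz2.open(filename, mode + 't', encoding=encoding)
    else:
        # Uncompressed files are opened as ordinary text files.
        return open(filename, mode, encoding=encoding)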
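A hook like this is passed to fileinput in place of `fileinput.hook_compressed`. The following is a minimal, self-contained sketch of that wiring; `open_text` and `read_lines` are hypothetical names, the hook is cut down to `.gz` and plain files to keep it short, and it assumes fileinput is left in its default 'r' mode.

```python
import fileinput
import gzip
import os
import tempfile

def open_text(filename, mode, encoding="utf-8"):
    # Hypothetical, reduced hook: gzip files are opened in text mode,
    # everything else as a regular text file. Assumes mode is 'r'.
    if os.path.splitext(filename)[1] == ".gz":
        return gzip.open(filename, "rt", encoding=encoding)
    return open(filename, "r", encoding=encoding)

def read_lines(paths):
    # FileInput calls the hook as openhook(filename, mode) for each file.
    with fileinput.FileInput(files=paths, openhook=open_text) as stream:
        return list(stream)

# Build a small gzipped sample file to read back.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.gz")
with gzip.open(sample_path, "wt", encoding="utf-8") as fh:
    fh.write("first\nsecond\n")

lines = read_lines([sample_path])
print(lines)
```

Because the hook decides the encoding, every line arrives as `str` and no per-line decoding is needed in the loop.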

Answer 2

Coming a bit late to the party, but wouldn't it be simpler to do this?

for line in fileinput.FileInput(files=gzipped_files, openhook=fileinput.hook_compressed):
    if isinstance(line, bytes):
        line = line.decode()
    ...
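The same idea can be wrapped in a small generator so the decoding lives in one place. This is only a sketch: `text_lines` is a hypothetical helper name, and the explicit `encoding` argument is an addition (a bare `decode()` uses UTF-8). The `isinstance` check also keeps it working on Python versions where `hook_compressed` already hands back `str`, which I believe is the case from 3.10 on.

```python
import fileinput
import gzip
import os
import tempfile

def text_lines(paths, encoding="utf-8"):
    """Yield str lines, decoding only when fileinput hands back bytes."""
    with fileinput.FileInput(files=paths,
                             openhook=fileinput.hook_compressed) as stream:
        for line in stream:
            if isinstance(line, bytes):
                line = line.decode(encoding)
            yield line

# Build a small gzipped sample file to iterate over.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.gz")
with gzip.open(sample_path, "wb") as fh:
    fh.write(b"alpha\nbeta\n")

decoded = list(text_lines([sample_path]))
print(decoded)
```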


Answer 1: score 7, accepted, 2014-02-03 13:55:46

Answer 2: score 2, 2019-12-09 07:45:26
