How to use csv.DictReader on a tarfile object in Python 3.6?

Question

Here's the issue I'm running into:

Error: iterator should return strings, not bytes (did you open the file in text mode?)

The code that's causing this looks something like:

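import csv
import tarfile

t = tarfile.open(filename)
for fileinfo in t:
    f = t.extractfile(fileinfo)  # binary file object, yields bytes
    reader = csv.DictReader(f)
    reader.fieldnames  # reading the header raises the error above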

The trouble seems to be that extractfile() returns an io.BufferedReader, a basic binary file-like object with no text interface, so csv.DictReader receives bytes where it expects strings.
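To make the symptom easy to reproduce without a large archive, here is a minimal sketch that builds a tiny tar file in memory and inspects what extractfile() hands back. The member name `sample.csv`, the sample rows, and the helpers `make_tar_bytes` and `probe` are made up for illustration and are not part of the original code.

```python
import csv
import io
import tarfile


def make_tar_bytes():
    """Build a small in-memory tar archive with one CSV member (made-up data)."""
    payload = b"name,qty\napple,3\npear,5\n"
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as t:
        info = tarfile.TarInfo(name="sample.csv")
        info.size = len(payload)
        t.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def probe():
    """Return (is_buffered_reader, type_of_first_line, csv_error_message)."""
    with tarfile.open(fileobj=io.BytesIO(make_tar_bytes())) as t:
        member = t.getmembers()[0]
        f = t.extractfile(member)
        is_buffered = isinstance(f, io.BufferedReader)
        line_type = type(f.readline())  # lines come back as bytes
        f.seek(0)  # rewind so the csv module sees the header row
        try:
            csv.DictReader(f).fieldnames
            message = None
        except csv.Error as exc:
            message = str(exc)
    return is_buffered, line_type, message


if __name__ == "__main__":
    print(probe())
```

The exact wording of the csv error differs slightly between Python versions, but in each case it complains that the iterator returned bytes rather than strings.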

What would be a good way to handle this?

I'm thinking of decoding the bytes from the reader into text, but I need to keep streaming because these files are very large. The codebase is Python 3.6 running on Docker/Linux.

Answer 1

Thanks to both @Aran-Fey and @zwer, who led me to another StackOverflow question that answered it. Here's how:

import codecs
import csv
import tarfile

with tarfile.open(filename) as t:
    for fileinfo in t:
        with t.extractfile(fileinfo) as f:
            ft = codecs.getreader("utf-8")(f)  # decode bytes to text lazily
            reader = csv.DictReader(ft)
            reader.fieldnames

This seems to work so far.
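For comparison, the same streaming decode can probably also be done with io.TextIOWrapper, which lets you pass newline="" as the csv module documentation recommends for file objects. The sketch below is an alternative, not the approach from the linked question; it assumes every regular member is a UTF-8 CSV file, and `iter_csv_rows` and `demo` are hypothetical helper names with made-up sample data.

```python
import csv
import io
import tarfile


def iter_csv_rows(tar, encoding="utf-8"):
    """Yield (member_name, row_dict) for each regular file in an open TarFile.

    Hypothetical helper: assumes every regular member is a CSV in `encoding`.
    """
    for member in tar:
        if not member.isfile():
            continue  # extractfile() returns None for directories and similar
        with tar.extractfile(member) as raw:
            # Wrap the binary reader; rows are decoded as they are read,
            # so the member is never loaded into memory in one piece.
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            for row in csv.DictReader(text):
                yield member.name, dict(row)


def demo():
    """Run the helper against a tiny in-memory archive (made-up data)."""
    payload = b"name,qty\napple,3\npear,5\n"
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as t:
        info = tarfile.TarInfo(name="sample.csv")
        info.size = len(payload)
        t.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    with tarfile.open(fileobj=buf) as t:
        return list(iter_csv_rows(t))


if __name__ == "__main__":
    for name, row in demo():
        print(name, row)
```

Skipping members that are not regular files avoids calling csv.DictReader on None, which is what extractfile() returns for directories.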

Answer 1
0 2018-10-02 21:34:13
