How to use csv.DictReader on a tarfile object in Python 3.6?

Here's the issue I'm running into:

Error: iterator should return strings, not bytes (did you open the file in text mode?)

The code that's causing this looks something like:

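t = tarfile.open(filename)
for fileinfo in t:
    # extractfile() returns a binary (bytes) stream
    f = t.extractfile(fileinfo)
    reader = csv.DictReader(f)
    # Reading the header row triggers the error above
    reader.fieldnames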

The trouble seems to be that the extractfile() method produces an io.BufferedReader, a very basic file-like object that yields bytes and has no high-level text interface, whereas csv.DictReader expects an iterator of strings.
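To make the bytes-versus-text mismatch concrete, here is a minimal sketch that builds a tiny archive in memory and inspects what extractfile() hands back. The member name `sample.csv` and its contents are made up for illustration and are not from the original code.

```python
import io
import tarfile

# Hypothetical demo data: a one-row CSV stored in an in-memory tar archive.
payload = b"name,qty\napple,3\n"
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as t:
    info = tarfile.TarInfo(name="sample.csv")
    info.size = len(payload)
    t.addfile(info, io.BytesIO(payload))
buf.seek(0)

with tarfile.open(fileobj=buf) as t:
    member = t.getmember("sample.csv")
    f = t.extractfile(member)
    # The returned object is a binary buffered reader...
    is_buffered = isinstance(f, io.BufferedReader)
    # ...so each line comes back as bytes, which csv.DictReader rejects.
    first_line = f.readline()

print(is_buffered, type(first_line).__name__)
```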

What would be a good way to handle this?

I'm considering decoding the bytes from the reader into text, but I need to keep streaming because these files are very large. The codebase is Python 3.6 running on Docker/Linux.

Thanks to both @Aran-Fey and @zwer, who led me to another StackOverflow question that answered it. Here's how:

import codecs
import csv
import tarfile

with tarfile.open(filename) as t:
    for fileinfo in t:
        with t.extractfile(fileinfo) as f:
            ft = codecs.getreader("utf-8")(f)  # decodes bytes to str lazily
            reader = csv.DictReader(ft)
            reader.fieldnames

This seems to work so far.
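For comparison, a variation on the same idea wraps the binary stream in io.TextIOWrapper instead of codecs.getreader. This is only a sketch under a few assumptions: the files are UTF-8, and the helper names `make_sample_tar` and `iter_csv_rows` as well as the sample data are hypothetical. It also skips non-file members, since extractfile() returns None for entries such as directories. Rows are yielded one at a time, so the whole file is never held in memory.

```python
import csv
import io
import tarfile


def make_sample_tar():
    """Build a small in-memory tar archive holding one CSV file (demo data only)."""
    payload = "name,qty\napple,3\npear,5\n".encode("utf-8")
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as t:
        info = tarfile.TarInfo(name="sample.csv")
        info.size = len(payload)
        t.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


def iter_csv_rows(fileobj):
    """Yield (member name, row dict) pairs for every regular file in the archive."""
    with tarfile.open(fileobj=fileobj) as t:
        for fileinfo in t:
            if not fileinfo.isfile():
                continue  # extractfile() returns None for directories and similar
            with t.extractfile(fileinfo) as f:
                # Decode lazily; newline="" lets the csv module handle line endings.
                ft = io.TextIOWrapper(f, encoding="utf-8", newline="")
                for row in csv.DictReader(ft):
                    yield fileinfo.name, dict(row)


rows = list(iter_csv_rows(make_sample_tar()))
print(rows)
```

With a real archive on disk, the loop body would be the same with tarfile.open(filename) in place of the in-memory file object.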
