
Reading a gzipped CSV file in Python 3

I'm having problems reading from a gzipped CSV file with the gzip and csv libraries. Here's what I have:

import gzip
import csv
import json

f = gzip.open(filename)
csvobj = csv.reader(f, delimiter=',', quotechar="'")
for line in csvobj:
    ts = line[0]
    data_json = json.loads(line[1])

but this throws an exception:

 File "C:\Users\yaronol\workspace\raw_data_from_s3\s3_data_parser.py", line 64, in download_from_S3
    self.parse_dump_file(filename)
  File "C:\Users\yaronol\workspace\raw_data_from_s3\s3_data_parser.py", line 30, in parse_dump_file
    for line in csvobj:
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

Gunzipping the file first and opening the result with csv works fine. I've also tried decoding the file contents to convert from bytes to str...

What am I missing here?

The default mode for gzip.open is rb. If you want to work with str objects, you have to request text mode explicitly:

f = gzip.open(filename, mode="rt")

Off topic: it is good practice to perform I/O operations inside a with block, so the file is closed even if an exception is raised:

with gzip.open(filename, mode="rt") as f:
    csvobj = csv.reader(f, delimiter=',', quotechar="'")
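As a rough end-to-end sketch of the text-mode approach, the snippet below writes a tiny gzipped CSV file and reads it back. The helper name read_gzipped_csv, the sample row, the temporary path, and the UTF-8 encoding are all assumptions made for illustration; they are not from the original question.

```python
import csv
import gzip
import json
import os
import tempfile


def read_gzipped_csv(path):
    """Return (timestamp, parsed_json) pairs from a gzipped CSV file.

    Hypothetical helper; the two-column layout mirrors the question.
    """
    rows = []
    # "rt" makes gzip decode bytes to str, which csv.reader requires.
    # newline="" follows the csv module's recommendation for file objects.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as f:
        for line in csv.reader(f, delimiter=",", quotechar="'"):
            rows.append((line[0], json.loads(line[1])))
    return rows


# Build a small sample file so the sketch is self-contained (assumed data).
sample_path = os.path.join(tempfile.mkdtemp(), "sample.csv.gz")
with gzip.open(sample_path, mode="wt", encoding="utf-8", newline="") as f:
    writer = csv.writer(f, delimiter=",", quotechar="'")
    writer.writerow(["2014-01-01T00:00:00", json.dumps({"a": 1, "b": [2, 3]})])

rows = read_gzipped_csv(sample_path)
```

Passing encoding explicitly avoids depending on the platform default, which can differ between Windows and other systems.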

You are opening the file in binary mode (which is the default for gzip).

Try this instead:

import gzip
import csv
f = gzip.open(filename, mode='rt')
csvobj = csv.reader(f, delimiter=',', quotechar="'")
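If the gzip stream has to stay in binary mode (for example, because it comes from an in-memory buffer rather than a path), one possible alternative is to wrap it in io.TextIOWrapper, which decodes bytes to str lazily. This is only a sketch: the function name rows_from_binary_gzip, the sample payload, and the UTF-8 encoding are assumptions, not part of the answers above.

```python
import csv
import gzip
import io


def rows_from_binary_gzip(raw):
    """Parse CSV rows from an in-memory gzip payload (bytes).

    Hypothetical helper for illustration.
    """
    # GzipFile yields bytes; TextIOWrapper decodes them to str on the fly.
    binary_file = gzip.GzipFile(fileobj=io.BytesIO(raw), mode="rb")
    # Closing the wrapper also closes the underlying GzipFile.
    with io.TextIOWrapper(binary_file, encoding="utf-8", newline="") as text_file:
        return list(csv.reader(text_file, delimiter=",", quotechar="'"))


# Assumed sample data: two rows, the second column quoted with "'".
payload = gzip.compress(b"1,'{\"k\": \"v, w\"}'\r\n2,'{}'\r\n")
parsed = rows_from_binary_gzip(payload)
```

For a file on disk, gzip.open(filename, mode='rt') achieves the same result with less code.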

A late answer: you can also use the datatable package in Python, which reads the file directly:

import datatable as dt
df = dt.fread(filename)
df.head()
