Read a zipped txt file as a pandas DataFrame

I am trying to read a zipped txt file into a pandas DataFrame. Although the unzipped file has a .txt extension, it contains comma-separated values.

Following an earlier answer, I used:

import pandas as pd

path = 'data_folder/data.2020.ZIP'
# The archive holds a single comma-separated text file with no header row.
df = pd.read_csv(path, compression='zip', header=None, sep=',')
print(df.head())

But it raises this error:

ParserError: Error tokenizing data. C error: Expected 37 fields in line 23, saw 80

I am using Python 3.6 with pandas 0.24.2. Would upgrading pandas help?

This happened because the rows have an irregular number of columns: the parser expected 37 fields based on the earlier rows, but line 23 has 80. Since I didn't want to drop any data, I passed the names argument with the maximum number of columns, which fixed the issue:

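import pandas as pd

path = 'data_folder/data.2020.ZIP'
# names=range(80) fixes the column count at 80, so shorter rows are padded with NaN.
df = pd.read_csv(path, compression='zip', header=None, sep=',', names=range(80))
print(df.head())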
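
If the widest row is not known in advance, the 80 above could be computed instead of hard-coded. The following is only a sketch of that idea: the helper names `max_field_count` and `read_ragged_zip` are made up for illustration, and it assumes the archive contains a single UTF-8 text file with no header row.

```python
import csv
import io
import zipfile

import pandas as pd


def max_field_count(zip_path, delimiter=","):
    """Return the largest number of fields found on any row of the zipped file."""
    with zipfile.ZipFile(zip_path) as archive:
        # Assumption: the archive holds exactly one data file.
        name = archive.namelist()[0]
        with archive.open(name) as raw:
            # Assumption: the file is UTF-8 encoded text.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.reader(text, delimiter=delimiter)
            return max((len(row) for row in reader), default=0)


def read_ragged_zip(zip_path, delimiter=","):
    """Read a zipped delimited file whose rows have differing lengths."""
    n_cols = max_field_count(zip_path, delimiter=delimiter)
    # Passing `names` fixes the column count up front; short rows are padded with NaN.
    return pd.read_csv(
        zip_path,
        compression="zip",
        header=None,
        sep=delimiter,
        names=list(range(n_cols)),
    )
```

This reads the file twice (once to count fields, once to parse), which is usually acceptable for moderately sized files. With the path from the question, the call would be `df = read_ragged_zip('data_folder/data.2020.ZIP')`.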
