Pandas. pd.read_csv KeyError: not in index error

I'm working with a CSV file with over a million entries. I'm trying to read the data for each candidate as a separate column. I was able to parse the data for the first candidate, but when I get to the next candidate I get the error ['cand_nm'] not in index.

Here is the link to the file: Link to data file

%matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

First candidate

reader_bachmann = pd.read_csv('P00000001-ALL.csv', squeeze=True, low_memory=False, nrows=411)
print(reader_bachmann)
cand_bachmann = reader_bachmann[["cand_nm"]] 
print(cand_bachmann)

Second candidate

reader_romney = pd.read_csv('P00000001-ALL.csv', skiprows=range(0, 411), na_filter=True, low_memory=False)
print(reader_romney)
cand_romney = reader_romney[["cand_nm"]] 
print(cand_romney)

Error message

KeyError: "['cand_nm'] not in index"

When you use skiprows like that, you lose the header: line 0 of the file, which holds the column names, is among the skipped lines, so the header for reader_romney is now row number 412, which is a data row. If this is the way you want to read the file, you will need to store the header line as a list of strings and then pass that list as the names= kwarg. For example:

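# Read the first block normally so the real header row is parsed.
# squeeze=True is omitted: it has no effect on a multi-column file
# and was removed in pandas 2.0.
r_bachman = pd.read_csv('P00000001-ALL.csv', low_memory=False, nrows=411)
cols = r_bachman.columns
# Reuse the saved column names; with names= given, the first line left
# after skipping is read as data instead of being taken as the header.
r_romney = pd.read_csv('P00000001-ALL.csv', skiprows=range(0, 411),
                       na_filter=True, low_memory=False, names=cols)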
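
Another way to get the same result, sketched here under some assumptions, is to keep the header by skipping only the data lines (starting the range at 1 instead of 0). The example uses a tiny in-memory CSV (CSV_TEXT, with a made-up amount column) as a stand-in for P00000001-ALL.csv. N_FIRST and the helper function names are hypothetical and exist only for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: one header line plus five data rows.
CSV_TEXT = """cand_nm,amount
Bachmann,10
Bachmann,20
Romney,30
Romney,40
Romney,50
"""
N_FIRST = 2  # assumed number of data rows belonging to the first candidate


def read_first(n=N_FIRST):
    # nrows counts data rows only; the header line is parsed as usual.
    return pd.read_csv(io.StringIO(CSV_TEXT), nrows=n)


def read_rest_broken(n=N_FIRST):
    # Skipping file line 0 drops the header, so a data row becomes the header.
    return pd.read_csv(io.StringIO(CSV_TEXT), skiprows=range(0, n))


def read_rest_keep_header(n=N_FIRST):
    # Skip only file lines 1..n (the first candidate's rows); line 0 survives.
    return pd.read_csv(io.StringIO(CSV_TEXT), skiprows=range(1, n + 1))


def read_rest_with_names(n=N_FIRST):
    # The names= approach: skip the header plus n data rows, then supply names.
    cols = list(read_first(n).columns)
    return pd.read_csv(io.StringIO(CSV_TEXT), skiprows=n + 1, names=cols)


if __name__ == "__main__":
    print(read_rest_broken().columns.tolist())  # no 'cand_nm' column here
    print(read_rest_keep_header())
    print(read_rest_with_names())
```

Note that nrows counts data rows, while skiprows counts file lines starting at 0 with the header included. To start the second read exactly where the first one stopped, the skip range has to cover one more line than nrows.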
