Pandas: pd.read_csv KeyError: not in index

Question

I'm working with a CSV file with over a million entries. I'm trying to read the data for each candidate as a separate column. I was able to parse the data for the first candidate, but when I get to the next candidate I get the error ['cand_nm'] not in index.

Here is the link to the file: Link to data file

%matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

First candidate

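# Read the header plus the first 411 data rows (nrows does not count the header).
# squeeze=True is omitted: it was deprecated in pandas 1.4 and removed in 2.0,
# and it only has an effect when the file has a single column.
reader_bachmann = pd.read_csv('P00000001-ALL.csv', low_memory=False, nrows=411)
print(reader_bachmann)
cand_bachmann = reader_bachmann[["cand_nm"]]
print(cand_bachmann)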
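Because `nrows` counts data rows only, the header line is read in addition to them. A minimal sketch with a small in-memory CSV illustrates this; the rows below are made up for illustration and are not taken from the real FEC file:

```python
import io

import pandas as pd

# Hypothetical stand-in for P00000001-ALL.csv: one header line plus four data lines.
CSV_TEXT = """cand_nm,contb_receipt_amt
Bachmann,250
Bachmann,50
Romney,100
Romney,500
"""

# nrows=2 reads the header line and then the next two data lines.
first = pd.read_csv(io.StringIO(CSV_TEXT), nrows=2)
print(first)
print(list(first.columns))  # ['cand_nm', 'contb_receipt_amt']
```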

Second candidate

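# Intended to skip the rows already read for the first candidate
reader_romney = pd.read_csv('P00000001-ALL.csv', skiprows=range(0, 411), na_filter=True, low_memory=False)
print(reader_romney)
cand_romney = reader_romney[["cand_nm"]]  # this line raises the KeyError below
print(cand_romney)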

Error message

KeyError: "['cand_nm'] not in index"
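The same error can be reproduced on a tiny in-memory CSV (made-up rows, for illustration only). Since the skipped range starts at 0, it includes the header line, so the first line that is not skipped is promoted to the header. The exact wording of the KeyError message differs between pandas versions, so this sketch only records that a KeyError is raised:

```python
import io

import pandas as pd

# Hypothetical stand-in for P00000001-ALL.csv: one header line plus four data lines.
CSV_TEXT = """cand_nm,contb_receipt_amt
Bachmann,250
Bachmann,50
Romney,100
Romney,500
"""

# skiprows=range(0, 3) skips file lines 0, 1 and 2, and line 0 is the header.
second = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=range(0, 3))
print(list(second.columns))  # ['Romney', '100']

# There is no 'cand_nm' column any more, so selecting it fails.
raised = False
try:
    second[["cand_nm"]]
except KeyError as exc:
    raised = True
    print("KeyError:", exc)
```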

Answer 1

When you use skiprows like that, you lose the header: range(0, 411) starts at line 0, which is the header line. So the header for reader_romney is now the 412th line of the file (the first line that is not skipped). If this is the way you want to read the file, you will need to store the header line as a list of strings and then pass that list as the names= kwarg. For example:

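# First read: the header (line 0) plus the first 411 data rows (lines 1-411).
# (squeeze=True is omitted: it was removed in pandas 2.0.)
r_bachman = pd.read_csv('P00000001-ALL.csv', low_memory=False, nrows=411)
cols = r_bachman.columns  # keep the column labels for the second read

# Second read: skip lines 0-411, i.e. the header plus the rows already read.
# With range(0, 411), line 411 would end up in both DataFrames.
# names= restores the column labels that were skipped along with line 0.
r_romney = pd.read_csv('P00000001-ALL.csv', skiprows=range(0, 412),
                       na_filter=True, low_memory=False, names=cols)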
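Two alternatives, sketched on a made-up in-memory CSV rather than the real file. Starting the skipped range at 1 instead of 0 keeps line 0 as the header, so names= is not needed. Reading the file once and selecting rows by cand_nm avoids hard-coding row offsets altogether; this assumes the whole file fits in memory, which `low_memory=False` in the question already implies. `N_FIRST` is a hypothetical name for the number of rows belonging to the first candidate in the toy data:

```python
import io

import pandas as pd

# Hypothetical stand-in for P00000001-ALL.csv: one header line plus four data lines.
CSV_TEXT = """cand_nm,contb_receipt_amt
Bachmann,250
Bachmann,50
Romney,100
Romney,500
"""
N_FIRST = 2  # data rows belonging to the first candidate in this toy file

# Option 1: skip data lines 1..N_FIRST but keep line 0 as the header.
first = pd.read_csv(io.StringIO(CSV_TEXT), nrows=N_FIRST)
rest = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=range(1, N_FIRST + 1))

# Option 2: read once, then split by candidate name instead of by position.
df = pd.read_csv(io.StringIO(CSV_TEXT))
by_candidate = {name: group for name, group in df.groupby("cand_nm")}
romney = df[df["cand_nm"] == "Romney"]

print(rest)
print(by_candidate["Bachmann"])
```

Option 2 keeps working if the rows for each candidate move or change in number, whereas positional reads silently pick up the wrong rows.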

Answer 1: score 2, accepted, 2017-04-18 18:07:05