Error whilst importing csv file using pandas - python

I am attempting to read a csv file with pandas and then encode its items.

Here is my code:

import sklearn.model_selection  # explicitly load the submodule used for train_test_split below
from sklearn.utils import shuffle
from sklearn.neighbors import KNeighborsClassifier
import pandas as pd
import numpy as np
from sklearn import linear_model, preprocessing

data = pd.read_csv("car.data")  # import in data
print(data.head())  # show the top few lines of data

le = preprocessing.LabelEncoder()  # object to change data into a numerical value
buying = le.fit_transform(list(data["buying"]))  # input buying column into object le
maint = le.fit_transform(list(data["maint"]))  # input maint column into object le
door = le.fit_transform(list(data["door"]))  # input door column into object le
persons = le.fit_transform(list(data["persons"]))  # input persons column into object le
lug_boot = le.fit_transform(list(data["lug_boot"]))  # input lug_boot column into object le
safety = le.fit_transform(list(data["safety"]))  # input safety column into object le
cls = le.fit_transform(list(data["class"]))  # input class column into object le

predict = "class"  # what will be predicted

x = list(zip(buying, maint, door, persons, lug_boot, safety))  # will put all of the values into one list (x)
y = list(cls)  # will convert numpy array (cls) into list

x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x, y, test_size=0.1)  # hold out 10% of the rows as a test set so the model is not scored on data it has memorised

print(x_train, y_test)  # show variables to test its working
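For reference, LabelEncoder.fit_transform learns the distinct values of the list it is given and replaces each value with its index in sorted order. Because the same le object is refitted on every call, each column gets its own mapping and le only remembers the last one. A small standalone sketch, using a made-up sample of values like those in the buying column:

```python
from sklearn import preprocessing

le = preprocessing.LabelEncoder()
# Hypothetical sample of values from the buying column
codes = le.fit_transform(["vhigh", "high", "med", "low", "vhigh"])

# classes_ holds the distinct values in sorted order; each value is encoded as its index there
print(le.classes_.tolist())  # ['high', 'low', 'med', 'vhigh']
print(codes.tolist())        # [3, 0, 2, 1, 3]
```

Note that the codes follow alphabetical order, not the natural ordering low < med < high < vhigh.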

And here are the first few lines of my car.data file:

buying, maint, door, persons, lug_boot, safety, class
vhigh,vhigh,2,2,small,low,unacc
vhigh,vhigh,2,2,small,med,unacc
vhigh,vhigh,2,2,small,high,unacc
vhigh,vhigh,2,2,med,low,unacc
vhigh,vhigh,2,2,med,med,unacc
vhigh,vhigh,2,2,med,high,unacc
vhigh,vhigh,2,2,big,low,unacc
vhigh,vhigh,2,2,big,med,unacc
vhigh,vhigh,2,2,big,high,unacc

I think I am doing everything correctly, however I am getting the following error:

Traceback (most recent call last):
  File "/opt/anaconda3/envs/tensor/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2895, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'maint'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/name/PycharmProjects/Machine_learning/KNN/KNN Working File.py", line 13, in <module>
    maint = le.fit_transform(list(data["maint"]))  # input maint column into object le
  File "/opt/anaconda3/envs/tensor/lib/python3.6/site-packages/pandas/core/frame.py", line 2906, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/opt/anaconda3/envs/tensor/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
    raise KeyError(key) from err
KeyError: 'maint'

What confuses me most is why the error is raised for the maint column but not for the buying column. Please let me know what I am doing wrong. Thanks.

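You have a leading space before 'maint', so your actual key is ' maint'. Your header line is 'buying, maint, door, ...', and pandas keeps the space that follows each comma as part of the next column name. 'buying' comes first and has no space in front of it, which is why data["buying"] works and data["maint"] is the first lookup to fail.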
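One way to confirm this is to print the column names pandas actually created. The sketch below reads the same header from an in-memory string (csv_text is a stand-in for car.data, not part of the original code) so it runs on its own:

```python
import io

import pandas as pd

# Stand-in for car.data: same header and first row, held in memory
csv_text = """buying, maint, door, persons, lug_boot, safety, class
vhigh,vhigh,2,2,small,low,unacc
"""

data = pd.read_csv(io.StringIO(csv_text))

# Every name after the first keeps the space that followed its comma
print(data.columns.tolist())
# ['buying', ' maint', ' door', ' persons', ' lug_boot', ' safety', ' class']

print("buying" in data.columns)  # True
print("maint" in data.columns)   # False, so data["maint"] raises KeyError
print(" maint" in data.columns)  # True
```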
Either fix the csv file (remove the spaces after the commas in the header line), or pass skipinitialspace=True to pd.read_csv():

data = pd.read_csv("car.data", skipinitialspace=True)
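A self-contained sketch of the skipinitialspace fix, again using a hypothetical in-memory copy of the header in place of car.data. It also shows an alternative not mentioned above: loading the file unchanged and then stripping whitespace from the column names with pandas' Index.str.strip():

```python
import io

import pandas as pd

# Stand-in for car.data: same header and first row, held in memory
csv_text = """buying, maint, door, persons, lug_boot, safety, class
vhigh,vhigh,2,2,small,low,unacc
"""

# Option 1: skip the whitespace that follows each delimiter while parsing
data = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
print(data.columns.tolist())
# ['buying', 'maint', 'door', 'persons', 'lug_boot', 'safety', 'class']
print(data["maint"].tolist())  # ['vhigh']

# Option 2 (alternative): load as before, then strip whitespace from the column names
data2 = pd.read_csv(io.StringIO(csv_text))
data2.columns = data2.columns.str.strip()
print(data2.columns.tolist())  # same names as option 1
```

Option 1 fixes the problem at parse time; option 2 is handy when you do not control how the file is read.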

This works fine on my end:

In [2]: !cat a.csv
buying, maint, door, persons, lug_boot, safety, class
vhigh,vhigh,2,2,small,low,unacc
vhigh,vhigh,2,2,small,med,unacc
vhigh,vhigh,2,2,small,high,unacc
vhigh,vhigh,2,2,med,low,unacc
vhigh,vhigh,2,2,med,med,unacc
vhigh,vhigh,2,2,med,high,unacc
vhigh,vhigh,2,2,big,low,unacc
vhigh,vhigh,2,2,big,med,unacc
vhigh,vhigh,2,2,big,high,unacc


In [3]: pd.read_csv("a.csv")
Out[3]: 
  buying  maint   door   persons  lug_boot  safety  class
0  vhigh  vhigh      2         2     small     low  unacc
1  vhigh  vhigh      2         2     small     med  unacc
2  vhigh  vhigh      2         2     small    high  unacc
3  vhigh  vhigh      2         2       med     low  unacc
4  vhigh  vhigh      2         2       med     med  unacc
5  vhigh  vhigh      2         2       med    high  unacc
6  vhigh  vhigh      2         2       big     low  unacc
7  vhigh  vhigh      2         2       big     med  unacc
8  vhigh  vhigh      2         2       big    high  unacc
