Error whilst importing csv file using pandas - python
I am trying to read a CSV file with pandas and then encode its columns.
Here is my code:
import sklearn
from sklearn.utils import shuffle
from sklearn.neighbors import KNeighborsClassifier
import pandas as pd
import numpy as np
from sklearn import linear_model, preprocessing
data = pd.read_csv("car.data") # import in data
print(data.head()) # show the top few lines of data
le = preprocessing.LabelEncoder() # object to change data into a numerical value
buying = le.fit_transform(list(data["buying"])) # input buying column into object le
maint = le.fit_transform(list(data["maint"])) # input maint column into object le
door = le.fit_transform(list(data["door"])) # input door column into object le
persons = le.fit_transform(list(data["persons"])) # input persons column into object le
lug_boot = le.fit_transform(list(data["lug_boot"])) # input lug_boot column into object le
safety = le.fit_transform(list(data["safety"])) # input safety column into object le
cls = le.fit_transform(list(data["class"])) # input class column into object le
predict = "class" # what will be predicted
x = list(zip(buying, maint, door, persons, lug_boot, safety)) # will put all of the values into one list (x)
y = list(cls) # will convert numpy array (cls) into list
x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x, y, test_size = 0.1) # create new data so the machine can't memorise results
print(x_train, y_test) # show variables to test its working
And here are the first few lines of my car.data file:
buying, maint, door, persons, lug_boot, safety, class
vhigh,vhigh,2,2,small,low,unacc
vhigh,vhigh,2,2,small,med,unacc
vhigh,vhigh,2,2,small,high,unacc
vhigh,vhigh,2,2,med,low,unacc
vhigh,vhigh,2,2,med,med,unacc
vhigh,vhigh,2,2,med,high,unacc
vhigh,vhigh,2,2,big,low,unacc
vhigh,vhigh,2,2,big,med,unacc
vhigh,vhigh,2,2,big,high,unacc
I think I am doing everything correctly; however, I am getting the following error:
Traceback (most recent call last):
File "/opt/anaconda3/envs/tensor/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2895, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'maint'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/name/PycharmProjects/Machine_learning/KNN/KNN Working File.py", line 13, in <module>
maint = le.fit_transform(list(data["maint"])) # input maint column into object le
File "/opt/anaconda3/envs/tensor/lib/python3.6/site-packages/pandas/core/frame.py", line 2906, in __getitem__
indexer = self.columns.get_loc(key)
File "/opt/anaconda3/envs/tensor/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
raise KeyError(key) from err
KeyError: 'maint'
What confuses me most is why the error is raised for the maint column but not for the buying column. Please let me know what I am doing wrong. Thanks.
You have a leading space before 'maint' in the header row, so the actual key is ' maint'. 'buying' works because it is the first field in the header and has nothing before it.
Either fix the CSV file, or pass skipinitialspace=True to pd.read_csv():
data = pd.read_csv("car.data", skipinitialspace=True)
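To see the difference without touching the file, here is a minimal sketch that parses an in-memory copy of the header. The `raw` string is only an illustrative stand-in for car.data, and stripping the names with `columns.str.strip()` is an extra option I am adding here, not something the question's code already does.

```python
import io

import pandas as pd

# Illustrative stand-in for car.data: the header has a space after each comma.
raw = (
    "buying, maint, door, persons, lug_boot, safety, class\n"
    "vhigh,vhigh,2,2,small,low,unacc\n"
    "vhigh,vhigh,2,2,small,med,unacc\n"
)

# Default parsing keeps the space, so the key is ' maint', not 'maint'.
default = pd.read_csv(io.StringIO(raw))
print(list(default.columns))

# Option 1: drop the whitespace that follows each delimiter while parsing.
fixed = pd.read_csv(io.StringIO(raw), skipinitialspace=True)
print(fixed["maint"].tolist())

# Option 2: load as-is, then strip whitespace from the column names.
stripped = pd.read_csv(io.StringIO(raw))
stripped.columns = stripped.columns.str.strip()
print(stripped["maint"].tolist())
```

Either option should let data["maint"] and the other lookups in the question go through; which one is preferable depends on whether the data values themselves may also carry spaces after the commas.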
This works fine on my end:
In [2]: !cat a.csv
buying, maint, door, persons, lug_boot, safety, class
vhigh,vhigh,2,2,small,low,unacc
vhigh,vhigh,2,2,small,med,unacc
vhigh,vhigh,2,2,small,high,unacc
vhigh,vhigh,2,2,med,low,unacc
vhigh,vhigh,2,2,med,med,unacc
vhigh,vhigh,2,2,med,high,unacc
vhigh,vhigh,2,2,big,low,unacc
vhigh,vhigh,2,2,big,med,unacc
vhigh,vhigh,2,2,big,high,unacc
In [3]: pd.read_csv("a.csv")
Out[3]:
buying maint door persons lug_boot safety class
0 vhigh vhigh 2 2 small low unacc
1 vhigh vhigh 2 2 small med unacc
2 vhigh vhigh 2 2 small high unacc
3 vhigh vhigh 2 2 med low unacc
4 vhigh vhigh 2 2 med med unacc
5 vhigh vhigh 2 2 med high unacc
6 vhigh vhigh 2 2 big low unacc
7 vhigh vhigh 2 2 big med unacc
8 vhigh vhigh 2 2 big high unacc
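Note that a printed frame can look fine even when the header names carry a leading space, since the column headers are padded for alignment in the display. A small check, assuming the same header as in the question (`raw` and `padded` are names made up for this sketch):

```python
import io

import pandas as pd

# Illustrative copy of the header and one data row from the question.
raw = (
    "buying, maint, door, persons, lug_boot, safety, class\n"
    "vhigh,vhigh,2,2,small,low,unacc\n"
)
data = pd.read_csv(io.StringIO(raw))

# repr() prints the quotes, which makes any leading space visible.
for name in data.columns:
    print(repr(name))

# Names that differ from their stripped form carry stray whitespace.
padded = [name for name in data.columns if name != name.strip()]
print(padded)
```

If `padded` is non-empty, looking up the bare name (for example data["maint"]) would be expected to raise the KeyError shown in the question.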