Shape mismatch: if categories is an array, it has to be of shape (n_features,)
Here is the code I'm trying to execute to encode the values of the first column of my data set using dummy values.
import numpy as py
import matplotlib.pyplot as plt
import pandas as pd
DataSet = pd.read_csv('Data.csv')
x=DataSet.iloc[:, :-1].values
y=DataSet.iloc[:,3].values
from sklearn.impute import SimpleImputer
imputer=SimpleImputer(missing_values=py.nan,strategy='mean')
imputer=imputer.fit(x[:, 1:3])
x[:, 1:3]=imputer.transform(x[:, 1:3])
from sklearn.preprocessing import OneHotEncoder
onehotencoder=OneHotEncoder(categories=[0])
x=onehotencoder.fit_transform(x).toarray()
Here's the data I'm working on:
France 44.0 72000.0
Spain 27.0 48000.0
Germany 30.0 54000.0
Spain 38.0 61000.0
Germany 40.0 63777.7
France 35.0 58000.0
Spain 38.777 52000.0
France 48.0 79000.0
Germany 50.0 83000.0
France 37.0 67000.0
I'm getting an error stating:
Shape mismatch: if categories is an array, it has to be of shape (n_features,).
Can anyone help me fix this?
Your second column doesn't seem to be a categorical feature. You should only one-hot encode features that can take a finite number of discrete values, like the first column, which can only take a limited number of values ('Spain', 'Germany', 'France'). If you only encode the first column, you can do:
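import numpy as np
from sklearn.preprocessing import OneHotEncoder
# One inner list of category values per encoded column (here only the country column)
onehotencoder=OneHotEncoder(categories=[['France','Germany','Spain']])
# Encode only the first column, reshaped to 2-D as the encoder expects
x_1=onehotencoder.fit_transform(x[:,0].reshape(-1, 1)).toarray()
# Put the three dummy columns in front of the remaining columns
x = np.concatenate([x_1,x[:,1:]], axis=1)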
and then your data will be in the form:
France Germany Spain score
1 0 0 44.0
0 0 1 27.0
...
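For context on the error itself: `categories` expects one list of category values per column passed to the encoder, so `categories=[0]` (a single entry) does not match an `x` that has several columns. As an alternative to slicing and concatenating by hand, here is a minimal sketch using scikit-learn's `ColumnTransformer`. This is not the approach from the code above, just another way to encode only column 0. The three rows are copied from the question's data, and `sparse_threshold=0` is my own choice to force a dense result.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# A few rows copied from the question's data: country, then two numeric columns.
x = np.array([
    ['France', 44.0, 72000.0],
    ['Spain', 27.0, 48000.0],
    ['Germany', 30.0, 54000.0],
], dtype=object)

# Encode only column 0; categories holds one inner list because one column is encoded.
ct = ColumnTransformer(
    transformers=[
        ('country', OneHotEncoder(categories=[['France', 'Germany', 'Spain']]), [0]),
    ],
    remainder='passthrough',  # keep the numeric columns unchanged
    sparse_threshold=0,       # assumption: we want a dense array back
)

encoded = ct.fit_transform(x)
print(encoded)
```

The dummy columns come first, in the order given in `categories`, followed by the untouched numeric columns.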
Also, you only have 3 columns in your data, but you're selecting a fourth column with y=DataSet.iloc[:,3].values. The first column starts at index 0, so .iloc[:,3] refers to the 4th column.
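A small sketch of the indexing point, assuming the data really has only the three columns shown in the question. The column names `Country`, `Age` and `Salary` are hypothetical, since the posted data has no header row. Using `-1` for the last column avoids hard-coding a position that may not exist.

```python
import pandas as pd

# Column names are hypothetical; the question's data is shown without a header row.
df = pd.DataFrame(
    [['France', 44.0, 72000.0], ['Spain', 27.0, 48000.0], ['Germany', 30.0, 54000.0]],
    columns=['Country', 'Age', 'Salary'],
)

n_columns = df.shape[1]  # number of columns actually present

# Positions are 0-based: valid column positions here are 0, 1 and 2.
x = df.iloc[:, :-1].values  # every column except the last
y = df.iloc[:, -1].values   # the last column, whatever its position

# Asking for position 3 (a fourth column) fails on a three-column frame.
try:
    df.iloc[:, 3]
    fourth_column_exists = True
except IndexError:
    fourth_column_exists = False

print(n_columns, x.shape, y, fourth_column_exists)
```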