Shape mismatch: if categories is an array, it has to be of shape (n_features,)

Question

Here is the code I'm trying to run to encode the values of the first column of my dataset as dummy variables.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

DataSet = pd.read_csv('Data.csv')
x = DataSet.iloc[:, :-1].values
y = DataSet.iloc[:, 3].values

# Replace missing values in columns 1 and 2 with the column mean
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer = imputer.fit(x[:, 1:3])
x[:, 1:3] = imputer.transform(x[:, 1:3])

# Attempt to one-hot encode the first (country) column;
# fit_transform raises the error shown below
from sklearn.preprocessing import OneHotEncoder
onehotencoder = OneHotEncoder(categories=[0])
x = onehotencoder.fit_transform(x).toarray()

Here's the data I'm working on:

France  44.0    72000.0
Spain   27.0    48000.0
Germany 30.0    54000.0
Spain   38.0    61000.0
Germany 40.0    63777.7
France  35.0    58000.0
Spain   38.777  52000.0
France  48.0    79000.0
Germany 50.0    83000.0
France  37.0    67000.0
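The 38.777 and 63777.7 entries look like column means filled in by the SimpleImputer step: 349/9 ≈ 38.78 is the mean of the other nine ages, and 574000/9 ≈ 63777.78 is the mean of the other nine salaries. The sketch below reproduces that on an inline array, assuming (this is not stated in the question) that those two cells were NaN in Data.csv.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Numeric columns (age, salary) from the table above.
# ASSUMPTION: the cells shown as 38.777 and 63777.7 were missing (NaN) in Data.csv.
numeric = np.array([
    [44.0, 72000.0],
    [27.0, 48000.0],
    [30.0, 54000.0],
    [38.0, 61000.0],
    [40.0, np.nan],
    [35.0, 58000.0],
    [np.nan, 52000.0],
    [48.0, 79000.0],
    [50.0, 83000.0],
    [37.0, 67000.0],
])

# strategy='mean' replaces each NaN with the mean of the
# non-missing values in the same column.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
filled = imputer.fit_transform(numeric)

print(filled[6, 0])  # imputed age
print(filled[4, 1])  # imputed salary
```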

I'm getting an error stating:

Shape mismatch: if categories is an array, it has to be of shape (n_features,).
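As far as the message suggests, when categories is not 'auto', OneHotEncoder expects a list with one entry (the allowed values) per input column. categories=[0] has a single entry while x has more than one column, so the lengths don't match. A minimal sketch reproducing the error on a small inline array standing in for Data.csv:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Three columns (country, age, salary), standing in for x from the question.
x = np.array([
    ["France", 44.0, 72000.0],
    ["Spain", 27.0, 48000.0],
    ["Germany", 30.0, 54000.0],
], dtype=object)

# categories=[0] is a list of length 1, but x has 3 features,
# so len(categories) != n_features and fit raises ValueError.
try:
    OneHotEncoder(categories=[0]).fit(x)
    error = None
except ValueError as exc:
    error = str(exc)

print(error)
```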

Can anyone help me fix this?

Answer 1

Your second column doesn't seem to be a categorical feature. You should only one-hot encode features that can take a finite number of discrete values, like the first column, which can only take a limited number of values ('Spain', 'Germany', 'France'). If you only want to encode the first column, you can do:

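import numpy as np
from sklearn.preprocessing import OneHotEncoder
onehotencoder=OneHotEncoder(categories=[['France','Germany','Spain']])
# Encode only column 0; reshape(-1, 1) turns it into a 2-D (n_samples, 1) array
x_1=onehotencoder.fit_transform(x[:,0].reshape(-1, 1)).toarray()
# Put the dummy columns in front of the remaining (numeric) columns
x = np.concatenate([x_1,x[:,1:]], axis=1)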
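Since Data.csv isn't available here, this sketch runs the same approach end to end on an inline copy of the table from the question (imputed values kept as shown). It assumes x is an object array with the country in column 0.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Inline stand-in for x: country, age, salary (values from the question).
x = np.array([
    ["France", 44.0, 72000.0],
    ["Spain", 27.0, 48000.0],
    ["Germany", 30.0, 54000.0],
    ["Spain", 38.0, 61000.0],
    ["Germany", 40.0, 63777.7],
    ["France", 35.0, 58000.0],
    ["Spain", 38.777, 52000.0],
    ["France", 48.0, 79000.0],
    ["Germany", 50.0, 83000.0],
    ["France", 37.0, 67000.0],
], dtype=object)

# One list of allowed values per encoded column: only column 0 is
# encoded, so categories holds exactly one list.
onehotencoder = OneHotEncoder(categories=[["France", "Germany", "Spain"]])

# reshape(-1, 1) gives the (n_samples, 1) shape OneHotEncoder expects.
x_1 = onehotencoder.fit_transform(x[:, 0].reshape(-1, 1)).toarray()

# Dummy columns first, then the untouched numeric columns.
x = np.concatenate([x_1, x[:, 1:]], axis=1)

print(x[:2])
```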

and then your data will be in the form:

France Germany Spain score
1      0       0     44.0
0      0       1     27.0
...
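An alternative not used in the answer above is scikit-learn's ColumnTransformer, which applies OneHotEncoder to the chosen column and passes the other columns through, so there is no manual slicing and concatenating. A sketch on a shortened inline copy of the data; the transformer name "country" is just an arbitrary label:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Shortened inline stand-in for x: country, age, salary.
x = np.array([
    ["France", 44.0, 72000.0],
    ["Spain", 27.0, 48000.0],
    ["Germany", 30.0, 54000.0],
], dtype=object)

ct = ColumnTransformer(
    # ("name", transformer, columns): one-hot encode column 0 only.
    [("country", OneHotEncoder(categories=[["France", "Germany", "Spain"]]), [0])],
    remainder="passthrough",  # append columns 1 and 2 unchanged after the dummies
    sparse_threshold=0,       # always return a dense array
)
x_encoded = ct.fit_transform(x)
print(x_encoded)
```

The output column order matches the answer's: dummy columns first, then the passthrough columns.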

Also, you only have 3 columns in your data, but you're selecting a fourth column with y=DataSet.iloc[:,3].values (column positions start at index 0, so .iloc[:,3] refers to the 4th column).
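To see the indexing point concretely, here is a sketch with a three-column DataFrame built from the first rows of the data (default integer column labels, since no header row is shown): position 3 is out of range, while -1 (or 2) selects the last column.

```python
import pandas as pd

# Three columns, like the data shown in the question.
df = pd.DataFrame([
    ["France", 44.0, 72000.0],
    ["Spain", 27.0, 48000.0],
    ["Germany", 30.0, 54000.0],
])

# .iloc is 0-based: valid column positions here are 0, 1 and 2.
try:
    df.iloc[:, 3]
    out_of_range = False
except IndexError:
    out_of_range = True

last_col = df.iloc[:, -1].values  # same as df.iloc[:, 2].values
print(out_of_range, last_col)
```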

Solution 1 (score 0), 2020-06-29 09:20:22