
Shape mismatch: if categories is an array, it has to be of shape (n_features,)

Here is the code I'm trying to run to encode the values of the first column of my data set as dummy variables.

import numpy as py
import matplotlib.pyplot as plt
import pandas as pd
 

DataSet = pd.read_csv('Data.csv')
x=DataSet.iloc[:, :-1].values
y=DataSet.iloc[:,3].values

from sklearn.impute import SimpleImputer
imputer=SimpleImputer(missing_values=py.nan,strategy='mean')
imputer=imputer.fit(x[:, 1:3])
x[:, 1:3]=imputer.transform(x[:, 1:3])


from sklearn.preprocessing import OneHotEncoder
onehotencoder=OneHotEncoder(categories=[0])
x=onehotencoder.fit_transform(x).toarray()

Here's the data I'm working with:

France  44.0    72000.0
Spain   27.0    48000.0
Germany 30.0    54000.0
Spain   38.0    61000.0
Germany 40.0    63777.7
France  35.0    58000.0
Spain   38.777  52000.0
France  48.0    79000.0
Germany 50.0    83000.0
France  37.0    67000.0

I'm getting an error stating:

Shape mismatch: if categories is an array, it has to be of shape (n_features,). 

Can anyone help me fix this?

Your second column doesn't seem to be a categorical feature. You should only one-hot encode features that can take a finite number of discrete values, like the first column, which can only take a limited number of values ('Spain', 'Germany', 'France'). If you only encode the first column, you can do:

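import numpy as np
from sklearn.preprocessing import OneHotEncoder
# categories takes one list per encoded column (here: a single column)
onehotencoder = OneHotEncoder(categories=[['France', 'Germany', 'Spain']])
# reshape(-1, 1) turns the first column into the 2-D input the encoder expects
x_1 = onehotencoder.fit_transform(x[:, 0].reshape(-1, 1)).toarray()
# put the encoded columns in front of the remaining columns
x = np.concatenate([x_1, x[:, 1:]], axis=1)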

and then your data will be in the form:

France Germany Spain score
1      0       0     44.0
0      0       1     27.0
...
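
As a possible alternative, here is a minimal sketch that lets scikit-learn's ColumnTransformer pick the column to encode instead of slicing and concatenating by hand. It assumes the original call fails because `categories` must hold one list of category values per input column, so `categories=[0]` is read as a single (invalid) category list rather than a column index. The three-row array and the transformer name "country" are illustrative assumptions, not part of the question's code.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# First three rows of the question's data, as an object array (strings + floats).
x = np.array(
    [["France", 44.0, 72000.0],
     ["Spain", 27.0, 48000.0],
     ["Germany", 30.0, 54000.0]],
    dtype=object,
)

# The question's call: one entry in `categories` for three input columns.
original_call_failed = False
try:
    OneHotEncoder(categories=[0]).fit(x)
except ValueError as err:
    original_call_failed = True
    print("Original call fails:", err)

# Encode column 0 only; remainder="passthrough" keeps the other columns.
# sparse_threshold=0 asks ColumnTransformer for a dense result.
# "country" is just a label chosen for this sketch.
ct = ColumnTransformer(
    transformers=[("country", OneHotEncoder(), [0])],
    remainder="passthrough",
    sparse_threshold=0,
)

# The stacked result has object dtype here, so convert it to float.
encoded = np.asarray(ct.fit_transform(x), dtype=float)
print(encoded)
```

The encoded columns come first, in the encoder's sorted category order (France, Germany, Spain), followed by the untouched numeric columns.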

Also, you only have 3 columns in your data, but you're selecting a fourth column with y=DataSet.iloc[:,3].values. Column indexing starts at 0, so .iloc[:,3] refers to the 4th column.
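
To illustrate the indexing point, here is a small sketch on a hypothetical three-column frame. The column names country, age and salary are assumptions, since the question shows no header row. Selecting the target with -1 avoids hard-coding its position.

```python
import pandas as pd

# Hypothetical frame mirroring two rows of the question's data.
# The column names are assumed for illustration only.
df = pd.DataFrame(
    [["France", 44.0, 72000.0], ["Spain", 27.0, 48000.0]],
    columns=["country", "age", "salary"],
)

# Position 3 would be a fourth column, which this frame does not have.
out_of_bounds = False
try:
    df.iloc[:, 3]
except IndexError:
    out_of_bounds = True

x = df.iloc[:, :-1].values  # every column except the last
y = df.iloc[:, -1].values   # the last column, whatever its position
print(out_of_bounds, x.shape, y)
```

Note that with `[:, :-1]` the feature matrix has only two columns here, so later slices such as `x[:, 1:3]` should be checked against the real column count.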


