ValueError: Shape mismatch: if categories is an array, it has to be of shape (n_features,)
I have written a simple script that uses OneHotEncoder:
from sklearn.preprocessing import OneHotEncoder
X = [[0, 'a'], [0, 'b'], [1, 'a'], [2, 'b']]
onehotencoder = OneHotEncoder(categories=[0])
X = onehotencoder.fit_transform(X).toarray()
I just want to apply fit_transform to column index 0 of X, i.e. to the values [0, 0, 1, 2] that you can see in X. But it causes an error like this:
ValueError: Shape mismatch: if categories is an array, it has to be of shape (n_features,).
Can anyone help me solve this problem? I am stuck on it.
You need to use ColumnTransformer to specify the column index, not the categories parameter.
The constructor parameter categories explicitly lists the distinct category values of each encoded column; it does not select columns. E.g. you could list the values 0, 1, 2 for column 0 explicitly (as categories=[[0, 1, 2]], one list per encoded column), but 'auto' will determine them from the data. Further, in ColumnTransformer you can use a slice() object instead of a list of column indices.
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
X = [[0, 'a'], [0, 'b'], [1, 'a'], [2, 'b']]
ct = ColumnTransformer(
    [('one_hot_encoder', OneHotEncoder(categories='auto'), [0])],  # The column numbers to be transformed (here is [0] but can be [0, 1, 3])
    remainder='passthrough'  # Leave the rest of the columns untouched
)
X = ct.fit_transform(X)
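As a rough sketch of the two remarks above (explicit categories and slice()), the same transformer could be written as follows. It assumes a scikit-learn version that provides ColumnTransformer; the variable names encoder and X_encoded are only illustrative.

```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

X = [[0, 'a'], [0, 'b'], [1, 'a'], [2, 'b']]

# categories expects one list of values per encoded column, so a single
# encoded column needs a list that contains one list.
encoder = OneHotEncoder(categories=[[0, 1, 2]])

ct = ColumnTransformer(
    [('one_hot_encoder', encoder, slice(0, 1))],  # slice(0, 1) selects column 0 only
    remainder='passthrough',  # keep the letter column unchanged
)
X_encoded = ct.fit_transform(X)

print(X_encoded)
```

Passing categories=[0], as in the question, fails because a single-element list is interpreted as the categories for one feature while the encoder sees two features in X.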
The pandas.get_dummies() function can also do the same, in the way shown below:
import numpy as np
import pandas as pd
X = np.array([[0, 'a'], [0, 'b'], [1, 'a'], [2, 'b']])
X = np.array(pd.concat([pd.get_dummies(X[:, 0]), pd.DataFrame(X[:, 1])], axis = 1))
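If the data can live in a DataFrame, a possibly simpler variant is to let get_dummies pick the column itself through its columns argument. This is only a sketch: the column names num and letter are made up for illustration and are not part of the original data.

```python
import pandas as pd

# Hypothetical column names, chosen only for this example.
df = pd.DataFrame(
    [[0, 'a'], [0, 'b'], [1, 'a'], [2, 'b']],
    columns=['num', 'letter'],
)

# Encode only the 'num' column; 'letter' is left as it is.
# dtype=int gives 0/1 indicator values instead of booleans.
encoded = pd.get_dummies(df, columns=['num'], dtype=int)

print(encoded)
```

Here the untouched column stays in the result and the indicator columns are appended after it, named with the num_ prefix.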