ValueError: Shape mismatch: if categories is an array, it has to be of shape (n_features,)

Question

I have written some simple code that uses OneHotEncoder.

from sklearn.preprocessing import OneHotEncoder
X = [[0, 'a'], [0, 'b'], [1, 'a'], [2, 'b']]
onehotencoder = OneHotEncoder(categories=[0])
X = onehotencoder.fit_transform(X).toarray()

I just want to apply fit_transform to the column at index 0 of X, that is, to the values [0, 0, 1, 2]. But it raises this error:

ValueError: Shape mismatch: if categories is an array, it has to be of shape (n_features,).

Can anyone solve this problem? I am stuck on it.

Answer 1

You need to use ColumnTransformer to specify the column index, not the categories parameter.

The categories constructor parameter is for listing the distinct category values of each feature explicitly, not for selecting columns. For example, you could provide the values [0, 1, 2] explicitly, but 'auto' will determine them from the data. Further, the column selection in ColumnTransformer can be given as a slice() object instead of a list of indices.
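To illustrate what shape (n_features,) means in the error message, here is a minimal sketch that passes categories explicitly, with one list of values per column of X. It assumes you want to encode both columns, which is not what the question asks for; it only shows how the parameter is meant to be used. The variable names encoder and encoded are just for this example.

```python
from sklearn.preprocessing import OneHotEncoder

X = [[0, 'a'], [0, 'b'], [1, 'a'], [2, 'b']]

# One list of category values per column, so len(categories) == n_features == 2.
# categories=[0] fails because it has length 1 while X has 2 columns.
encoder = OneHotEncoder(categories=[[0, 1, 2], ['a', 'b']])
encoded = encoder.fit_transform(X).toarray()

print(encoded.shape)  # 4 rows, 3 + 2 one-hot columns
print(encoded[0])     # encoding of the row [0, 'a']
```

Every column is encoded here, which is why ColumnTransformer is still needed when only column 0 should be transformed.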

from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

X = [[0, 'a'], [0, 'b'], [1, 'a'], [2, 'b']]

ct = ColumnTransformer(
    [('one_hot_encoder', OneHotEncoder(categories='auto'), [0])],   # The column numbers to be transformed (here is [0] but can be [0, 1, 3])
    remainder='passthrough'                                         # Leave the rest of the columns untouched
)

X = ct.fit_transform(X)
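As a variation on the code above, the column selection can be written with a slice() object. This is a sketch under the assumption that the same X is used; slice(0, 1) selects only column 0, and the name result is just for this example.

```python
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

X = [[0, 'a'], [0, 'b'], [1, 'a'], [2, 'b']]

ct = ColumnTransformer(
    [('one_hot_encoder', OneHotEncoder(categories='auto'), slice(0, 1))],  # column 0 only
    remainder='passthrough'  # keep column 1 unchanged, appended after the encoded columns
)

result = ct.fit_transform(X)

print(result.shape)  # 3 one-hot columns for the values 0, 1, 2, plus the untouched column
print(result)
```

The one-hot columns come first and the passthrough column comes last, so the letters from the original column 1 end up in the final column of result.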

Answer 2

pandas.get_dummies() can do the same thing, as shown below:

import numpy as np
import pandas as pd
X = np.array([[0, 'a'], [0, 'b'], [1, 'a'], [2, 'b']])
X = np.array(pd.concat([pd.get_dummies(X[:, 0]), pd.DataFrame(X[:, 1])], axis = 1))

Answer 1: 14 votes, accepted, 2019-12-30 05:44:25

Answer 2: 1 vote, 2020-04-09 10:38:09
