Problem encoding a dataset array

Question

Dataset: [1]: https://docs.google.com/spreadsheets/d/1jlKp7JR9Ewujv445QgT1kZpH5868fhXFFrA3ovWxS_0/edit?usp=sharing

I've been trying to apply an ensemble method from sklearn to the small dataset linked above. For some reason I keep receiving this error:

ValueError: y should be a 1d array, got an array of shape (9, 56) instead.

This is the code:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from numpy import array

from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import LabelEncoder

cbdata = pd.read_excel("C:/Users/Andrew/cbupdated2.xlsx")

print(cbdata)
print(cbdata.describe())
df = cbdata.columns

print(df)

x = cbdata
y = cbdata.fundingstatus

xshape = x.shape
yshape = y.shape

shapes = xshape, yshape
print(shapes)

size = x.size, y.size
print(size)


###Problem ENCODING DATA
      
##Label encoder
label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(x)
print(integer_encoded)


scaler = StandardScaler()
X_scaled = scaler.fit_transform(x)
print(X_scaled)

###Problem block
ec = OneHotEncoder()


X_encoded = cbdata.apply(lambda col: ec.fit_transform(col.astype(str)), axis=0, result_type='expand')
X_encoded2 = X_encoded.shape

print(X_encoded2)

Any help or suggestions on getting the encoder to work, so that I can use the ensemble method?

Thanks

Answer 1

LabelEncoder is meant for encoding target variables, not features. See also this post.
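To make the point concrete, here is a minimal sketch of what LabelEncoder accepts and what it rejects. The column values and the small `frame` DataFrame are made up for illustration and are not taken from the linked spreadsheet.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the target column; the values are invented.
y = pd.Series(["funded", "not funded", "funded", "pending"], name="fundingstatus")

# A 1-D target works: each distinct label is mapped to an integer.
label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y)

# A 2-D DataFrame of features does not: LabelEncoder insists on a 1-D input
# and raises the same kind of ValueError as in the question.
frame = pd.DataFrame({"a": ["x", "y"], "b": ["p", "q"]})
try:
    LabelEncoder().fit_transform(frame)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```

This suggests the error in the question comes from passing the whole DataFrame `x` to `label_encoder.fit_transform`, rather than from the ensemble method itself.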

You should use OrdinalEncoder on the categorical columns you want to transform, because I see that some of your columns contain floats and others strings. For example, to transform company and industry:

from sklearn.preprocessing import OrdinalEncoder

Cols = ["company","industry"]

integer_encoded = OrdinalEncoder().fit_transform(x[Cols])
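Putting the pieces together, the following is a rough sketch of how the encoded features could feed an ensemble model. Several things here are assumptions: the tiny DataFrame and its `employees` column are invented, RandomForestClassifier is just one possible ensemble method (the question does not say which one is intended), and the target column is dropped from the features, which the original code does not do.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Hypothetical miniature of the spreadsheet; all values are made up.
cbdata = pd.DataFrame({
    "company": ["A", "B", "C", "D", "E", "F"],
    "industry": ["tech", "bio", "tech", "fin", "bio", "fin"],
    "employees": [10.0, 250.0, 35.0, 80.0, 12.0, 400.0],
    "fundingstatus": ["seed", "series a", "seed", "series a", "seed", "series a"],
})

# Keep the target out of the feature matrix.
x = cbdata.drop(columns="fundingstatus")
# LabelEncoder is used only on the 1-D target.
y = LabelEncoder().fit_transform(cbdata["fundingstatus"])

# Ordinal-encode the string columns and pass numeric columns through unchanged.
cat_cols = ["company", "industry"]
preprocess = ColumnTransformer(
    [("cat", OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1), cat_cols)],
    remainder="passthrough",
)

# One possible ensemble model; swap in whichever estimator you actually need.
model = make_pipeline(preprocess, RandomForestClassifier(n_estimators=50, random_state=0))
model.fit(x, y)
predictions = model.predict(x)
print(predictions)
```

Wrapping the encoder in a pipeline means the same encoding is applied at fit and predict time, and `handle_unknown="use_encoded_value"` keeps prediction from failing on categories not seen during training.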

Answered 2021-12-20 01:23:35
