
Having trouble encoding dataset array

Dataset: https://docs.google.com/spreadsheets/d/1jlKp7JR9Ewujv445QgT1kZpH5868fhXFFrA3ovWxS_0/edit?usp=sharing

I've been trying to apply an ensemble method from sklearn to the small dataset linked above, but I keep receiving this error:

ValueError: y should be a 1d array, got an array of shape (9, 56) instead.

This is the code:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from numpy import array

from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import LabelEncoder

cbdata = pd.read_excel("C:/Users/Andrew/cbupdated2.xlsx")

print(cbdata)
print(cbdata.describe())
df = cbdata.columns

print(df)

x = cbdata
y = cbdata.fundingstatus

xshape = x.shape
yshape = y.shape

shapes = xshape, yshape
print(shapes)

size = x.size, y.size
print(size)


###Problem ENCODING DATA
      
##Label encoder
label_encoder = LabelEncoder()
# Raises the ValueError above: x is the whole 2-D DataFrame, but LabelEncoder expects a 1-D array
integer_encoded = label_encoder.fit_transform(x)
print(integer_encoded)


scaler = StandardScaler()
X_scaled = scaler.fit_transform(x)
print(X_scaled)

### Problem block
ec = OneHotEncoder()


X_encoded = cbdata.apply(lambda col: ec.fit_transform(col.astype(str)), axis=0, result_type='expand')
X_encoded2 = X_encoded.shape

print(X_encoded2)

Any help or suggestions on getting the encoder to work, so that I can use the ensemble method?

Thanks

LabelEncoder is meant for encoding the target labels (a single 1-D y), not the features, so calling it on the whole DataFrame raises the shape error above. See also this post.
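A small sketch of the difference, assuming a hypothetical three-row frame in which "fundingstatus" is the target: LabelEncoder accepts the 1-D target column but rejects the 2-D DataFrame with the same kind of ValueError as in the question.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy frame: two feature columns plus a "fundingstatus" target.
df = pd.DataFrame({
    "company": ["A", "B", "C"],
    "industry": ["tech", "health", "tech"],
    "fundingstatus": ["funded", "closed", "funded"],
})

le = LabelEncoder()
y_encoded = le.fit_transform(df["fundingstatus"])  # 1-D Series: works
print(y_encoded)    # [1 0 1], classes are sorted alphabetically
print(le.classes_)  # ['closed' 'funded']

message = None
try:
    le.fit_transform(df)  # 2-D DataFrame: rejected before any fitting happens
except ValueError as err:
    message = str(err)
print(message)  # y should be a 1d array, got an array of shape (3, 3) instead.
```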

You should use OrdinalEncoder on the categorical columns you want to transform, because some of your columns hold floats and others hold strings. For example, to transform company and industry:

from sklearn.preprocessing import OrdinalEncoder

Cols = ["company","industry"]

integer_encoded = OrdinalEncoder().fit_transform(x[Cols])
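To carry this through to an ensemble model, one option is to wrap the encoder in a ColumnTransformer inside a Pipeline, so the encoder is fitted on the training split only. The sketch below is an assumption-laden illustration, not the asker's setup: the data is a hypothetical stand-in for the spreadsheet ("amount" is a made-up numeric column), RandomForestClassifier is just one example of a sklearn ensemble, and `handle_unknown="use_encoded_value"` requires scikit-learn 0.24 or later.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical stand-in for the spreadsheet; "amount" is a made-up numeric column.
df = pd.DataFrame({
    "company": ["A", "B", "C", "D", "E", "F", "G", "H"],
    "industry": ["tech", "health", "tech", "retail", "health", "tech", "retail", "health"],
    "amount": [1.5, 2.0, 0.5, 3.0, 1.0, 2.5, 0.8, 1.2],
    "fundingstatus": ["funded", "closed", "funded", "closed",
                      "funded", "funded", "closed", "closed"],
})

# Keep the target out of the feature matrix.
X = df.drop(columns="fundingstatus")
y = df["fundingstatus"]

cat_cols = ["company", "industry"]
preprocess = ColumnTransformer(
    transformers=[
        # Categories not seen during fit are mapped to -1 instead of raising.
        ("cat", OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1), cat_cols),
    ],
    remainder="passthrough",  # numeric columns go through unchanged
)

model = Pipeline([
    ("prep", preprocess),
    ("clf", RandomForestClassifier(n_estimators=100, random_state=0)),
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```

With only a handful of rows the score itself means little; the point is that the categorical columns reach the ensemble as numbers and the target stays a 1-D array.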
