Cannot impute 1D array with fit_transform from sklearn library (split-test)

I'm trying to impute a 1D array of shape (14599,) using SimpleImputer with the most_frequent strategy, but it says it expected a 2D array. I already tried reshaping it with (-1,1) and (1,-1), but then I get ValueError: could not broadcast input array from shape (14599,1) into shape (14599). How can I impute this column, given that reshaping doesn't solve the problem? I don't understand why it throws the error. I already asked on DS StackExchange, and someone suggested the cause might be a pandas Series, but I converted x and y to NumPy arrays before passing them to train_test_split, so I'm not sure.

##libraries
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

##codes
plt.close('all')
avo_sales = pd.read_csv('avocados.csv')
avo_sales.rename(columns = {'4046':'small PLU sold',
                            '4225':'large PLU sold',
                            '4770':'xlarge PLU sold'},
                 inplace= True)

avo_sales.columns = avo_sales.columns.str.replace(' ','')

plt.scatter(avo_sales.Date,avo_sales.TotalBags)

x = np.array(avo_sales.drop(['TotalBags','Unnamed:0','year','region','Date'], axis=1))
y = np.array(avo_sales.TotalBags)

X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

impC = SimpleImputer(strategy='most_frequent')
X_train[:,8] = impC.fit_transform(X_train[:,8].reshape(-1,1))  # <-- error here

imp = SimpleImputer(strategy='median')
X_train[:,1:8] = imp.fit_transform(X_train[:,1:8])

le = LabelEncoder()
X_train[:,8] = le.fit_transform(X_train[:,8])

Change the line:

X_train[:,8] = impC.fit_transform(X_train[:,8].reshape(-1,1))

to

X_train[:,8] = impC.fit_transform(X_train[:,8].reshape(-1,1)).ravel()

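and your error will disappear.

The imputation itself works; it is assigning the imputed values back that causes the error. fit_transform returns a 2D array of shape (14599, 1), while X_train[:,8] is a 1D slice of shape (14599,), and NumPy cannot broadcast the former into the latter. Calling ravel() flattens the result to 1D so the shapes match.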
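The shape mismatch can be reproduced on a small array. The following is a minimal sketch, not the asker's data: the 4x2 object array `X` and its values are made up for illustration, with column 1 standing in for the categorical column 8 of X_train. It also shows a possible alternative, slicing with `1:2` so that both sides of the assignment stay 2D.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical stand-in for X_train: object dtype, categorical last column
# with one missing value.
X = np.array(
    [[1.0, "a"],
     [2.0, np.nan],
     [3.0, "a"],
     [4.0, "b"]],
    dtype=object,
)
X_alt = X.copy()  # untouched copy for the alternative shown at the end

imputer = SimpleImputer(strategy="most_frequent")

# SimpleImputer needs 2D input, and it also returns 2D output.
filled = imputer.fit_transform(X[:, 1].reshape(-1, 1))
target_shape = X[:, 1].shape   # 1D slice of the original array
filled_shape = filled.shape    # 2D column returned by the imputer

# Assigning the 2D column into the 1D slice raises the broadcast error.
try:
    X[:, 1] = filled
    assignment_failed = False
except ValueError:
    assignment_failed = True

# Flattening the result back to 1D makes the shapes match.
X[:, 1] = filled.ravel()

# Alternative: a 1:2 slice keeps both sides 2D, so no reshape or ravel is needed.
X_alt[:, 1:2] = imputer.fit_transform(X_alt[:, 1:2])
```

Either form should work; the slice version avoids the reshape/ravel round trip, while ravel() is the smaller change to the original line.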
