Error when preprocessing data for machine learning
I am trying to preprocess my training data. I also tried the reshape function, but it did not help. I get the following error:
ValueError: Found input variables with inconsistent numbers of samples: [34, 12700]
Here is my code:
import pandas as pd
import numpy as np
from sklearn import preprocessing,neighbors
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
df=pd.read_csv('train.csv')
df.drop(columns=['ID'], inplace=True)
X=np.array(df.drop(columns=['label']))
y=np.array(df['label'])
print(X.shape)
X = preprocessing.StandardScaler().fit(X)
X=X.mean_
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2)
clf = RandomForestRegressor(n_estimators=1900,max_features='log2',max_depth=25)
clf.fit(X_train,y_train)
accuracy=clf.score(X_test,y_test)
print(accuracy)
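The problem is in these two lines:
X = preprocessing.StandardScaler().fit(X)
X=X.mean_
After them, X contains only the mean of each column. It is a 1-D array with one entry per feature (34 here), not one row per sample. train_test_split then compares those 34 values against the 12700 labels in y, which is exactly the mismatch the error reports. No reshape of a 34-element array can turn it back into 12700 samples.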
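The sketch below reproduces the mismatch on synthetic data. It assumes 200 samples and 34 features as stand-ins, because the real train.csv is not available. It shows that `mean_` has one value per column, and that splitting it against the per-sample labels raises the same kind of `ValueError`.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the question's data (sizes are assumptions):
# 200 samples, 34 features, one label per sample.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 34))
y = rng.normal(size=200)

scaler = StandardScaler().fit(X)
X_means = scaler.mean_          # one mean per column, not one row per sample
print(X.shape, X_means.shape)   # (200, 34) (34,)

# Splitting the column means against the per-sample labels fails,
# because X_means has 34 "samples" while y has 200.
msg = None
try:
    train_test_split(X_means, y, test_size=0.2)
except ValueError as err:
    msg = str(err)
    print(msg)
```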
To scale the data itself, use the following code:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X)
X = scaler.transform(X)
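Here is an end-to-end sketch of the corrected workflow on synthetic data. The data, `random_state` values and the reduced `n_estimators` are assumptions chosen to keep it small. It also differs from the answer in one respect: it splits first and fits the scaler on the training rows only. This is a common refinement that keeps test-set statistics out of preprocessing. `fit_transform` is equivalent to calling `fit` and then `transform`. Scaling keeps one row per sample, so the shapes stay consistent with `y`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for train.csv (assumption): 500 samples, 34 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 34))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=500)

# Split first, then fit the scaler on the training rows only,
# so no statistics from the test rows leak into preprocessing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # still one row per sample: (400, 34)
X_test = scaler.transform(X_test)        # reuses the training means/stds

# Small forest to keep the example fast; the question used n_estimators=1900.
clf = RandomForestRegressor(n_estimators=50, max_features="log2",
                            max_depth=25, random_state=0)
clf.fit(X_train, y_train)

# For a regressor, score() returns R^2, not classification accuracy.
r2 = clf.score(X_test, y_test)
print(X_train.shape, X_test.shape, r2)
```

After scaling, each training column has mean about 0 and standard deviation about 1. The test columns are only approximately standardized, because they use the training statistics.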
For more details, see the scikit-learn documentation.