I am new to machine learning and am building a project to keep myself busy, so I don't know much about how sklearn works. The main objective is to train a model to predict a categorical variable. When I tried label encoding the y variable of my model, I got the following error:
ValueError: not enough values to unpack (expected 3, got 2)
FitFailedWarning)
Here is the code I am using:
#Rough training
cols_to_use = [col for col in formatData.columns if col != 'type1']
x = formatData[cols_to_use]
y = formatData.type1
#print(x.columns)
#print(y)
numerical_transformer = SimpleImputer(strategy='constant')
categorical_transformer = Pipeline(steps=[
('imputer', SimpleImputer(strategy='most_frequent')),
('label', LabelEncoder())
])
preprocessor = ColumnTransformer(transformers=[('num', numerical_transformer), ('cat', categorical_transformer)])
my_pipeline = Pipeline(steps=[('preprocessor',preprocessor),
('model',RandomForestRegressor(n_estimators=50,random_state=0))])
from sklearn.model_selection import cross_validate
from sklearn.model_selection import cross_val_predict
cv_results = cross_validate(my_pipeline,x,y,cv=5,scoring=('r2','neg_mean_absolute_error'))
predictions = cross_val_predict(my_pipeline,x,y,cv=5)
print(cv_results['test_neg_mean_absolute_error'])
print(predictions)
Any help is appreciated; if you need any more information, please comment.
Pipelines are designed to transform X, not y. (There's some discussion around this, especially e.g. in resamplers that should change rows of X and y together; see imblearn for a fix in at least that direction.)
In particular, fit_transform(X, y) has a default definition of fit(X, y).transform(X). So a LabelEncoder in a pipeline will try to transform X, and will fail because it doesn't know what to do with 2-D input. You should just label encode y outside of the pipeline.