Parameter tuning for an ML model with a column transformer and pipeline

Question

My code runs perfectly up to fitting the final model, but I don't know how to run GridSearchCV or RandomizedSearchCV on the pipeline. Please help.

import pandas as pd
import numpy as np
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline


df = pd.read_csv('data/vehicle_dataset_v4A.csv')

X = df.drop('price', axis=1)
y = df['price']

numerical_ix = X.select_dtypes(include=['int64', 'float64']).columns
categorical_ix = X.select_dtypes(include=['object', 'bool']).columns

col_transform = make_column_transformer(
    (OneHotEncoder(), categorical_ix), 
    (StandardScaler(), numerical_ix),
    remainder='passthrough'
)

model = RandomForestRegressor()

pipe = make_pipeline(col_transform,model)

pipe.fit(X, y)

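I tried the following code. It runs without any errors, but when I try to predict with GridSearchCV, it throws different errors at different times. I hope there is a solution. Otherwise, if I could find out the best parameters after the grid search, I could apply them to the model directly.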

from sklearn.model_selection import GridSearchCV

lr = {
    'base_score':[0.4,0.45,0.5,0.55,0.6],
    'max_depth':[1,2,3,4,6,8,10],
    'subsample':[0.5,0.6,0.7,0.8,0.9,1],
    'n_estimators': [50,100,200,250,300],
    'learning_rate':  [0.05,0.1,0.4,0.5,0.8,0.9,1],
    'min_child_weight': [0.1,0.5,1,1.5,2,3],
    'gamma': [0,0.1,0.5,1,1.5,2,2.5,3]
    }

clf = make_pipeline(OneHotEncoder(),
                    StandardScaler(with_mean=False),
                    GridSearchCV(RandomForestRegressor(),
                                 param_grid=lr,
                                 scoring='r2',cv=3,verbose=2))

Answer 1

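Three thoughts on your application:

Don't use OneHotEncoder with RandomForestRegressor; you don't need it.
Don't use make_pipeline; it is overkill for your problem.
First apply StandardScaler to the data, then run GridSearchCV.

Please test this and give us feedback.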
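
For comparison, here is a minimal sketch of a different route from the one suggested above: keep the question's column transformer and pipeline, and pass the whole pipeline to GridSearchCV. It rests on a few assumptions. The small DataFrame and its column names (mileage, year, fuel) are made up so the example runs without the question's CSV, and the grid values are arbitrary. The grid uses only n_estimators and max_depth because names such as base_score, learning_rate, and gamma in the question's grid are not RandomForestRegressor parameters (they look like gradient-boosting options). Parameter names are prefixed with the step name that make_pipeline generates, which is the lowercased class name.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical stand-in for the question's CSV so the sketch is self-contained.
rng = np.random.default_rng(0)
n = 60
X = pd.DataFrame({
    "mileage": rng.uniform(1_000, 200_000, n),
    "year": rng.integers(2000, 2021, n),
    "fuel": rng.choice(["petrol", "diesel", "hybrid"], n),
})
y = 30_000 - 0.1 * X["mileage"] + 500 * (X["year"] - 2000) + rng.normal(0, 500, n)

# Column lists are written out explicitly here (assumption for this toy data).
numerical_ix = ["mileage", "year"]
categorical_ix = ["fuel"]

col_transform = make_column_transformer(
    # handle_unknown="ignore" avoids errors if a CV fold lacks a category.
    (OneHotEncoder(handle_unknown="ignore"), categorical_ix),
    (StandardScaler(), numerical_ix),
    remainder="passthrough",
)
pipe = make_pipeline(col_transform, RandomForestRegressor(random_state=0))

# Keys follow the pattern "<step name>__<parameter name>".
# pipe.get_params().keys() lists every name that is accepted.
param_grid = {
    "randomforestregressor__n_estimators": [50, 100],
    "randomforestregressor__max_depth": [3, 6, None],
}

search = GridSearchCV(pipe, param_grid=param_grid, scoring="r2", cv=3)
search.fit(X, y)

# Best parameter combination found, and predictions from the refit pipeline.
print(search.best_params_)
preds = search.predict(X.head())
```

Because the preprocessing lives inside the estimator being searched, each cross-validation fold fits the encoder and scaler on its own training split only. After fitting, search.best_params_ holds the selected values, and search.best_estimator_ is the pipeline refit on all the data with those values.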

Solution 1 (score -1, 2020-11-09 08:35:26)
