Error when using an sklearn random forest model

Question

I ran the following code to fit a random forest model, using a Kaggle dataset:

Data link: https://www.kaggle.com/arnavr10880/winedataset-eda-ml/data?select=WineQT.csv

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold,cross_val_score,GridSearchCV
from sklearn import linear_model
from sklearn.ensemble import  RandomForestRegressor
import numpy as np


data= pd.read_csv("C:/Users/Downloads/Model Test Data.csv")

y=data.loc[: ,["y"]]
x=data.iloc[:,1:]

x_train, x_test,y_train, y_test = train_test_split(x,y)


rf=RandomForestRegressor()


params = {
    'n_estimators'      : [300,500],
    'max_depth'         : np.array([8,9,12]),
    'random_state'      : [0],
    
}

scoring = ["neg_mean_absolute_error","neg_mean_squared_error"]

for score in scoring:
    print("score %s" % scoring)
    clf= GridSearchCV(rf,param_grid=params,scoring="%s" %score,verbose=False)
    clf.fit(x_train,y_train)
    print("Best parameters:")
    print(clf.best_params_)
    means=clf.cv_results_["mean_test_score"]
    stds=clf.cv_results_["std_test_score"]

    for mean,sd,params in zip(means,stds, clf.cv_results_["params"]):
        print("%0.3f (+/-%0.3f) for %r" %(mean,2*sd,params) )

However, I get the following error:

 Parameter grid for parameter (max_depth) needs to be a list or numpy array,
 but got (<class 'int'>). Single values need to be wrapped in a list with one element.

Can anyone help me fix this?

Answer 1

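When you run your example, the first score in the for loop prints fine. After that, inspecting the params variable shows {'max_depth': 12, 'n_estimators': 500, 'random_state': 0}, so you accidentally overwrote your params search space with one specific parameter combination. On the second pass of the loop, GridSearchCV(rf, param_grid=params, ...) therefore receives a dict of single values instead of lists, and the grid check rejects the scalar it finds for max_depth.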
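The mechanism can be reproduced in plain Python without sklearn. This is a sketch under stated assumptions: check_param_grid is a hypothetical, simplified stand-in for the validation sklearn applies to param_grid (not sklearn's actual code), and candidate_params stands in for clf.cv_results_["params"], listed in the same order as the combination quoted above.

```python
from collections.abc import Sequence


def check_param_grid(grid):
    """Hypothetical, simplified stand-in for sklearn's param_grid validation.

    Each value must be a non-string sequence (the real check also accepts numpy arrays).
    """
    for name, value in grid.items():
        if isinstance(value, str) or not isinstance(value, Sequence):
            raise ValueError(
                "Parameter grid for parameter (%s) needs to be a list or numpy array, "
                "but got (%s). Single values need to be wrapped in a list with one element."
                % (name, type(value))
            )


params = {"n_estimators": [300, 500], "max_depth": [8, 9, 12], "random_state": [0]}

# Stand-in for clf.cv_results_["params"]: one dict of scalars per combination,
# keys sorted and the last key varying fastest.
candidate_params = [
    {"max_depth": d, "n_estimators": n, "random_state": 0}
    for d in [8, 9, 12]
    for n in [300, 500]
]

passed = []
error_message = None
for score in ["neg_mean_absolute_error", "neg_mean_squared_error"]:
    try:
        check_param_grid(params)  # roughly what happens with param_grid=params
    except ValueError as exc:
        error_message = str(exc)
        break
    passed.append(score)
    for params in candidate_params:  # rebinds params, like the question's print loop
        pass

print(passed)  # ['neg_mean_absolute_error']
print(params)  # {'max_depth': 12, 'n_estimators': 500, 'random_state': 0}
print(error_message)
```

Only the first scorer gets through; on the second pass params is the last candidate dict, and the check fails on max_depth because 12 is an int rather than a list.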

Looking at your code again, the overwrite happens in the printing loop at the end:

    for mean,sd,params in zip(means,stds, clf.cv_results_["params"]):  # rebinds params
        print("%0.3f (+/-%0.3f) for %r" %(mean,2*sd,params) )

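So use a different name for the loop variable here (for example candidate), so that params still holds the full grid when the next GridSearchCV is built.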
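A minimal corrected sketch of the question's loop follows. It assumes synthetic data from make_regression in place of the CSV, and a scaled-down grid (20 and 50 trees, cv=3) so it runs quickly; the question uses 300 and 500 trees. The essential change is the loop variable name candidate. It also prints score instead of the whole scoring list.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Assumption: synthetic regression data stands in for the user's CSV file.
x, y = make_regression(n_samples=120, n_features=5, n_informative=3, noise=0.1, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

# Scaled-down grid so the example runs fast; values are still lists.
params = {
    "n_estimators": [20, 50],
    "max_depth": [8, 9, 12],
    "random_state": [0],
}
scoring = ["neg_mean_absolute_error", "neg_mean_squared_error"]

best_by_score = {}
for score in scoring:
    print("score %s" % score)  # the current scorer, not the whole list
    clf = GridSearchCV(RandomForestRegressor(), param_grid=params, scoring=score, cv=3)
    clf.fit(x_train, y_train)
    print("Best parameters:")
    print(clf.best_params_)
    best_by_score[score] = clf.best_params_
    means = clf.cv_results_["mean_test_score"]
    stds = clf.cv_results_["std_test_score"]
    # candidate (not params) holds each tried combination, so params keeps the full grid.
    for mean, sd, candidate in zip(means, stds, clf.cv_results_["params"]):
        print("%0.3f (+/-%0.3f) for %r" % (mean, 2 * sd, candidate))
```

Alternatively, GridSearchCV accepts the whole list as scoring=scoring together with refit="neg_mean_absolute_error", which evaluates both metrics in one search; the per-metric results then appear under keys such as mean_test_neg_mean_absolute_error.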

Solution 1 (score 2, accepted, 2022-02-21 21:50:47)