
Getting an error with random forest model using sklearn

I ran the following code to fit a random forest model. I used a Kaggle dataset:

Data link: https://www.kaggle.com/arnavr10880/winedataset-eda-ml/data?select=WineQT.csv

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold,cross_val_score,GridSearchCV
from sklearn import linear_model
from sklearn.ensemble import  RandomForestRegressor
import numpy as np


data= pd.read_csv("C:/Users/Downloads/Model Test Data.csv")

y=data.loc[: ,["y"]]
x=data.iloc[:,1:]

x_train, x_test,y_train, y_test = train_test_split(x,y)


rf=RandomForestRegressor()


params = {
    'n_estimators'      : [300,500],
    'max_depth'         : np.array([8,9,12]),
    'random_state'      : [0],
    
}

scoring = ["neg_mean_absolute_error","neg_mean_squared_error"]

for score in scoring:
    print("score %s" % scoring)
    clf= GridSearchCV(rf,param_grid=params,scoring="%s" %score,verbose=False)
    clf.fit(x_train,y_train)
    print("Best parameters:")
    print(clf.best_params_)
    means=clf.cv_results_["mean_test_score"]
    stds=clf.cv_results_["std_test_score"]

    for mean,sd,params in zip(means,stds, clf.cv_results_["params"]):
        print("%0.3f (+/-%0.3f) for %r" %(mean,2*sd,params) )

However, I get the following error:

 Parameter grid for parameter (max_depth) needs to be a list or numpy array,
 but got (<class 'int'>). Single values need to be wrapped in a list with one element.

Can anyone help me solve this problem?

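When you run your example, you will see that the first score in the for loop prints fine. Inspecting the params variable after that shows {'max_depth': 12, 'n_estimators': 500, 'random_state': 0}, so you accidentally overwrote the params grid with one specific parameter combination. On the second pass through the outer loop, GridSearchCV therefore receives an int for max_depth instead of a list, which is exactly what the error message complains about.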
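The overwrite can be reproduced without scikit-learn. The sketch below is only an illustration: `candidates` is a hypothetical stand-in for clf.cv_results_["params"], and `grid_before` is a name introduced here to keep a reference to the original grid.

```python
# Hypothetical stand-ins for the grid and for clf.cv_results_["params"].
params = {"n_estimators": [300, 500], "max_depth": [8, 9, 12], "random_state": [0]}
candidates = [
    {"max_depth": 8, "n_estimators": 300, "random_state": 0},
    {"max_depth": 12, "n_estimators": 500, "random_state": 0},
]

grid_before = params  # keep a reference to the original grid

# A for loop does not create a new scope: the name `params` is rebound
# on every iteration and keeps its last value after the loop ends.
for params in candidates:
    pass

print(params)                     # the last candidate, no longer the grid
print(type(params["max_depth"]))  # <class 'int'>
```

After the loop, `params` holds plain values rather than lists, which matches the `<class 'int'>` reported in the error.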

Looking at your code again, it happens in the print loop at the end of each iteration:

    for mean,sd,params in zip(means,stds, clf.cv_results_["params"]):   # <-- params is rebound here
        print("%0.3f (+/-%0.3f) for %r" %(mean,2*sd,params) )

So use a different variable name there.
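A minimal sketch of the loop with the inner variable renamed might look like the following. Several things here are assumptions made so the example runs anywhere: the data comes from make_regression instead of the CSV in the question, the grid values and cv=3 are smaller than the original to keep the run short, and `candidate` and `best` are names introduced for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the CSV in the question (assumption).
x, y = make_regression(n_samples=120, n_features=5, noise=0.1, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

# Smaller grid than in the question, only to keep the run short (assumption).
params = {
    "n_estimators": [10, 20],
    "max_depth": [2, 4],
    "random_state": [0],
}

scoring = ["neg_mean_absolute_error", "neg_mean_squared_error"]
best = {}  # best parameters per scoring metric

for score in scoring:
    print("score %s" % score)
    clf = GridSearchCV(RandomForestRegressor(), param_grid=params, scoring=score, cv=3)
    clf.fit(x_train, y_train)
    best[score] = clf.best_params_
    print("Best parameters:")
    print(clf.best_params_)
    means = clf.cv_results_["mean_test_score"]
    stds = clf.cv_results_["std_test_score"]
    # `candidate` instead of `params`, so the grid above is never rebound.
    for mean, sd, candidate in zip(means, stds, clf.cv_results_["params"]):
        print("%0.3f (+/-%0.3f) for %r" % (mean, 2 * sd, candidate))
```

Because `params` is no longer reused as a loop variable, it still holds lists when the second scoring metric is evaluated, and both searches should complete.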

