GridSearchCV output problem in Scikit-learn

Question

I'd like to perform a hyperparameter search in sklearn that selects both the preprocessing steps and the model, as follows:

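from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

pipeline = Pipeline([("combiner", PolynomialFeatures()),
                     ("dimred", PCA()),
                     ("classifier", RandomForestClassifier())])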

parameters = [{"combiner": [None]},
              {"combiner": [PolynomialFeatures()], "combiner__degree": [2], "combiner__interaction_only": [False, True]},

              {"dimred": [None]},
              {"dimred": [PCA()], "dimred__n_components": [.95, .75]},

              {"classifier": [RandomForestClassifier(n_estimators=100, class_weight="balanced")],
               "classifier__max_depth": [5, 10, None]},
              {"classifier": [KNeighborsClassifier(weights="distance")],
               "classifier__n_neighbors": [3, 7, 11]}]

CV = GridSearchCV(pipeline, parameters, cv=5, scoring="f1_weighted", refit=True, n_jobs=-1)
CV.fit(train_X, train_y)

Of course, I need the best pipeline together with its best parameters. However, when I request the best estimator with CV.best_estimator_, I get only the winning components, not their hyperparameters:

Pipeline(steps=[('combiner', None), ('dimred', PCA()),
                ('classifier', RandomForestClassifier())])

When I print out CV.best_params_, I get even less information (only the first element of the Pipeline, the combiner, and nothing at all about dimred or classifier):

{'combiner': None}

How can I get the best pipeline combination, including both the components and their hyperparameters?

Answer 1

Pipeline objects have a get_params() method that returns the parameters of the pipeline, including the parameters of the individual steps. Based on your example, the command

CV.best_estimator_.get_params()

will retrieve all pipeline parameters of the best estimator, including the ones you are looking for.
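A minimal sketch of this on synthetic data (make_classification stands in for train_X / train_y, and the two-step pipeline and small grid are assumptions chosen to keep it fast). The nested keys returned by get_params() follow the "<step>__<param>" convention, so the tuned values can be read directly; iterating over named_steps is an alternative that prints each step's own settings.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic data standing in for train_X / train_y (assumption).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

pipeline = Pipeline([("dimred", PCA()),
                     ("classifier", RandomForestClassifier(n_estimators=50, random_state=0))])
grid = {"dimred__n_components": [2, 4], "classifier__max_depth": [3, None]}
search = GridSearchCV(pipeline, grid, cv=3).fit(X, y)

# Deep parameters of the refit best pipeline; nested keys look like "<step>__<param>".
params = search.best_estimator_.get_params()
print(params["dimred__n_components"], params["classifier__max_depth"])

# Alternatively, print each step's own hyperparameters,
# skipping steps that were replaced by None or "passthrough".
for name, step in search.best_estimator_.named_steps.items():
    if hasattr(step, "get_params"):
        print(name, step.get_params())
```

Note that get_params() also reports every parameter that was not searched (e.g. classifier__n_estimators), which is useful when, as in the question, some steps keep their constructor defaults.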

Answer 2

Since your param_grid is a list of dictionaries, each dictionary defines a separate grid, and the search runs over the disjoint union of those grids. Within each grid only the listed keys are varied; every other step keeps the setting from the original pipeline. So in your case best_estimator_ and best_params_ correspond to the single-point grid with combiner=None, with everything else as defined in the original pipeline (here PCA() and RandomForestClassifier() with their default settings). (The search never tried combiner=None together with any other hyperparameter values.)
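To make the disjoint-union behaviour concrete, the sketch below uses scikit-learn's ParameterGrid (the class GridSearchCV uses to expand param_grid) to count the candidates in the question's grid. It then builds a joint grid in which every candidate sets all three steps. combine_grids is a hypothetical helper written for this illustration, not part of scikit-learn; it takes the Cartesian product of per-step option dicts and merges one dict per step.

```python
from itertools import product

from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ParameterGrid
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import PolynomialFeatures

# The question's options, regrouped by pipeline step.
combiner_opts = [{"combiner": [None]},
                 {"combiner": [PolynomialFeatures()], "combiner__degree": [2],
                  "combiner__interaction_only": [False, True]}]
dimred_opts = [{"dimred": [None]},
               {"dimred": [PCA()], "dimred__n_components": [.95, .75]}]
classifier_opts = [{"classifier": [RandomForestClassifier(n_estimators=100, class_weight="balanced")],
                    "classifier__max_depth": [5, 10, None]},
                   {"classifier": [KNeighborsClassifier(weights="distance")],
                    "classifier__n_neighbors": [3, 7, 11]}]

# The question's layout: one flat list, i.e. a disjoint union of six small grids.
union_grid = combiner_opts + dimred_opts + classifier_opts


def combine_grids(*per_step_options):
    """Hypothetical helper: merge one option dict per step, for every combination."""
    combined = []
    for choice in product(*per_step_options):
        merged = {}
        for option in choice:
            merged.update(option)  # keys never clash: each step has its own prefix
        combined.append(merged)
    return combined


joint_grid = combine_grids(combiner_opts, dimred_opts, classifier_opts)

print(len(ParameterGrid(union_grid)))      # 1 + 2 + 1 + 2 + 3 + 3 = 12 candidates
print(len(ParameterGrid(joint_grid)))      # 3 * 3 * 6 = 54 candidates
print(list(ParameterGrid(union_grid))[0])  # {'combiner': None}: no other key is set
```

Passing joint_grid as the param_grid of GridSearchCV should make best_params_ name a choice for all three steps. The trade-off is cost: 54 candidates instead of 12, each fitted cv=5 times.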

Answer 1: score 1, accepted, 2020-08-02 08:45:13

Answer 2: score 0, 2020-08-02 18:30:32