GridSearchCV output problems in Scikit-learn
I'd like to perform a hyperparameter search for selecting preprocessing steps and models in sklearn as follows:
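from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

pipeline = Pipeline([("combiner", PolynomialFeatures()),
                     ("dimred", PCA()),
                     ("classifier", RandomForestClassifier())])
# Each dictionary below is a separate grid
parameters = [{"combiner": [None]},
              {"combiner": [PolynomialFeatures()], "combiner__degree": [2], "combiner__interaction_only": [False, True]},
              {"dimred": [None]},
              {"dimred": [PCA()], "dimred__n_components": [.95, .75]},
              {"classifier": [RandomForestClassifier(n_estimators=100, class_weight="balanced")],
               "classifier__max_depth": [5, 10, None]},
              {"classifier": [KNeighborsClassifier(weights="distance")],
               "classifier__n_neighbors": [3, 7, 11]}]
CV = GridSearchCV(pipeline, parameters, cv=5, scoring="f1_weighted", refit=True, n_jobs=-1)
CV.fit(train_X, train_y)  # train_X, train_y: the training data (not shown)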
Of course, I need the results with the best pipeline with the best parameters. However, when I request the best estimator with
CV.best_estimator_
I get only the winning components, not the hyperparameters:
Pipeline(steps=[('combiner', None), ('dimred', PCA()),
('classifier', RandomForestClassifier())])
When I print out CV.best_params_, I get even less information (only the first element of the Pipeline, the combiner; nothing at all about dimred or classifier):
{'combiner': None}
How could I get the best pipeline combination with components and their hyperparameters?
Pipeline objects have a get_params() method which returns the parameters of the pipeline. This includes the parameters of the individual steps as well. Based on your example, the command
CV.best_estimator_.get_params()
will retrieve all pipeline parameters of the best estimator, including those you are looking for.
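As a minimal sketch of what that call returns, the snippet below builds a small pipeline by hand as a stand-in for CV.best_estimator_ (the variable name best and the chosen hyperparameter values are assumptions for illustration, not results from the question). Nested step parameters show up under keys of the form step__parameter:

```python
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline

# Hypothetical stand-in for CV.best_estimator_: one step disabled with None,
# the others carrying illustrative hyperparameter values.
best = Pipeline([("combiner", None),
                 ("dimred", PCA(n_components=0.95)),
                 ("classifier", RandomForestClassifier(n_estimators=100, max_depth=5))])

params = best.get_params()

# Top-level keys are the steps themselves; "<step>__<param>" keys are the
# hyperparameters of each step.
step_params = {key: value for key, value in params.items() if "__" in key}

print(params["combiner"])
print(step_params["dimred__n_components"])
print(step_params["classifier__max_depth"])
```

A disabled step (None) contributes no step__parameter keys, so only the step name itself appears for it.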
Since your param_grid is a list of dictionaries, each such dictionary gives a separate grid, and the search takes place over the disjoint union of those grids. So the best_estimator_ and best_params_ in your case correspond to the single-point grid with combiner=None and everything else as defined in the original pipeline. (And the search has never explored combiner=None together with other hyperparameters.)
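The disjoint-union behaviour can be checked without fitting anything, using ParameterGrid from sklearn.model_selection to enumerate the candidates. The sketch below reuses the dictionaries from the question, then shows one possible way to make the search cover combinations of steps: merging one option per step so that every dictionary fixes all three steps. The names combiner_opts, dimred_opts, classifier_opts, separate and combined are illustrative assumptions, and the merging approach is a suggestion rather than something stated in the answers above.

```python
from itertools import product

from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ParameterGrid
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import PolynomialFeatures

# Options per pipeline step, copied from the question's grid.
combiner_opts = [
    {"combiner": [None]},
    {"combiner": [PolynomialFeatures()], "combiner__degree": [2],
     "combiner__interaction_only": [False, True]},
]
dimred_opts = [
    {"dimred": [None]},
    {"dimred": [PCA()], "dimred__n_components": [.95, .75]},
]
classifier_opts = [
    {"classifier": [RandomForestClassifier(n_estimators=100, class_weight="balanced")],
     "classifier__max_depth": [5, 10, None]},
    {"classifier": [KNeighborsClassifier(weights="distance")],
     "classifier__n_neighbors": [3, 7, 11]},
]

# The question's param_grid: six independent grids. Each candidate changes
# only the keys of its own dictionary and leaves the other steps at the
# values given when the pipeline was constructed.
separate = combiner_opts + dimred_opts + classifier_opts
print(len(ParameterGrid(separate)))   # 1 + 2 + 1 + 2 + 3 + 3 = 12

# Merge one option per step, so every grid specifies all three steps.
combined = [{**c, **d, **clf}
            for c, d, clf in product(combiner_opts, dimred_opts, classifier_opts)]
print(len(ParameterGrid(combined)))   # (1 + 2) * (1 + 2) * (3 + 3) = 54
```

Passing combined as the param_grid to GridSearchCV should make best_params_ report a value for every step, at the cost of 54 candidates per fold instead of 12.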