Extracting results from GridSearchCV

Question

I just finished a grid search with cross-validation (GridSearchCV) on a tree-based model. After looking into the results, I managed to access the result of each iteration from GridSearchCV.

I need each run in a separate row and each parameter in a separate column. I can use a loop or a list comprehension to go through the rows, but I am unable to split each run into columns.

 df = grid.grid_scores_
 df[0]
 mean: 0.57114, std: 0.00907, params: {'criterion': 'gini', 'max_depth': 10, 
 'max_features': 8, 'min_samples_leaf': 2, 'min_samples_split': 2, 'splitter': 'best'}

I tried tuple and dict accessors but ended up with errors. Essentially, I need every parameter in its own column, like below.

mean | std   | criterion | ..... | splitter
0.57 | 0.009 | 'gini'    | ..... | 'best'
0.58 | 0.029 | 'entropy' | ..... | 'random'
...

Answer 1

You could use a pre-made helper class to generate a DataFrame that reports the parameters and scores of each run (see the Stack Overflow post that uses this code).

Imports and settings

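import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
# Helper class from the linked post; a local module, not part of scikit-learn
from gridsearchcv_helper import EstimatorSelectionHelper

# Widen pandas' display limits so the full summary prints without truncation
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)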

Generate some data

iris = load_iris()
X_iris = iris.data
y_iris = iris.target

Define model and hyper-parameter grid

models = {'RandomForestClassifier': RandomForestClassifier()}
# Note: 'auto' is only accepted by older scikit-learn releases (it was later deprecated and removed)
params = {'RandomForestClassifier': { 'n_estimators': [16, 32],
                                      'max_features': ['auto', 'sqrt', 'log2'],
                                      'criterion' : ['gini', 'entropy'] }}
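
As a rough illustration of why the summary ends up with 12 rows, the grid can be expanded by hand. This is a minimal sketch using only the standard library, not the helper's actual implementation. The alphabetical ordering of parameter names is an assumption on my part, although it does line up with the row indices in the output shown in this answer. scikit-learn's own `sklearn.model_selection.ParameterGrid` can be used for the same kind of enumeration.

```python
from itertools import product

# Same grid as in the answer's `params` dictionary.
param_grid = {
    "n_estimators": [16, 32],
    "max_features": ["auto", "sqrt", "log2"],
    "criterion": ["gini", "entropy"],
}

# Sort the parameter names so the enumeration order is deterministic
# (assumption: alphabetical order, last name varying fastest).
names = sorted(param_grid)
combinations = [
    dict(zip(names, values))
    for values in product(*(param_grid[name] for name in names))
]

# 2 criteria * 3 max_features values * 2 n_estimators values
print(len(combinations))
for index, combo in enumerate(combinations):
    print(index, combo)
```

Each dictionary corresponds to one candidate that the grid search fits and scores, which is why each one becomes a row in the summary.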

Perform grid search (with CV) and report results

helper = EstimatorSelectionHelper(models, params)
helper.fit(X_iris, y_iris, n_jobs=4)
df_gridsearchcv_summary = helper.score_summary()

Here is the output

print(type(df_gridsearchcv_summary))
print(df_gridsearchcv_summary.iloc[:,1:])

RandomForestClassifier
<class 'pandas.core.frame.DataFrame'>
   min_score mean_score max_score  std_score criterion max_features n_estimators
0   0.941176   0.973856         1  0.0244553      gini         auto           16
1   0.921569    0.96732         1  0.0333269      gini         auto           32
8   0.921569    0.96732         1  0.0333269   entropy         sqrt           16
10  0.921569    0.96732         1  0.0333269   entropy         log2           16
2   0.941176   0.966912  0.980392  0.0182045      gini         sqrt           16
3   0.941176   0.966912  0.980392  0.0182045      gini         sqrt           32
4   0.941176   0.966912  0.980392  0.0182045      gini         log2           16
5   0.901961   0.960784         1  0.0423578      gini         log2           32
6   0.921569   0.960376  0.980392  0.0274454   entropy         auto           16
7   0.921569   0.960376  0.980392  0.0274454   entropy         auto           32
11  0.901961    0.95384  0.980392  0.0366875   entropy         log2           32
9   0.921569   0.953431  0.980392  0.0242635   entropy         sqrt           32
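
If you would rather not depend on a helper class, the same kind of table can probably be built straight from the fitted search object. The sketch below assumes a scikit-learn version that exposes `cv_results_` (the `grid_scores_` attribute used in the question was removed in scikit-learn 0.20). The decision tree, the small grid and `cv=3` are illustrative choices of mine, not the asker's setup.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Small illustrative grid (assumed values, not the grid from the question).
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [2, 4],
    "splitter": ["best", "random"],
}

grid = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=3)
grid.fit(X, y)

results = grid.cv_results_

# results["params"] is a list of dicts, one per candidate, so pandas
# turns it into one row per run and one column per parameter.
params_df = pd.DataFrame(list(results["params"]))

# Mean and standard deviation of the cross-validated test score.
scores_df = pd.DataFrame({
    "mean": results["mean_test_score"],
    "std": results["std_test_score"],
})

# Scores first, then one column per parameter.
summary = pd.concat([scores_df, params_df], axis=1)
print(summary.sort_values("mean", ascending=False).to_string(index=False))
```

`pd.DataFrame(grid.cv_results_)` on its own also works and gives every recorded field, including `param_<name>` columns; the version above just keeps the layout close to the one requested in the question.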

Answer 1
0 2019-03-06 20:05:11
