Extracting results from GridSearchCV
I just finished a grid search with cross-validation on a tree-based model and, after looking into the results, I managed to access the results of each iteration from GridSearchCV.
I need each run in a separate row and each parameter in a separate column. I can use a loop or a list comprehension to process each row, but I am unable to split each run into columns.
df = grid.grid_scores_
df[0]
mean: 0.57114, std: 0.00907, params: {'criterion': 'gini', 'max_depth': 10,
'max_features': 8, 'min_samples_leaf': 2, 'min_samples_split': 2, 'splitter': 'best'}
I tried tuple and dict accessors but ended up with errors. Essentially, I need every parameter in its own column, like below.
mean | std   | criterion | ..... | splitter
0.57 | 0.009 | 'gini'    | ..... | 'best'
0.58 | 0.029 | 'entropy' | ..... | 'random'
...
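One way to get that layout is to build one dict per run and hand the list to pandas. The sketch below assumes each `grid_scores_` entry exposes `parameters`, `mean_validation_score` and `cv_validation_scores`, as the entries did in older scikit-learn releases. `ScoreEntry` and `scores_to_frame` are hypothetical names, and the scores are made-up stand-ins so the snippet runs without a fitted search.

```python
from collections import namedtuple
import statistics

import pandas as pd

# Hypothetical stand-in that mirrors the fields of an old grid_scores_ entry.
ScoreEntry = namedtuple(
    "ScoreEntry",
    ["parameters", "mean_validation_score", "cv_validation_scores"],
)


def scores_to_frame(entries):
    """Return a DataFrame with one row per run and one column per parameter."""
    rows = []
    for entry in entries:
        row = {
            "mean": entry.mean_validation_score,
            # Population standard deviation of the per-fold scores.
            "std": statistics.pstdev(entry.cv_validation_scores),
        }
        # Spread the params dict into individual columns.
        row.update(entry.parameters)
        rows.append(row)
    return pd.DataFrame(rows)


# Illustrative values only, not real search results.
entries = [
    ScoreEntry({"criterion": "gini", "max_depth": 10, "splitter": "best"},
               0.57, [0.56, 0.57, 0.58]),
    ScoreEntry({"criterion": "entropy", "max_depth": 10, "splitter": "random"},
               0.58, [0.55, 0.58, 0.61]),
]

df_scores = scores_to_frame(entries)
print(df_scores)
```

With a real fitted search on such a release, `scores_to_frame(grid.grid_scores_)` should give the same shape of table.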
You could use a pre-made helper class to generate a DataFrame with a report of the parameters (see the Stack Overflow post that uses this code here).
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from gridsearchcv_helper import EstimatorSelectionHelper
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
Generate some data
iris = load_iris()
X_iris = iris.data
y_iris = iris.target
Define the model and the hyper-parameter grid
models = {'RandomForestClassifier': RandomForestClassifier()}
params = {'RandomForestClassifier': {'n_estimators': [16, 32],
                                     'max_features': ['auto', 'sqrt', 'log2'],
                                     'criterion': ['gini', 'entropy']}}
Perform the grid search (with CV) and report the results
helper = EstimatorSelectionHelper(models, params)
helper.fit(X_iris, y_iris, n_jobs=4)
df_gridsearchcv_summary = helper.score_summary()
Here is the output
print(type(df_gridsearchcv_summary))
print(df_gridsearchcv_summary.iloc[:,1:])
RandomForestClassifier
<class 'pandas.core.frame.DataFrame'>
    min_score mean_score max_score  std_score criterion max_features n_estimators
0    0.941176   0.973856         1  0.0244553      gini         auto           16
1    0.921569    0.96732         1  0.0333269      gini         auto           32
8    0.921569    0.96732         1  0.0333269   entropy         sqrt           16
10   0.921569    0.96732         1  0.0333269   entropy         log2           16
2    0.941176   0.966912  0.980392  0.0182045      gini         sqrt           16
3    0.941176   0.966912  0.980392  0.0182045      gini         sqrt           32
4    0.941176   0.966912  0.980392  0.0182045      gini         log2           16
5    0.901961   0.960784         1  0.0423578      gini         log2           32
6    0.921569   0.960376  0.980392  0.0274454   entropy         auto           16
7    0.921569   0.960376  0.980392  0.0274454   entropy         auto           32
11   0.901961    0.95384  0.980392  0.0366875   entropy         log2           32
9    0.921569   0.953431  0.980392  0.0242635   entropy         sqrt           32
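If the helper module is not available, a similar table can probably be built from `GridSearchCV` alone. As far as I know, `grid_scores_` was removed in scikit-learn 0.20 in favour of the `cv_results_` dict, whose `params` entry is a list of dicts that pandas can expand into columns. The sketch below is an alternative to the helper class, not its implementation. The names `summary`, `scores` and `param_columns` are my own, and `'auto'` is left out of `max_features` on the assumption that recent scikit-learn releases reject it.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

param_grid = {
    "n_estimators": [16, 32],
    "max_features": ["sqrt", "log2"],
    "criterion": ["gini", "entropy"],
}

grid = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
grid.fit(X, y)

# cv_results_["params"] is a list of dicts, one per parameter combination,
# so DataFrame() turns each parameter into its own column.
param_columns = pd.DataFrame(grid.cv_results_["params"])

scores = pd.DataFrame({
    "mean": grid.cv_results_["mean_test_score"],
    "std": grid.cv_results_["std_test_score"],
})

# One row per run: scores first, then one column per parameter.
summary = pd.concat([scores, param_columns], axis=1)
summary = summary.sort_values("mean", ascending=False)
print(summary)
```

`pd.DataFrame(grid.cv_results_)` on its own also works and already contains `param_<name>` columns, along with per-split scores and timings.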