
Finding accuracy, precision and recall of a model after hyperparameter tuning in sklearn

I have a binary classification problem, for which I've chosen 3 algorithms: Logistic Regression, SVM, and AdaBoost. I'm using grid search and k-fold cross-validation on each of these to find the optimal set of hyper-parameters. Now, based on accuracy, precision, and recall, I need to choose the best model. But the problem is that I'm not able to find any suitable way to retrieve this information. My code is given below:

from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.metrics import make_scorer, fbeta_score
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier

# TODO: Initialize the classifier
clfr_A = LogisticRegression(random_state=128)
clfr_B = SVC(random_state=128)
clfr_C = AdaBoostClassifier(random_state=128)

lr_param_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100, 1000] }
svc_param_grid = {'C': [0.001, 0.01, 0.1, 1, 10], 'gamma' : [0.001, 0.01, 0.1, 1]}
adb_param_grid = {'n_estimators' : [50,100,150,200,250,500],'learning_rate' : [.5,.75,1.0,1.25,1.5,1.75,2.0]}

# TODO: Make an fbeta_score scoring object using make_scorer()
scorer = make_scorer(fbeta_score, beta = 0.5)

# TODO: Perform grid search on the classifier using 'scorer' as the scoring method using GridSearchCV()
clfrs = [clfr_A, clfr_B, clfr_C]
params = [lr_param_grid, svc_param_grid, adb_param_grid]

for clfr, param in zip(clfrs, params):
    grid_obj = GridSearchCV(clfr, param, cv=3, scoring=scorer, refit=True)
    grid_fit = grid_obj.fit(features_raw, target_raw)
    print(grid_fit.best_estimator_)
    print(grid_fit.cv_results_)

The problem is that cv_results_ gives out a lot of info, but I'm not able to find anything relevant except mean_test_score. Moreover, I don't see any accuracy, precision, or recall related metrics there.
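For reference, this is how I'm listing what cv_results_ actually contains (the key names in the comment are just what I see with a single scorer):

# With a single scorer, cv_results_ keys include 'mean_test_score',
# 'std_test_score', 'rank_test_score' and one 'param_*' entry per
# hyper-parameter in the grid.
for key in sorted(grid_fit.cv_results_.keys()):
    print(key)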

I can think of one way to achieve it. I can change the for loop to look something like the following:

score_params = ["accuracy", "precision", "recall"]
for clfr, param in zip(clfrs, params):
    grid_obj = GridSearchCV(clfr, param, cv=3, scoring=scorer, refit=True)
    grid_fit = grid_obj.fit(features_raw, target_raw)
    best_clf = grid_fit.best_estimator_
    for score in score_params:
        print(score, ":", cross_val_score(best_clf, features_raw, target_raw, scoring=score, cv=3).mean())

But is there any better way of doing it? It seems I'm running the cross-validation multiple times for each model. Any help is appreciated.

GridSearchCV is doing exactly what you told it to. You gave fbeta_score as the scorer, so mean_test_score returns the results of that f-beta for each parameter combination. If you want to access other metrics, you need to tell GridSearchCV explicitly to compute them.

GridSearchCV, in newer versions of scikit-learn, supports multi-metric scoring, so you can pass multiple types of scorers to it. As per the documentation:

scoring : string, callable, list/tuple, dict or None, default: None

...

For evaluating multiple metrics, either give a list of (unique) strings or a dict with names as keys and callables as values.

See the multi-metric scoring example in the scikit-learn documentation, and change your scoring param as:

scoring = {'Accuracy': 'accuracy', 
           'FBeta': make_scorer(fbeta_score, beta = 0.5),
           # ... Add others here as you want.
           }

But now when you do this, you need to change the refit param as well. Since different metrics will give different scores for the parameter combinations, you need to decide which one to use when refitting the estimator. So choose one of the keys from the scoring dict for refit:

for clfr, param in zip(clfrs, params):
    grid_obj = GridSearchCV(clfr, param, cv=3, scoring=scoring, refit='FBeta')
    ...
    ...
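Filling in the elided part, a minimal sketch of how the per-metric results could then be read back out of cv_results_ (assuming the scoring dict above, whose keys are 'Accuracy' and 'FBeta'):

for clfr, param in zip(clfrs, params):
    grid_obj = GridSearchCV(clfr, param, cv=3, scoring=scoring, refit='FBeta')
    grid_fit = grid_obj.fit(features_raw, target_raw)
    # With multi-metric scoring, cv_results_ holds one 'mean_test_<name>'
    # column per key of the scoring dict; best_index_ is the row selected
    # by the refit metric ('FBeta' here).
    best = grid_fit.best_index_
    print(grid_fit.best_estimator_)
    print("Accuracy :", grid_fit.cv_results_['mean_test_Accuracy'][best])
    print("FBeta    :", grid_fit.cv_results_['mean_test_FBeta'][best])

This way each grid search runs only once per model, and all the metrics come from the same cv_results_ dict instead of a second round of cross_val_score calls.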
