
Performing K-fold cross-validation with scoring = 'f1', 'recall', or 'precision' for a multi-class problem

I know this can easily be implemented for a binary classification problem, but it seems to be a bit tricky in the case of a multi-class problem.

I have an unbalanced dataset that is an example of a 4-class classification problem. I have applied RandomForestClassifier() to it to test various measures of the algorithm such as accuracy, precision, recall, f1_score, etc. Now I want to perform K-fold cross-validation on the training set with 10 splits, with the 'scoring' parameter of the cross_val_score() function set to 'f1' instead of 'accuracy'.

My code:

# Random Forest
np.random.seed(123)
from sklearn.ensemble import RandomForestClassifier
classifier_RF = RandomForestClassifier(random_state = 0)
classifier_RF.fit(X_train, Y_train)

# Applying k-Fold Cross Validation
from sklearn.model_selection import cross_val_score
accuracies = cross_val_score(estimator = classifier_RF, X = X_train, y = Y_train, cv = 10, scoring = 'f1')
print("F1_Score: {:.2f} %".format(accuracies.mean()*100))
print("Standard Deviation: {:.2f} %".format(accuracies.std()*100))

However, when I try to run this code, I get the following error:

ValueError: Target is multiclass but average='binary'. Please choose another average setting, one of [None, 'micro', 'macro', 'weighted'].

I have tried setting the average parameter to 'weighted' in cross_val_score() as follows:

accuracies = cross_val_score(estimator = classifier_RF, X = X_train, y = Y_train, cv = 10, scoring = 'f1', average = 'weighted')

but that gives the following error:

TypeError: cross_val_score() got an unexpected keyword argument 'average'

The entire traceback is as follows:

Traceback (most recent call last):

  File "<ipython-input-1-ba4a5e1de09a>", line 97, in <module>
    accuracies = cross_val_score(estimator = classifier_RF, X = X_train, y = Y_train, cv = 10, scoring = 'f1')

  File "/Users/vivekchowdary/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 72, in inner_f
    return f(**kwargs)

  File "/Users/vivekchowdary/opt/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 406, in cross_val_score
    error_score=error_score)

  File "/Users/vivekchowdary/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 72, in inner_f
    return f(**kwargs)

  File "/Users/vivekchowdary/opt/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 248, in cross_validate
    for train, test in cv.split(X, y, groups))

  File "/Users/vivekchowdary/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 1048, in __call__
    if self.dispatch_one_batch(iterator):

  File "/Users/vivekchowdary/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 866, in dispatch_one_batch
    self._dispatch(tasks)

  File "/Users/vivekchowdary/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 784, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)

  File "/Users/vivekchowdary/opt/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)

  File "/Users/vivekchowdary/opt/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 572, in __init__
    self.results = batch()

  File "/Users/vivekchowdary/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 263, in __call__
    for func, args, kwargs in self.items]

  File "/Users/vivekchowdary/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 263, in <listcomp>
    for func, args, kwargs in self.items]

  File "/Users/vivekchowdary/opt/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 560, in _fit_and_score
    test_scores = _score(estimator, X_test, y_test, scorer)

  File "/Users/vivekchowdary/opt/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 607, in _score
    scores = scorer(estimator, X_test, y_test)

  File "/Users/vivekchowdary/opt/anaconda3/lib/python3.7/site-packages/sklearn/metrics/_scorer.py", line 88, in __call__
    *args, **kwargs)

  File "/Users/vivekchowdary/opt/anaconda3/lib/python3.7/site-packages/sklearn/metrics/_scorer.py", line 213, in _score
    **self._kwargs)

  File "/Users/vivekchowdary/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 72, in inner_f
    return f(**kwargs)

  File "/Users/vivekchowdary/opt/anaconda3/lib/python3.7/site-packages/sklearn/metrics/_classification.py", line 1047, in f1_score
    zero_division=zero_division)

  File "/Users/vivekchowdary/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 72, in inner_f
    return f(**kwargs)

  File "/Users/vivekchowdary/opt/anaconda3/lib/python3.7/site-packages/sklearn/metrics/_classification.py", line 1175, in fbeta_score
    zero_division=zero_division)

  File "/Users/vivekchowdary/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 72, in inner_f
    return f(**kwargs)

  File "/Users/vivekchowdary/opt/anaconda3/lib/python3.7/site-packages/sklearn/metrics/_classification.py", line 1434, in precision_recall_fscore_support
    pos_label)

  File "/Users/vivekchowdary/opt/anaconda3/lib/python3.7/site-packages/sklearn/metrics/_classification.py", line 1265, in _check_set_wise_labels
    % (y_type, average_options))

ValueError: Target is multiclass but average='binary'. Please choose another average setting, one of [None, 'micro', 'macro', 'weighted'].

You need to use make_scorer to define your metric and its parameters:

from sklearn.metrics import make_scorer, f1_score

scoring = make_scorer(f1_score, average='weighted')

and then pass it to cross_val_score:

results = cross_val_score(estimator = classifier_RF, 
                          X = X_train, 
                          y = Y_train, 
                          cv = 10, 
                          scoring = scoring)
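As an aside, scikit-learn also ships predefined scoring strings for the averaged variants ('f1_weighted', 'f1_macro', 'recall_weighted', 'precision_weighted', ...), so make_scorer is not strictly required; and to compute f1, recall, and precision in a single pass you can use cross_validate, which, unlike cross_val_score, accepts a dict of scorers. A minimal sketch on a synthetic imbalanced 4-class dataset (a stand-in for the asker's data):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, cross_validate

# Synthetic, imbalanced 4-class dataset standing in for the asker's data
X, y = make_classification(n_samples=500, n_classes=4, n_informative=6,
                           weights=[0.5, 0.25, 0.15, 0.1], random_state=0)
clf = RandomForestClassifier(random_state=0)

# 'f1_weighted' is a built-in scoring string, equivalent to
# make_scorer(f1_score, average='weighted')
scores = cross_val_score(clf, X, y, cv=10, scoring='f1_weighted')
print("F1 (weighted): {:.2f} % +/- {:.2f} %".format(
    scores.mean() * 100, scores.std() * 100))

# Several metrics at once: cross_validate accepts a dict of scorers
# and returns one 'test_<name>' array of per-fold scores per metric
results = cross_validate(clf, X, y, cv=10,
                         scoring={'f1': 'f1_weighted',
                                  'recall': 'recall_weighted',
                                  'precision': 'precision_weighted'})
print({k: round(v.mean(), 3) for k, v in results.items()
       if k.startswith('test_')})
```

The string aliases are convenient for a single averaged metric; make_scorer remains the way to go when you need a non-default parameter combination that has no predefined alias.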

