Multiple performance metrics with stratified cross-validation

Question

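I have a small, imbalanced dataset that I want to test with different algorithms. For the evaluation I need several performance metrics (accuracy, precision, recall, fscore, support).

This is how I planned to do it, but I am not really satisfied, because there is probably a simpler solution: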

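import numpy as np
from sklearn.metrics import accuracy_score, classification_report
from sklearn.metrics import precision_recall_fscore_support as score
from sklearn.model_selection import StratifiedKFold

# X, Y and the gradientBoost estimator are defined elsewhere
skf = StratifiedKFold(n_splits=3, random_state=42, shuffle=True)
accuracy = []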
for train_index, test_index in skf.split(X,Y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = Y[train_index], Y[test_index]
    gradientBoost.fit(X_train, y_train)
    y_pred = gradientBoost.predict(X_test)

    accuracy.append(round(accuracy_score(y_test, y_pred), 2))
    precision, recall, fscore, support = np.round(score(y_test, y_pred), 2)

    print('precision: ' + str(precision))
    print('recall: ' + str(recall))
    print('fscore: ' + str(fscore))
    print('support: ' + str(support))

    print(classification_report(y_test, y_pred))

meanAcc= np.mean(np.asarray(accuracy))
print('meanAcc: ', meanAcc)

In theory I could average all the metrics the same way I do for accuracy. Is there a simpler and/or more efficient way?
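
For the plain case of averaging several metrics over stratified folds, scikit-learn's cross_validate accepts a list of scorer names together with a StratifiedKFold splitter. A minimal sketch, assuming synthetic imbalanced data from make_classification in place of the question's X and Y, and a default GradientBoostingClassifier in place of gradientBoost:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Assumption: synthetic imbalanced data standing in for the question's X and Y.
X, Y = make_classification(n_samples=300, weights=[0.8, 0.2], random_state=42)

skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
scoring = ["accuracy", "precision_weighted", "recall_weighted", "f1_weighted"]

# One call fits the model on each fold and evaluates every listed scorer.
cv_scores = cross_validate(
    GradientBoostingClassifier(random_state=42), X, Y, cv=skf, scoring=scoring
)

# Each metric is stored under "test_<name>" as an array with one entry per fold.
mean_scores = {name: cv_scores["test_" + name].mean() for name in scoring}
for name, value in mean_scores.items():
    print("%s: %.2f" % (name, value))
```

The weighted variants are only one possible choice for imbalanced classes; the macro variants (for example "f1_macro") can be listed in the same way.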

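Edit:

I tried to plot accuracy and recall_weighted as scorers. Unfortunately, only the accuracy curve is visible in the plot, although the legend lists both accuracy and recall.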

#Initialize classifier
clf_gini = DecisionTreeClassifier(criterion = "gini", random_state = 42,
                               max_depth=10, min_samples_leaf=8)

scoring = {'Accuracy' : make_scorer(accuracy_score), 'Recall' : 'recall_weighted'}

gs = GridSearchCV(DecisionTreeClassifier(criterion= 'entropy', random_state=42, min_samples_leaf = 10), param_grid={'max_depth' : range(2, 30, 2)},
                  scoring=scoring, cv=3, refit='Accuracy')

gs.fit(X_Distances, Y)
results = gs.cv_results_

plt.figure(figsize=(13, 13))
plt.title("GridSearchCV evaluating using multiple scorers simultaneously",
          fontsize=16)

plt.xlabel("max_depth")
plt.ylabel("Score")
plt.grid()

ax = plt.axes()
ax.set_xlim(0, 32)
ax.set_ylim(0, 1)

# Get the regular numpy array from the MaskedArray
X_axis = np.array(results['param_max_depth'].data, dtype=float)

for scorer, color in zip(sorted(scoring), ['g', 'k']):
    for sample, style in (('train', '--'), ('test', '-')):
        sample_score_mean = results['mean_%s_%s' % (sample, scorer)]
        sample_score_std = results['std_%s_%s' % (sample, scorer)]
        ax.fill_between(X_axis, sample_score_mean - sample_score_std,
                        sample_score_mean + sample_score_std,
                        alpha=0.1 if sample == 'test' else 0, color=color)
        ax.plot(X_axis, sample_score_mean, style, color=color,
                alpha=1 if sample == 'test' else 0.7,
                label="%s (%s)" % (scorer, sample))

        best_index = np.nonzero(results['rank_test_%s' % scorer] == 1)[0][0]
        best_score = results['mean_test_%s' % scorer][best_index]

        # Plot a dotted vertical line at the best score for that scorer marked by x
        ax.plot([X_axis[best_index], ] * 2, [0, best_score],
                linestyle='-.', color=color, marker='x', markeredgewidth=3, ms=8)


    # Annotate the best score for that scorer
    ax.annotate("%0.2f" % best_score,
                (X_axis[best_index], best_score + 0.005))

plt.legend(loc="best")
plt.grid('off')
plt.show()

Answer 1

You can use GridSearchCV with multi-metric evaluation:

# Author: Raghav RV <rvraghav93@gmail.com>
# License: BSD
import numpy as np
from matplotlib import pyplot as plt

from sklearn.datasets import make_hastie_10_2
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

Running GridSearchCV using multiple evaluation metrics

X, y = make_hastie_10_2(n_samples=8000, random_state=42)

# The scorers can be either one of the predefined metric strings or a scorer
# callable, like the one returned by make_scorer
scoring = {'AUC': 'roc_auc', 'Accuracy': make_scorer(accuracy_score)}

# Setting refit='AUC', refits an estimator on the whole dataset with the
# parameter setting that has the best cross-validated AUC score.
# That estimator is made available at ``gs.best_estimator_`` along with
# parameters like ``gs.best_score_``, ``gs.best_params_`` and
# ``gs.best_index_``
# return_train_score=True is needed because the plot below also reads the
# 'mean_train_*' and 'std_train_*' entries of cv_results_.
gs = GridSearchCV(DecisionTreeClassifier(random_state=42),
                  param_grid={'min_samples_split': range(2, 403, 10)},
                  scoring=scoring, cv=5, refit='AUC',
                  return_train_score=True)
gs.fit(X, y)
results = gs.cv_results_
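
With a scoring dict, the entries of cv_results_ are keyed by the names chosen in that dict: mean_test_<name>, std_test_<name> and rank_test_<name>. A small sketch of reading them without plotting, assuming a reduced grid of three values and 2000 samples only to keep the run short:

```python
from sklearn.datasets import make_hastie_10_2
from sklearn.metrics import accuracy_score, make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Assumption: smaller data and grid than in the answer, for a quick run.
X, y = make_hastie_10_2(n_samples=2000, random_state=42)
scoring = {"AUC": "roc_auc", "Accuracy": make_scorer(accuracy_score)}

gs = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid={"min_samples_split": [2, 50, 200]},
    scoring=scoring, cv=5, refit="AUC",
)
gs.fit(X, y)
results = gs.cv_results_

# Print one row per candidate with the mean test score of every metric.
for i, params in enumerate(results["params"]):
    row = ", ".join(
        "%s=%.3f" % (name, results["mean_test_" + name][i])
        for name in sorted(scoring)
    )
    print(params, row)

# refit="AUC" makes best_index_ point at the candidate ranked first on AUC.
best_auc = results["mean_test_AUC"][gs.best_index_]
print("best mean test AUC: %.3f" % best_auc)
```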

Plotting the result

plt.figure(figsize=(13, 13))
plt.title("GridSearchCV evaluating using multiple scorers simultaneously",
          fontsize=16)

plt.xlabel("min_samples_split")
plt.ylabel("Score")
plt.grid()

ax = plt.gca()
ax.set_xlim(0, 402)
ax.set_ylim(0.73, 1)

# Get the regular numpy array from the MaskedArray
X_axis = np.array(results['param_min_samples_split'].data, dtype=float)

for scorer, color in zip(sorted(scoring), ['g', 'k']):
    for sample, style in (('train', '--'), ('test', '-')):
        sample_score_mean = results['mean_%s_%s' % (sample, scorer)]
        sample_score_std = results['std_%s_%s' % (sample, scorer)]
        ax.fill_between(X_axis, sample_score_mean - sample_score_std,
                        sample_score_mean + sample_score_std,
                        alpha=0.1 if sample == 'test' else 0, color=color)
        ax.plot(X_axis, sample_score_mean, style, color=color,
                alpha=1 if sample == 'test' else 0.7,
                label="%s (%s)" % (scorer, sample))

    best_index = np.nonzero(results['rank_test_%s' % scorer] == 1)[0][0]
    best_score = results['mean_test_%s' % scorer][best_index]

    # Plot a dotted vertical line at the best score for that scorer marked by x
    ax.plot([X_axis[best_index], ] * 2, [0, best_score],
            linestyle='-.', color=color, marker='x', markeredgewidth=3, ms=8)


    # Annotate the best score for that scorer
    ax.annotate("%0.2f" % best_score,
                (X_axis[best_index], best_score + 0.005))

plt.legend(loc="best")
plt.grid(False)
plt.show()

Result: (figure: train and test scores for AUC and Accuracy versus min_samples_split)

Answer 2

The sklearn documentation suggests using one of the following metrics to evaluate classification:

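Scoring              Function                          Comment
accuracy             metrics.accuracy_score
average_precision    metrics.average_precision_score
f1                   metrics.f1_score                  for binary targets
f1_micro             metrics.f1_score                  micro-averaged
f1_macro             metrics.f1_score                  macro-averaged
f1_weighted          metrics.f1_score                  weighted average
f1_samples           metrics.f1_score                  by multilabel sample
neg_log_loss         metrics.log_loss                  requires predict_proba support
precision            metrics.precision_score           suffixes apply as with f1
recall               metrics.recall_score              suffixes apply as with f1
roc_auc              metrics.roc_auc_score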
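
A side note on the plot in the question: one plausible reason why only a single curve is visible is that recall_weighted is arithmetically equal to accuracy for single-label classification. Each class's recall is weighted by its support, so the weighted sum reduces to the total number of correct predictions divided by the number of samples, and the two curves lie exactly on top of each other. A quick check on random labels (the label arrays below are made up for illustration):

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Made-up three-class labels, used only to compare the two metrics.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 3, size=200)
y_pred = rng.integers(0, 3, size=200)

acc = accuracy_score(y_true, y_pred)
rec_w = recall_score(y_true, y_pred, average="weighted")

# Both values are the fraction of correctly predicted samples.
print(acc, rec_w)
```

A scorer that does differ from accuracy, such as f1_weighted or recall_macro, produces a separate curve.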

Let's try accuracy and f1_weighted:

import numpy as np
from matplotlib import pyplot as plt

from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


X, y = make_classification(n_classes=10, n_informative=8, random_state=1)

# Display name -> predefined scorer string; the names become part of the
# cv_results_ keys (for example 'mean_test_F1 (weighted)')
scoring = {
  'Accuracy' : 'accuracy',
  'F1 (weighted)' : 'f1_weighted',
}

gs = GridSearchCV(RandomForestClassifier(max_depth=5, random_state=42, min_samples_leaf = 10),
                  param_grid={'n_estimators' : range(2, 101, 2)}, return_train_score=True,
                  scoring=scoring, cv=3, refit='Accuracy')

gs.fit(X, y)
results = gs.cv_results_

##################
plt.figure(figsize=(12, 8))
plt.title("GridSearchCV evaluating using multiple scorers simultaneously",
          fontsize=16)

plt.xlabel("n_estimators")
plt.ylabel("Score")
#plt.grid()

ax = plt.gca()
ax.set_xlim(0, 101)
ax.set_ylim(0, 1)

# Get the regular numpy array from the MaskedArray
X_axis = np.array(results['param_n_estimators'].data, dtype=float)

for scorer, color in zip(sorted(scoring), ['g', 'k']):
    for sample, style in (('train', '--'), ('test', '-')):
        print('plotting: {} ({})'.format(scorer, sample))
        sample_score_mean = results['mean_%s_%s' % (sample, scorer)]
        sample_score_std = results['std_%s_%s' % (sample, scorer)]
        ax.fill_between(X_axis, sample_score_mean - sample_score_std,
                        sample_score_mean + sample_score_std,
                        alpha=0.1 if sample == 'test' else 0, color=color)
        ax.plot(X_axis, sample_score_mean, style, color=color,
                alpha=1 if sample == 'test' else 0.7,
                label="%s (%s)" % (scorer, sample))

    best_index = np.nonzero(results['rank_test_%s' % scorer] == 1)[0][0]
    best_score = results['mean_test_%s' % scorer][best_index]

    # Plot a dotted vertical line at the best score for that scorer marked by x
    ax.plot([X_axis[best_index], ] * 2, [0, best_score],
            linestyle='-.', color=color, marker='x', markeredgewidth=3, ms=8)


    # Annotate the best score for that scorer
    ax.annotate("%0.2f" % best_score,
                (X_axis[best_index], best_score + 0.005))

plt.legend(loc="best")

plt.grid(False)
plt.show()

Result: (figure: train and test scores for accuracy and f1_weighted versus n_estimators)
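
Scorer strings return one number per fold, so they do not give the per-class support the question also asks for. One possible way to get it, sketched here under assumptions (synthetic two-class data and a small random forest, neither taken from the thread), is to collect out-of-fold predictions with cross_val_predict and score them once:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, precision_recall_fscore_support
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Assumption: synthetic imbalanced data, not the asker's dataset.
X, y = make_classification(n_samples=300, weights=[0.8, 0.2], random_state=1)
skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

# Every sample is predicted exactly once, by a model that did not train on it.
clf = RandomForestClassifier(n_estimators=50, random_state=42)
y_pred = cross_val_predict(clf, X, y, cv=skf)

# Per-class arrays; support is the number of true samples in each class.
precision, recall, fscore, support = precision_recall_fscore_support(y, y_pred)
print(classification_report(y, y_pred))
```

Metrics computed on pooled out-of-fold predictions are not identical to the mean of per-fold metrics, so the two should not be mixed in one comparison.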

Answer 1: 1, accepted, 2018-04-10 12:00:52
Answer 2: 1, 2018-04-10 16:36:19
