Doubts about accuracy evaluation in binary classification

Question

I am using a Keras Sequential model for binary classification, and I have some doubts about how its accuracy is evaluated.

I am computing the AUC-ROC for it. Should I use the predicted probabilities or the predicted classes for this?

Explanation:

After training the model, I call model.predict() to get the predicted values for the training and validation data (code below).

y_pred_train = model.predict(x_train_df).ravel()
y_pred_val = model.predict(x_val_df).ravel()

fpr_train, tpr_train, thresholds_roc_train = roc_curve(y_train_df, y_pred_train, pos_label=None)
fpr_val, tpr_val, thresholds_roc_val = roc_curve(y_val_df, y_pred_val, pos_label=None)

roc_auc_train = auc(fpr_train, tpr_train)
roc_auc_val = auc(fpr_val, tpr_val)

plt.figure()
lw = 2
plt.plot(fpr_train, tpr_train, color='darkgreen',lw=lw, label='ROC curve Training (area = %0.2f)' % roc_auc_train)
plt.plot(fpr_val, tpr_val, color='darkorange',lw=lw, label='ROC curve Validation (area = %0.2f)' % roc_auc_val)
plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--',label='Base line')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic (ROC) Curve')
plt.legend(loc="lower right")
plt.show()

This produces the ROC plot. The training and validation accuracy are 0.76 and 0.76, respectively.

model.predict() returns probabilities, not the actual predicted classes, so I changed the first two lines of the code sample above to produce classes:

y_pred_train = (model.predict(x_train_df).ravel() > 0.5).astype("int32")
y_pred_val = (model.predict(x_test_df).ravel() > 0.5).astype("int32")

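So now the AUC-ROC is computed from class values (I guess). But the accuracy I get is very different and much lower: the training and validation accuracy are 0.66 and 0.46, respectively. (plot)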

Which of the two is the correct approach, and why do the accuracies differ?

Answer 1

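An ROC curve is usually created by plotting sensitivity (TPR) against the false positive rate (FPR, which is 1 - specificity) while varying the class threshold from 0.0 to 1.0. See, for example: https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc

Here is a rough sketch to get you started: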

import numpy as np

# Predicted probabilities (model and x_train_df come from the question)
pred_proba = model.predict(x_train_df).ravel()
# Assumes the ground truth is an array of 0/1 labels containing both classes
truth = np.asarray(y_train_df).ravel()

tprs, fprs = [], []
for thresh in np.linspace(0.0, 1.0, 11):
    pred = np.where(pred_proba > thresh, 1, 0)

    # Confusion-matrix counts at the current threshold
    tp = np.count_nonzero((truth == 1) & (pred == 1))
    fp = np.count_nonzero((truth == 0) & (pred == 1))
    fn = np.count_nonzero((truth == 1) & (pred == 0))
    tn = np.count_nonzero((truth == 0) & (pred == 0))

    # Sensitivity (TPR) and false positive rate (FPR = 1 - specificity)
    tprs.append(tp / (tp + fn))
    fprs.append(fp / (fp + tn))

# Each (fpr, tpr) pair is one point of the ROC curve and can now be plotted
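
To illustrate why the two approaches in the question can give different areas, here is a minimal, dependency-free sketch. The labels and probabilities are made up for illustration, and roc_points and auc_trapezoid are hypothetical helpers written for this example, not scikit-learn functions. The idea it tries to show: once the probabilities are cut at 0.5, only two distinct score values remain, so the threshold sweep yields a single operating point between (0, 0) and (1, 1) instead of the full curve.

```python
def roc_points(y_true, scores):
    """Sweep a threshold over the distinct scores and collect (fpr, tpr) points."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    fpr, tpr = [0.0], [0.0]  # threshold above every score: nothing predicted positive
    for thresh in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thresh and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thresh and y == 0)
        tpr.append(tp / pos)
        fpr.append(fp / neg)
    return fpr, tpr


def auc_trapezoid(fpr, tpr):
    """Area under the piecewise-linear curve (trapezoidal rule)."""
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
               for i in range(1, len(fpr)))


# Made-up ground truth and predicted probabilities (illustration only)
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
proba = [0.1, 0.3, 0.45, 0.6, 0.4, 0.55, 0.7, 0.9]

# Hard class labels at the 0.5 cut-off, as in the question's second attempt
labels = [1 if p > 0.5 else 0 for p in proba]

auc_proba = auc_trapezoid(*roc_points(y_true, proba))
auc_labels = auc_trapezoid(*roc_points(y_true, labels))

print(f"AUC from probabilities: {auc_proba:.4f}")   # 0.8125
print(f"AUC from hard labels:   {auc_labels:.4f}")  # 0.7500
```

On this toy data the curve built from hard labels has only three points, and its area equals the average of sensitivity and specificity at the 0.5 threshold. That is one reason to pass the probabilities, not the thresholded classes, when computing an ROC curve or its area.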

Solution 1
0 2021-11-16 10:59:24
