Accuracy metrics for binary classification with sequential() model

I have created a deep neural network model with the sequential() model of Keras. This is a binary classification problem. I have fitted the model on the training data.

I am confused about how to calculate different accuracy metrics for the training and validation data. I am calculating the RMSE, the F1 score, and the AUC of the ROC and PR curves as follows:

# Prediction
y_pred_train = model.predict(x_train_df).ravel()
y_pred_val = model.predict(x_val_df).ravel()

# RMSE
rmse_train = mean_squared_error(y_train_df, y_pred_train)
rmse_val = mean_squared_error(y_val_df, y_pred_val)

# ROC-AUC
fpr_train, tpr_train, thresholds_roc_train = roc_curve(y_train_df, y_pred_train, pos_label=None)
fpr_val, tpr_val, thresholds_roc_val = roc_curve(y_val_df, y_pred_val, pos_label=None)

roc_auc_train = auc(fpr_train, tpr_train)
roc_auc_val = auc(fpr_val, tpr_val)

# PR-AUC
precision_train, recall_train, thresholds_pr_train = precision_recall_curve(y_train_df, y_pred_train)
precision_val, recall_val, thresholds_pr_val = precision_recall_curve(y_val_df, y_pred_val)
pr_auc_train = auc(recall_train, precision_train)
pr_auc_val = auc(recall_val, precision_val)

# F1 Score
f1_train = np.mean(2 * (precision_train * recall_train) / (precision_train + recall_train))
f1_val = np.mean(2 * (precision_val * recall_val) / (precision_val + recall_val))

The values of these metrics are:

  • RMSE Train 0.11
  • RMSE Validation 0.13
  • ROC-AUC Train 0.94
  • ROC-AUC Validation 0.91
  • PR-AUC Train 0.96
  • PR-AUC Validation 0.93
  • F1 Score Train 0.66
  • F1 Score Validation 0.66

I am very new to machine learning. I implemented this code by searching various web pages. Is my code correct? I am confused because the F1 score is not very high although all the other metrics have high values.

If the code is correct, then why am I getting a not-so-high F1 score?

Edit 1

As asked in the comments, the mean precision and recall values are:

print(np.mean(precision_train))
print(np.mean(recall_train))
print(np.mean(precision_val))
print(np.mean(recall_val))

Output:

0.9299899169174257
0.6012312742646909
0.8988925808831595
0.6052356704530617

Is my code correct?

Sorry, not entirely:

  1. For precision, recall, and F1 you should not take the mean over the curves, as in f1_train = np.mean(.. Instead:
    Use your PR and ROC curves to choose a threshold. Use that threshold to binarise the y_pred_* values, and then call classification_report to print the final precision, recall, and F1 scores. You will then see the effective F1 score and how precision and recall might be affecting it.

  2. RMSE: since what you trained is a classifier, a cross-entropy metric (log loss) may be a better fit. Log loss is computed on the predicted probabilities, not on binarised labels.
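
The two points above can be sketched roughly as follows. This is a minimal illustration, not the answerer's exact method: the data is synthetic (y_true and y_prob stand in for y_val_df and y_pred_val), the variable names are made up, and picking the threshold that maximises F1 is only one possible rule for choosing a threshold from the PR curve.

```python
import numpy as np
from sklearn.metrics import (
    classification_report,
    f1_score,
    log_loss,
    mean_squared_error,
    precision_recall_curve,
)

# Synthetic stand-ins for y_val_df and model.predict(x_val_df).ravel().
# These values are invented purely for illustration.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
y_prob = np.clip(0.6 * y_true + 0.2 + rng.normal(0.0, 0.2, size=1000), 0.01, 0.99)

# precision and recall have one more entry than thresholds, so drop the
# final (precision=1, recall=0) point before computing F1 per threshold.
precision, recall, thresholds = precision_recall_curve(y_true, y_prob)
p, r = precision[:-1], recall[:-1]
denom = p + r
f1_per_threshold = np.divide(2 * p * r, denom, out=np.zeros_like(denom), where=denom > 0)

# Assumed selection rule: take the threshold with the highest F1.
best_threshold = thresholds[np.argmax(f1_per_threshold)]

# Binarise the probabilities. precision_recall_curve treats a sample as
# positive when its score is >= the threshold, so do the same here.
y_label = (y_prob >= best_threshold).astype(int)

# Final precision, recall and F1 at the chosen threshold.
print(classification_report(y_true, y_label))
best_f1 = f1_score(y_true, y_label)

# Probability-based metrics use the raw probabilities, not the labels.
# mean_squared_error on its own returns the MSE; take the root for RMSE.
rmse = np.sqrt(mean_squared_error(y_true, y_prob))
cross_entropy = log_loss(y_true, y_prob)
```

Averaging F1 over every threshold on the curve mixes in very low and very high thresholds where F1 is poor, which is one plausible reason the averaged value in the question looks low next to the AUC figures. The F1 at a single chosen threshold is the number that describes the classifier as it would actually be used.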
