How do I find the best ML model from the following results?

Question

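I am trying to predict loan defaulters by training models on the Lending Club dataset. I am finding it hard to choose a model from the results I obtained. How can I pick the right one?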

The following results come from the different models:

--------------------------------------------------
random forest
--------------------------------------------------
              precision    recall  f1-score   support

         0.0       0.75      0.94      0.83      3401
         1.0       0.94      0.74      0.83      4125

    accuracy                           0.83      7526
   macro avg       0.84      0.84      0.83      7526
weighted avg       0.85      0.83      0.83      7526

Confusion Matrix:  
 [[3196  205]  
 [1081 3044]]
Training Accuracy:  0.9854712969525159  
Testing Accuracy:  0.8291256975817167  
Prediction with data having all values as 0:  Counter({0.0: 468, 1.0: 32})  
Prediction with data having all values as 1:  Counter({1.0: 365, 0.0: 135})

--------------------------------------------------
logistic regression
--------------------------------------------------
              precision    recall  f1-score   support

         0.0       0.76      0.83      0.79      3401
         1.0       0.85      0.78      0.81      4125

    accuracy                           0.80      7526
   macro avg       0.80      0.81      0.80      7526
weighted avg       0.81      0.80      0.80      7526

Training Accuracy:  0.7995659107016301  
Testing Accuracy:  0.8037470103640713  
Confusion Matrix:  
 [[2828  573]  
 [ 904 3221]]  
Prediction with data having all values as 0:  Counter({0.0: 406, 1.0: 94})  
Prediction with data having all values as 1:  Counter({1.0: 379, 0.0: 121})
--------------------------------------------------
k nearest neighbor
--------------------------------------------------
              precision    recall  f1-score   support

         0.0       0.73      0.94      0.82      3401
         1.0       0.93      0.72      0.81      4125

    accuracy                           0.82      7526
   macro avg       0.83      0.83      0.82      7526
weighted avg       0.84      0.82      0.82      7526

Training Accuracy:  0.8770818568391212
Testing Accuracy:  0.8161041722030294
Confusion Matrix:
 [[3188  213]
 [1171 2954]]
Prediction with data having all values as 0:  Counter({0.0: 460, 1.0: 40})  
Prediction with data having all values as 1:  Counter({1.0: 353, 0.0: 147})
--------------------------------------------------
cat boost
--------------------------------------------------
              precision    recall  f1-score   support

         0.0       0.75      0.98      0.85      3401
         1.0       0.98      0.72      0.83      4125

    accuracy                           0.84      7526
   macro avg       0.86      0.85      0.84      7526
weighted avg       0.87      0.84      0.84      7526

Training Accuracy:  0.8628632175761871
Testing Accuracy:  0.8388254052617592
Confusion Matrix:
 [[3325   76]
 [1137 2988]]
Prediction with data having all values as 0:  Counter({0.0: 485, 1.0: 15})
Prediction with data having all values as 1:  Counter({1.0: 365, 0.0: 135})
--------------------------------------------------
xgboost
--------------------------------------------------
              precision    recall  f1-score   support

         0.0       0.74      1.00      0.85      3401
         1.0       1.00      0.71      0.83      4125

    accuracy                           0.84      7526
   macro avg       0.87      0.86      0.84      7526
weighted avg       0.88      0.84      0.84      7526

Training Accuracy:  0.8437278525868178
Testing Accuracy:  0.8417486048365665
Confusion Matrix:
 [[3393    8]
 [1183 2942]]
Prediction with data having all values as 0:  Counter({0.0: 497, 1.0: 3})
Prediction with data having all values as 1:  Counter({1.0: 357, 0.0: 143})
--------------------------------------------------

Answer 1

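Your last model (xgboost) looks like the best one: it gives the highest, or tied-highest, score on the averaged metrics (accuracy, precision, recall and f1-score). Note, though, that its recall on class 1.0 (0.71) is the lowest of the five models, so it is not the best on every per-class number. The only score that does not matter for this choice is the evaluation on the training set (we are evaluating on the test set).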
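As a rough check of that ranking, the test accuracy of each model can be recomputed from the confusion matrices posted in the question. This is only a sketch: the matrices are copied from the output above, and it assumes the scikit-learn convention that rows are true classes and columns are predicted classes (the row sums match the reported `support` values). The names `confusion`, `accuracy` and `ranking` are illustrative.

```python
# Confusion matrices copied from the question's output.
# Assumed layout: rows = true class (0.0, 1.0), columns = predicted class.
confusion = {
    "random forest": [[3196, 205], [1081, 3044]],
    "logistic regression": [[2828, 573], [904, 3221]],
    "k nearest neighbor": [[3188, 213], [1171, 2954]],
    "cat boost": [[3325, 76], [1137, 2988]],
    "xgboost": [[3393, 8], [1183, 2942]],
}


def accuracy(cm):
    """Fraction of test samples on the diagonal (correctly classified)."""
    correct = cm[0][0] + cm[1][1]
    total = sum(sum(row) for row in cm)
    return correct / total


test_accuracy = {name: accuracy(cm) for name, cm in confusion.items()}

# Sort model names from highest to lowest test accuracy.
ranking = sorted(test_accuracy, key=test_accuracy.get, reverse=True)

for name in ranking:
    print(f"{name:<20} {test_accuracy[name]:.4f}")
```

Ranking by a single number like this is convenient, but it hides the per-class behaviour, which is why the other metrics are worth reading as well.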

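Usually you want every metric to be as high as possible, but sometimes that is not possible, and to select a model you need to understand what each metric means for the goal you are trying to achieve. You often have to find a trade-off. Note that the f1-score is the harmonic mean of precision and recall, so a high f1-score requires both precision and recall to be high.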
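To make the trade-off concrete, here is a minimal sketch that derives precision, recall and f1-score for class 1.0 from the posted confusion matrices and then picks a model under different goals. The matrices come from the question; the function and variable names are illustrative, and the row/column layout (rows = true class) is assumed from the reported `support` values. The question does not say which label marks a defaulter, so the sketch only speaks of "class 1.0".

```python
# Confusion matrices copied from the question's output.
# Assumed layout: rows = true class (0.0, 1.0), columns = predicted class.
confusion = {
    "random forest": [[3196, 205], [1081, 3044]],
    "logistic regression": [[2828, 573], [904, 3221]],
    "k nearest neighbor": [[3188, 213], [1171, 2954]],
    "cat boost": [[3325, 76], [1137, 2988]],
    "xgboost": [[3393, 8], [1183, 2942]],
}


def class_metrics(cm, cls):
    """Return (precision, recall, f1) for one class of a 2x2 confusion matrix."""
    tp = cm[cls][cls]
    fn = sum(cm[cls]) - tp                  # true members of cls predicted as the other class
    fp = sum(row[cls] for row in cm) - tp   # other class predicted as cls
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


metrics_1 = {name: class_metrics(cm, 1) for name, cm in confusion.items()}

for name, (p, r, f1) in metrics_1.items():
    print(f"{name:<20} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")

# The "best" model depends on which metric matters for the goal.
best_precision = max(metrics_1, key=lambda n: metrics_1[n][0])
best_recall = max(metrics_1, key=lambda n: metrics_1[n][1])
print("highest precision on class 1.0:", best_precision)
print("highest recall on class 1.0:", best_recall)
```

If missing a class 1.0 case is the costly mistake, recall on that class is the number to watch; if false alarms are costly, precision matters more. The two goals can point to different models, which is the trade-off described above.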

Solution 1
0 2019-11-17 12:23:00
