How do I find the best ML model from the following results?

Question

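I am trying to predict loan defaulters by training a model on the Lending Club dataset. I am finding it hard to choose a model based on the results I got. How can I pick the right one?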

Here are the results from the different models:

--------------------------------------------------
random forest
--------------------------------------------------
              precision    recall  f1-score   support

         0.0       0.75      0.94      0.83      3401
         1.0       0.94      0.74      0.83      4125

    accuracy                           0.83      7526
   macro avg       0.84      0.84      0.83      7526
weighted avg       0.85      0.83      0.83      7526

Confusion Matrix:  
 [[3196  205]  
 [1081 3044]]
Training Accuracy:  0.9854712969525159  
Testing Accuracy:  0.8291256975817167  
Prediction with data having all values as 0:  Counter({0.0: 468, 1.0: 32})  
Prediction with data having all values as 1:  Counter({1.0: 365, 0.0: 135})

--------------------------------------------------
logistic regression
--------------------------------------------------
              precision    recall  f1-score   support

         0.0       0.76      0.83      0.79      3401
         1.0       0.85      0.78      0.81      4125

    accuracy                           0.80      7526
   macro avg       0.80      0.81      0.80      7526
weighted avg       0.81      0.80      0.80      7526

Training Accuracy:  0.7995659107016301  
Testing Accuracy:  0.8037470103640713  
Confusion Matrix:  
 [[2828  573]  
 [ 904 3221]]  
Prediction with data having all values as 0:  Counter({0.0: 406, 1.0: 94})  
Prediction with data having all values as 1:  Counter({1.0: 379, 0.0: 121})
--------------------------------------------------
k nearest neighbor
--------------------------------------------------
              precision    recall  f1-score   support

         0.0       0.73      0.94      0.82      3401
         1.0       0.93      0.72      0.81      4125

    accuracy                           0.82      7526
   macro avg       0.83      0.83      0.82      7526
weighted avg       0.84      0.82      0.82      7526

Training Accuracy:  0.8770818568391212
Testing Accuracy:  0.8161041722030294
Confusion Matrix:
 [[3188  213]
 [1171 2954]]
Prediction with data having all values as 0:  Counter({0.0: 460, 1.0: 40})  
Prediction with data having all values as 1:  Counter({1.0: 353, 0.0: 147})
--------------------------------------------------
cat boost
--------------------------------------------------
              precision    recall  f1-score   support

         0.0       0.75      0.98      0.85      3401
         1.0       0.98      0.72      0.83      4125

    accuracy                           0.84      7526
   macro avg       0.86      0.85      0.84      7526
weighted avg       0.87      0.84      0.84      7526

Training Accuracy:  0.8628632175761871
Testing Accuracy:  0.8388254052617592
Confusion Matrix:
 [[3325   76]
 [1137 2988]]
Prediction with data having all values as 0:  Counter({0.0: 485, 1.0: 15})
Prediction with data having all values as 1:  Counter({1.0: 365, 0.0: 135})
--------------------------------------------------
xgboost
--------------------------------------------------
              precision    recall  f1-score   support

         0.0       0.74      1.00      0.85      3401
         1.0       1.00      0.71      0.83      4125

    accuracy                           0.84      7526
   macro avg       0.87      0.86      0.84      7526
weighted avg       0.88      0.84      0.84      7526

Training Accuracy:  0.8437278525868178
Testing Accuracy:  0.8417486048365665
Confusion Matrix:
 [[3393    8]
 [1183 2942]]
Prediction with data having all values as 0:  Counter({0.0: 497, 1.0: 3})
Prediction with data having all values as 1:  Counter({1.0: 357, 0.0: 143})
--------------------------------------------------

Answer 1

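Your last model (xgboost) seems to be the best: it has the highest test accuracy (0.8417) and the highest or joint-highest macro- and weighted-average precision, recall and F1-score. It does not win every per-class number (logistic regression, for example, has a higher recall for class 1.0), but on the overall scores it comes out on top. The training-set scores don't matter for this comparison; what counts is the evaluation on the test set.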
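
As a rough sketch of that comparison, the snippet below recomputes the test-set metrics directly from the confusion matrices in the question and ranks the models by macro F1, breaking ties with accuracy. It assumes the scikit-learn convention that rows are true classes and columns are predicted classes (this is consistent with the printed supports and accuracies). The helper name `holdout_metrics` and the ranking rule are illustrative choices, not something taken from the question.

```python
# Confusion matrices copied from the question.
# ASSUMPTION: rows are true classes and columns are predicted classes
# (scikit-learn's confusion_matrix convention); this matches the printed supports.
CONFUSION = {
    "random forest": [[3196, 205], [1081, 3044]],
    "logistic regression": [[2828, 573], [904, 3221]],
    "k nearest neighbor": [[3188, 213], [1171, 2954]],
    "cat boost": [[3325, 76], [1137, 2988]],
    "xgboost": [[3393, 8], [1183, 2942]],
}


def holdout_metrics(cm):
    """Accuracy plus macro-averaged precision, recall and F1 for a 2x2 confusion matrix."""
    total = sum(sum(row) for row in cm)
    accuracy = (cm[0][0] + cm[1][1]) / total
    precisions, recalls, f1s = [], [], []
    for k in (0, 1):
        tp = cm[k][k]
        p = tp / (cm[0][k] + cm[1][k])  # column sum = everything predicted as class k
        r = tp / sum(cm[k])             # row sum = everything truly in class k
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r))
    return {
        "accuracy": accuracy,
        "macro_precision": sum(precisions) / 2,
        "macro_recall": sum(recalls) / 2,
        "macro_f1": sum(f1s) / 2,
    }


# Only test-set numbers enter the ranking; training accuracy is ignored on purpose.
scores = {name: holdout_metrics(cm) for name, cm in CONFUSION.items()}
ranking = sorted(
    scores,
    key=lambda name: (scores[name]["macro_f1"], scores[name]["accuracy"]),
    reverse=True,
)

for name in ranking:
    s = scores[name]
    print(
        f"{name:<20} accuracy={s['accuracy']:.4f} "
        f"precision={s['macro_precision']:.2f} recall={s['macro_recall']:.2f} "
        f"f1={s['macro_f1']:.2f}"
    )
```

With these matrices the order comes out as xgboost, cat boost, random forest, k nearest neighbor, logistic regression, which is the same order you get from test accuracy alone.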

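In general you want every metric to be as high as possible, but that is not always achievable. You need to understand what each metric means for what you are trying to achieve, and select the model accordingly; often you have to accept a trade-off. Note that the F1-score is the harmonic mean of precision and recall, so a high F1-score means that both precision and recall are high.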
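
To make the trade-off concrete, here is a small sketch that uses only the class 1.0 precision/recall pairs printed in the reports above. It recomputes F1 as the harmonic mean (reproducing the reported f1-scores), and then shows that a precision-first goal and a recall-first goal pick different models. The question does not say which label marks a defaulter, so treating 1.0 as the class of interest is an assumption made for illustration.

```python
def f1(precision, recall):
    """F1-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) for class 1.0, copied from the classification reports.
# ASSUMPTION: 1.0 is the class of interest; the question does not say which label is a defaulter.
CLASS_1 = {
    "random forest": (0.94, 0.74),
    "logistic regression": (0.85, 0.78),
    "k nearest neighbor": (0.93, 0.72),
    "cat boost": (0.98, 0.72),
    "xgboost": (1.00, 0.71),
}

for name, (p, r) in CLASS_1.items():
    print(f"{name:<20} precision={p:.2f} recall={r:.2f} f1={f1(p, r):.2f}")

# The harmonic mean is pulled toward the smaller value: one weak metric caps F1.
print(f"{f1(1.00, 0.10):.2f}")  # 0.18, whereas the arithmetic mean would be 0.55

# Different goals favour different models for the same class.
best_precision = max(CLASS_1, key=lambda n: CLASS_1[n][0])  # fewest false alarms for 1.0
best_recall = max(CLASS_1, key=lambda n: CLASS_1[n][1])     # fewest missed 1.0 cases
print(best_precision, best_recall)  # xgboost logistic regression
```

Here xgboost wins on precision for class 1.0 while logistic regression wins on recall, which is exactly the situation where you have to decide which kind of error costs you more.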

Solution 1
0 2019-11-17 12:23:00

如何從以下結果中找到最佳 ML model？

問題描述

1 個解決方案

解決方案1 0 2019-11-17 12:23:00

解決方案1
0 2019-11-17 12:23:00