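How can I find the best ML model from the results below?
I am trying to predict loan defaulters by training models on the Lending Club dataset. I am finding it hard to choose a model from the results I obtained. How can I pick the right one?
The results below come from the different models: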
--------------------------------------------------
random forest
--------------------------------------------------
precision recall f1-score support
0.0 0.75 0.94 0.83 3401
1.0 0.94 0.74 0.83 4125
accuracy 0.83 7526
macro avg 0.84 0.84 0.83 7526
weighted avg 0.85 0.83 0.83 7526
Confusion Matrix:
[[3196 205]
[1081 3044]]
Training Accuracy: 0.9854712969525159
Testing Accuracy: 0.8291256975817167
Prediction with data having all values as 0: Counter({0.0: 468, 1.0: 32})
Prediction with data having all values as 1: Counter({1.0: 365, 0.0: 135})
--------------------------------------------------
logistic regression
--------------------------------------------------
precision recall f1-score support
0.0 0.76 0.83 0.79 3401
1.0 0.85 0.78 0.81 4125
accuracy 0.80 7526
macro avg 0.80 0.81 0.80 7526
weighted avg 0.81 0.80 0.80 7526
Training Accuracy: 0.7995659107016301
Testing Accuracy: 0.8037470103640713
Confusion Matrix:
[[2828 573]
[ 904 3221]]
Prediction with data having all values as 0: Counter({0.0: 406, 1.0: 94})
Prediction with data having all values as 1: Counter({1.0: 379, 0.0: 121})
--------------------------------------------------
k nearest neighbor
--------------------------------------------------
precision recall f1-score support
0.0 0.73 0.94 0.82 3401
1.0 0.93 0.72 0.81 4125
accuracy 0.82 7526
macro avg 0.83 0.83 0.82 7526
weighted avg 0.84 0.82 0.82 7526
Training Accuracy: 0.8770818568391212
Testing Accuracy: 0.8161041722030294
Confusion Matrix:
[[3188 213]
[1171 2954]]
Prediction with data having all values as 0: Counter({0.0: 460, 1.0: 40})
Prediction with data having all values as 1: Counter({1.0: 353, 0.0: 147})
--------------------------------------------------
cat boost
--------------------------------------------------
precision recall f1-score support
0.0 0.75 0.98 0.85 3401
1.0 0.98 0.72 0.83 4125
accuracy 0.84 7526
macro avg 0.86 0.85 0.84 7526
weighted avg 0.87 0.84 0.84 7526
Training Accuracy: 0.8628632175761871
Testing Accuracy: 0.8388254052617592
Confusion Matrix:
[[3325 76]
[1137 2988]]
Prediction with data having all values as 0: Counter({0.0: 485, 1.0: 15})
Prediction with data having all values as 1: Counter({1.0: 365, 0.0: 135})
--------------------------------------------------
xgboost
--------------------------------------------------
precision recall f1-score support
0.0 0.74 1.00 0.85 3401
1.0 1.00 0.71 0.83 4125
accuracy 0.84 7526
macro avg 0.87 0.86 0.84 7526
weighted avg 0.88 0.84 0.84 7526
Training Accuracy: 0.8437278525868178
Testing Accuracy: 0.8417486048365665
Confusion Matrix:
[[3393 8]
[1183 2942]]
Prediction with data having all values as 0: Counter({0.0: 497, 1.0: 3})
Prediction with data having all values as 1: Counter({1.0: 357, 0.0: 143})
--------------------------------------------------
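Your last model (xgboost) looks like the best one: on the test set it has the highest accuracy (0.8417) and the highest macro-averaged precision (0.87) and recall (0.86), and it ties cat boost for the best macro f1-score (0.84). The only scores that do not matter for this choice are the ones computed on the training set, because the models are being compared on the test set.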
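As a rough check of that comparison, the test accuracy of each model can be recomputed from the confusion matrices posted in the question. This is only a sketch: it assumes the matrices use the scikit-learn layout (rows are true classes, columns are predicted classes), which is consistent with the row sums matching the reported support of 3401 and 4125. The names `confusion`, `accuracy`, `test_accuracy` and `ranking` are made up for this illustration.

```python
# Test-set confusion matrices copied from the results in the question.
# Assumed layout: rows = true class (0.0, 1.0), columns = predicted class.
confusion = {
    "random forest": [[3196, 205], [1081, 3044]],
    "logistic regression": [[2828, 573], [904, 3221]],
    "k nearest neighbor": [[3188, 213], [1171, 2954]],
    "cat boost": [[3325, 76], [1137, 2988]],
    "xgboost": [[3393, 8], [1183, 2942]],
}


def accuracy(cm):
    """Fraction of correct predictions: diagonal sum over all samples."""
    correct = cm[0][0] + cm[1][1]
    total = sum(sum(row) for row in cm)
    return correct / total


# Recompute the test accuracy of every model and sort from best to worst.
test_accuracy = {name: accuracy(cm) for name, cm in confusion.items()}
ranking = sorted(test_accuracy, key=test_accuracy.get, reverse=True)

for name in ranking:
    print(f"{name:20s} {test_accuracy[name]:.4f}")

best = ranking[0]
```

Ranking by a single number like this is a design choice; it is reasonable here because the two classes are of similar size, but it hides the per-class differences between the models.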
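In general you want every metric to be as high as possible, but that is not always achievable, so you need to understand what each metric means for your goal before you select a model. You will often have to accept a trade-off. Note that the f1-score is the harmonic mean of precision and recall, so a high f1-score requires both precision and recall to be high.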
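To make the trade-off concrete, here is a small sketch that derives precision, recall and f1-score for class 1.0 from two of the posted confusion matrices. It again assumes rows are true classes and columns are predicted classes; `class_metrics` is a hypothetical helper written for this example, not a library function.

```python
def class_metrics(cm, cls):
    """Precision, recall and f1-score for one class of a 2x2 confusion matrix.

    Assumes rows are true classes and columns are predicted classes.
    """
    tp = cm[cls][cls]
    predicted_as_cls = cm[0][cls] + cm[1][cls]  # column sum
    actually_cls = cm[cls][0] + cm[cls][1]      # row sum (the support)
    precision = tp / predicted_as_cls
    recall = tp / actually_cls
    # f1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Confusion matrices copied from the question.
xgb = [[3393, 8], [1183, 2942]]
logreg = [[2828, 573], [904, 3221]]

xgb_p, xgb_r, xgb_f1 = class_metrics(xgb, 1)
lr_p, lr_r, lr_f1 = class_metrics(logreg, 1)

print(f"xgboost             P={xgb_p:.2f} R={xgb_r:.2f} F1={xgb_f1:.2f}")
print(f"logistic regression P={lr_p:.2f} R={lr_r:.2f} F1={lr_f1:.2f}")
```

For class 1.0, xgboost has almost perfect precision but recall of about 0.71, while logistic regression gives up precision for recall of about 0.78. If missing class 1.0 cases is the expensive mistake for your use case, that difference may matter more than the overall accuracy.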