How can I find the best ML model from the results below?
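I am trying to predict loan defaulters by training models on the Lending Club dataset. I am finding it hard to choose a model from the results I obtained. How can I pick the right one?
The results below come from different models: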
--------------------------------------------------
random forest
--------------------------------------------------
              precision    recall  f1-score   support
         0.0       0.75      0.94      0.83      3401
         1.0       0.94      0.74      0.83      4125
    accuracy                           0.83      7526
   macro avg       0.84      0.84      0.83      7526
weighted avg       0.85      0.83      0.83      7526
Confusion Matrix:
[[3196 205]
[1081 3044]]
Training Accuracy: 0.9854712969525159
Testing Accuracy: 0.8291256975817167
Prediction with data having all values as 0: Counter({0.0: 468, 1.0: 32})
Prediction with data having all values as 1: Counter({1.0: 365, 0.0: 135})
--------------------------------------------------
logistic regression
--------------------------------------------------
              precision    recall  f1-score   support
         0.0       0.76      0.83      0.79      3401
         1.0       0.85      0.78      0.81      4125
    accuracy                           0.80      7526
   macro avg       0.80      0.81      0.80      7526
weighted avg       0.81      0.80      0.80      7526
Training Accuracy: 0.7995659107016301
Testing Accuracy: 0.8037470103640713
Confusion Matrix:
[[2828 573]
[ 904 3221]]
Prediction with data having all values as 0: Counter({0.0: 406, 1.0: 94})
Prediction with data having all values as 1: Counter({1.0: 379, 0.0: 121})
--------------------------------------------------
k nearest neighbor
--------------------------------------------------
              precision    recall  f1-score   support
         0.0       0.73      0.94      0.82      3401
         1.0       0.93      0.72      0.81      4125
    accuracy                           0.82      7526
   macro avg       0.83      0.83      0.82      7526
weighted avg       0.84      0.82      0.82      7526
Training Accuracy: 0.8770818568391212
Testing Accuracy: 0.8161041722030294
Confusion Matrix:
[[3188 213]
[1171 2954]]
Prediction with data having all values as 0: Counter({0.0: 460, 1.0: 40})
Prediction with data having all values as 1: Counter({1.0: 353, 0.0: 147})
--------------------------------------------------
cat boost
--------------------------------------------------
              precision    recall  f1-score   support
         0.0       0.75      0.98      0.85      3401
         1.0       0.98      0.72      0.83      4125
    accuracy                           0.84      7526
   macro avg       0.86      0.85      0.84      7526
weighted avg       0.87      0.84      0.84      7526
Training Accuracy: 0.8628632175761871
Testing Accuracy: 0.8388254052617592
Confusion Matrix:
[[3325 76]
[1137 2988]]
Prediction with data having all values as 0: Counter({0.0: 485, 1.0: 15})
Prediction with data having all values as 1: Counter({1.0: 365, 0.0: 135})
--------------------------------------------------
xgboost
--------------------------------------------------
              precision    recall  f1-score   support
         0.0       0.74      1.00      0.85      3401
         1.0       1.00      0.71      0.83      4125
    accuracy                           0.84      7526
   macro avg       0.87      0.86      0.84      7526
weighted avg       0.88      0.84      0.84      7526
Training Accuracy: 0.8437278525868178
Testing Accuracy: 0.8417486048365665
Confusion Matrix:
[[3393 8]
[1183 2942]]
Prediction with data having all values as 0: Counter({0.0: 497, 1.0: 3})
Prediction with data having all values as 1: Counter({1.0: 357, 0.0: 143})
--------------------------------------------------
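Your last model (xgboost) looks like the best overall: on the test set it has the highest testing accuracy (0.8417) and the highest, or tied for highest, macro- and weighted-average precision, recall, and f1-score. It is not the best on every per-class number, though: its recall for class 1.0 (0.71) is the lowest of the five models, and logistic regression has the highest (0.78). The one score that should not drive the choice is the training-set accuracy, because the comparison is made on the test set.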
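One way to check this comparison is to recompute the metrics directly from the confusion matrices posted in the question. The sketch below is only illustrative: it assumes the matrices follow the scikit-learn layout (rows are true labels, columns are predicted labels, in the order 0.0 then 1.0), and the `summarize` helper and `CONFUSION` dictionary are names made up for this example.

```python
# Confusion matrices copied from the question.
# Assumed layout: rows = true label (0.0, 1.0), columns = predicted label (0.0, 1.0).
CONFUSION = {
    "random forest": [[3196, 205], [1081, 3044]],
    "logistic regression": [[2828, 573], [904, 3221]],
    "k nearest neighbor": [[3188, 213], [1171, 2954]],
    "cat boost": [[3325, 76], [1137, 2988]],
    "xgboost": [[3393, 8], [1183, 2942]],
}


def summarize(cm):
    """Return test accuracy plus precision, recall and F1 for class 1.0."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": (tn + tp) / total,
        "precision_1": precision,
        "recall_1": recall,
        "f1_1": f1,
    }


summary = {name: summarize(cm) for name, cm in CONFUSION.items()}

# For each metric, report which model scores highest on the test set.
for metric in ("accuracy", "precision_1", "recall_1", "f1_1"):
    best = max(summary, key=lambda name: summary[name][metric])
    print(f"{metric:12s} best: {best} ({summary[best][metric]:.3f})")
```

With the matrices above, xgboost comes out on top for accuracy, precision, and F1 on class 1.0, while logistic regression comes out on top for recall on class 1.0.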
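In general you want every metric to be as high as possible, but that is not always achievable, so you need to understand what each metric means for your goal before you select a model. You will often have to settle on a trade-off. Note that the f1-score is the harmonic mean of precision and recall, so a high f1-score requires both precision and recall to be reasonably high.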
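The trade-off can be made concrete with the F-beta score, which generalizes F1 by weighting recall `beta` times as heavily as precision. This is a minimal sketch, not part of the original answer: the `f_beta` helper is written for this example, the inputs are the rounded class 1.0 precision and recall from the reports above, and treating class 1.0 as the class of interest is an assumption.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives F1, the harmonic mean of precision and recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Rounded class 1.0 precision and recall taken from the reports in the question.
models = {
    "xgboost": (1.00, 0.71),
    "logistic regression": (0.85, 0.78),
}

for name, (p, r) in models.items():
    # beta=2 is an illustrative choice that favors recall over precision.
    print(f"{name}: F1={f_beta(p, r):.2f}  F2={f_beta(p, r, beta=2):.2f}")
```

On these numbers xgboost has the higher F1, but once recall is weighted more heavily (beta=2) logistic regression scores higher, so the "best" model depends on which kind of error costs more in your application.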