AUC of a random forest - different methods, different answers?

Question

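I am trying to find a way to get the AUC of a random forest model for both the training set and the test set without using MLeval.

There is a good example of ROC on training data, and another good example of ROC on test data. The first example, for the training data, gives AUC = 0.944: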

 plot.roc(rfFit$pred$obs[selectedIndices], rfFit$pred$M[selectedIndices], print.auc=TRUE)

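Since I don't know how to apply the first example to test data, I applied the Sonar data to the second example and cross-checked the answer against the first example: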

 library(caret)    # train(), trainControl(), twoClassSummary()
 library(pROC)     # roc() and its plot method
 library(mlbench)  # Sonar data set
 data(Sonar)

 ctrl <- trainControl(method = "cv",
                      summaryFunction = twoClassSummary,
                      classProbs = TRUE,
                      savePredictions = TRUE)
 rfFit <- train(Class ~ ., data = Sonar,
                method = "rf", preProc = c("center", "scale"),
                trControl = ctrl, metric = "ROC")
 print(rfFit)
 # ...
 # mtry  ROC        Sens       Spec
 # 2     0.9459428  0.9280303  0.8044444

 result.predicted.prob <- predict(rfFit, Sonar, type = "prob")  # Prediction
 result.roc <- roc(Sonar$Class, result.predicted.prob$M)
 plot(result.roc, print.thres = "best",
      print.thres.best.method = "closest.topleft", print.auc = TRUE)

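But the AUC on the entire training data (i.e. Sonar) is 1.0, while rfFit reports 0.946, which is different again. So why do I get different results, and what is the correct way to calculate the AUC for training and for testing?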

Answer 1

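These are AUCs from different models.

The first AUC you see is the mean AUC obtained from cross-validation during training. You can see the per-fold values below: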

head(rfFit$resample)
        ROC      Sens      Spec Resample
1 1.0000000 0.9090909 1.0000000   Fold02
2 0.9949495 1.0000000 0.7777778   Fold01
3 0.8045455 0.8181818 0.5000000   Fold03
4 1.0000000 1.0000000 0.8000000   Fold06
5 0.9595960 0.9090909 0.6666667   Fold05
6 0.9909091 0.9090909 0.9000000   Fold04

mean(rfFit$resample$ROC)
[1] 0.9540909

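In this case it is 10-fold cross-validation: in each fold you train on 90% of the data and test on the remaining 10%, so the model is slightly different in every fold, and so is the AUC.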
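
To make the fold-wise averaging concrete, here is a minimal sketch that recomputes the per-fold AUCs from the hold-out predictions caret saves when `savePredictions = TRUE`, and compares their mean with a single ROC curve over all hold-out predictions pooled together (which is what plotting `rfFit$pred` directly does). The seed, the fixed `mtry = 2` grid and the names `fold_auc`, `mean_fold_auc` and `pooled_auc` are assumptions made for illustration, not part of the original question.

```r
library(caret)
library(pROC)
library(mlbench)   # provides the Sonar data

data(Sonar)
set.seed(42)       # arbitrary seed, only so the folds are reproducible

ctrl <- trainControl(method = "cv", number = 10,
                     summaryFunction = twoClassSummary,
                     classProbs = TRUE, savePredictions = TRUE)
rfFit <- train(Class ~ ., data = Sonar, method = "rf",
               tuneGrid = data.frame(mtry = 2),  # one value keeps the run short
               trControl = ctrl, metric = "ROC")

# Hold-out predictions for the selected mtry only
pred <- rfFit$pred[rfFit$pred$mtry == rfFit$bestTune$mtry, ]

# Helper: AUC with "M" as the positive class and P(M) as the score
auc_M <- function(obs, score) {
  as.numeric(auc(roc(obs, score, levels = c("R", "M"),
                     direction = "<", quiet = TRUE)))
}

# One AUC per fold, each computed on the 10% that fold did not train on
fold_auc <- sapply(split(pred, pred$Resample),
                   function(d) auc_M(d$obs, d$M))
mean_fold_auc <- mean(fold_auc)        # should match rfFit$results$ROC

# One ROC curve over all hold-out predictions pooled together
pooled_auc <- auc_M(pred$obs, pred$M)

print(c(mean_of_folds = mean_fold_auc, pooled = pooled_auc))
```

The two numbers are usually close but not identical, because averaging ten small ROC curves is not the same as drawing one curve from all predictions. That may be part of why the question's pooled plot (0.944) and the `print(rfFit)` value (0.946) do not agree exactly.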

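If you predict with the final trained model, you get an AUC of 1. caret refits that final model on all of the data, so here it is being scored on the same rows it was trained on, and this value is not part of the caret output.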
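
As a rough illustration of how optimistic that resubstitution figure is, the sketch below compares it with an AUC built from the out-of-bag (OOB) votes that the underlying `randomForest` object stores in its `votes` component. The seed, the fixed `mtry = 2` and the names `resub_auc` and `oob_auc` are assumptions for illustration.

```r
library(caret)
library(pROC)
library(mlbench)   # provides the Sonar data

data(Sonar)
set.seed(42)       # arbitrary seed for reproducibility

ctrl <- trainControl(method = "cv",
                     summaryFunction = twoClassSummary,
                     classProbs = TRUE)
rfFit <- train(Class ~ ., data = Sonar, method = "rf",
               tuneGrid = data.frame(mtry = 2),
               trControl = ctrl, metric = "ROC")

# Helper: AUC with "M" as the positive class
auc_M <- function(obs, score) {
  as.numeric(auc(roc(obs, score, levels = c("R", "M"),
                     direction = "<", quiet = TRUE)))
}

# Resubstitution: the final model scores the rows it was trained on
prob_train <- predict(rfFit, Sonar, type = "prob")
resub_auc  <- auc_M(Sonar$Class, prob_train$M)

# OOB: each row is voted on only by trees that did not see it
oob_votes <- rfFit$finalModel$votes
oob_auc   <- auc_M(Sonar$Class, oob_votes[, "M"])

print(c(resubstitution = resub_auc, out_of_bag = oob_auc))
```

The OOB value typically lands near the cross-validated ROC rather than near 1, which suggests that the perfect score mostly reflects the forest having already seen those rows.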

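So it depends on what your AUC is supposed to reflect. If it is the mean AUC during CV training, use the ROC value reported by caret. If you just need a single value that reflects the accuracy of the final model, then your second approach is fine.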
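
For the "training and testing" part of the question, one common pattern is to hold out a test set before calling `train()`, so that the test AUC comes from rows the model never saw. The following is a minimal sketch of that idea, not the only valid setup. The 75/25 split, the seed, the fixed `mtry = 2` and the names `training`, `testing`, `cv_auc` and `test_auc` are assumptions for illustration.

```r
library(caret)
library(pROC)
library(mlbench)   # provides the Sonar data

data(Sonar)
set.seed(42)       # arbitrary seed for reproducibility

# Stratified split: 75% for training, 25% held out for testing (assumed ratio)
in_train <- createDataPartition(Sonar$Class, p = 0.75, list = FALSE)
training <- Sonar[in_train, ]
testing  <- Sonar[-in_train, ]

ctrl <- trainControl(method = "cv",
                     summaryFunction = twoClassSummary,
                     classProbs = TRUE)
rfFit <- train(Class ~ ., data = training, method = "rf",
               tuneGrid = data.frame(mtry = 2),
               trControl = ctrl, metric = "ROC")

# Training-side estimate: mean cross-validated AUC on the training set
cv_auc <- max(rfFit$results$ROC)

# Test-side estimate: final model scored on the held-out rows
test_prob <- predict(rfFit, testing, type = "prob")
test_roc  <- roc(testing$Class, test_prob$M, levels = c("R", "M"),
                 direction = "<", quiet = TRUE)
test_auc  <- as.numeric(auc(test_roc))

print(c(cv_on_training = cv_auc, held_out_test = test_auc))
plot(test_roc, print.auc = TRUE)
```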

Solution 1
2 Accepted 2020-05-07 08:19:46
