Setting a threshold on a decision tree's probability output

Question

I tried to calculate the confusion matrix after fitting a decision tree model:

library(rpart)
library(caret)

# tree model
tree <- rpart(LoanStatus_B ~ ., data = train, method = 'class')
# confusion matrix
pdata <- predict(tree, newdata = test, type = "class")
confusionMatrix(data = pdata, reference = test$LoanStatus_B, positive = "1")

How can I set the threshold for my confusion matrix? For example, I might want a predicted probability above 0.2 to count as a default (the positive level of my binary outcome).

Answer 1

Several things to note here. First, make sure you are getting class probabilities when you make your predictions. With type = "class" you only get discrete classes, so applying your own cutoff is impossible. Use type = "prob" instead (abbreviated to "p" below; predict.rpart partially matches it to "prob"), like this:

library(rpart)
data(iris)

iris$Y <- ifelse(iris$Species=="setosa",1,0)

# tree model
tree <- rpart(Y ~Sepal.Width,data=iris, method='class')

# predictions
pdata <- as.data.frame(predict(tree, newdata = iris, type = "p"))
head(pdata)

# confusion matrix
table(iris$Y, pdata$`1` > .5)

Note that .5 here is just an arbitrary cutoff: you can change it to any value you like, for example .2 as in the question.
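To see what changing the cutoff does, here is a minimal sketch that refits the same toy iris tree and counts how many rows are flagged as positive at a few cutoffs, including the 0.2 from the question. The names `p1`, `cutoffs` and `n_pos` are illustrative only. Lowering the cutoff can only keep or increase the number of positive predictions.

```r
library(rpart)

# Same toy setup as above: setosa (1) vs. everything else (0), one predictor
iris$Y <- ifelse(iris$Species == "setosa", 1, 0)
tree <- rpart(Y ~ Sepal.Width, data = iris, method = "class")

# Probability of class "1" for every row
p1 <- predict(tree, newdata = iris, type = "prob")[, "1"]

# Number of positive predictions at each cutoff (0.2 is the cutoff from the question)
cutoffs <- c(0.2, 0.5, 0.8)
n_pos <- sapply(cutoffs, function(k) sum(p1 > k))
names(n_pos) <- cutoffs
print(n_pos)
```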

I don't see a reason to use the confusionMatrix function when a confusion matrix can be created this simply, and this way also lets you achieve your goal of easily changing the cutoff.
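One caveat with the plain `table()` approach: if no probability exceeds the cutoff (or all do), the logical comparison has only one value and the table loses a column. A minimal sketch of a hypothetical helper, `confusion_at_cutoff`, that fixes the factor levels so the result is always 2 x 2 (same toy iris data as above):

```r
library(rpart)

iris$Y <- ifelse(iris$Species == "setosa", 1, 0)
tree <- rpart(Y ~ Sepal.Width, data = iris, method = "class")
p1 <- predict(tree, newdata = iris, type = "prob")[, "1"]

# Hypothetical helper: cross-tabulate actual vs. predicted class at a cutoff.
# Fixed factor levels keep both columns even when every prediction is the same.
confusion_at_cutoff <- function(actual, prob, cutoff) {
  predicted <- factor(as.integer(prob > cutoff), levels = c(0, 1))
  table(Actual = factor(actual, levels = c(0, 1)), Predicted = predicted)
}

cm_02 <- confusion_at_cutoff(iris$Y, p1, 0.2)  # the cutoff from the question
cm_1  <- confusion_at_cutoff(iris$Y, p1, 1)    # nothing can exceed 1
print(cm_02)
print(cm_1)                                    # still 2 x 2, "1" column all zeros
```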

That said, if you do want to use the confusionMatrix function, first create a discrete class prediction based on your custom cutoff, like this:

pdata$my_custom_predicted_class <- ifelse(pdata$`1` > .5, 1, 0)

Again, .5 is your chosen cutoff and can be any value you want.

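# caret expects data and reference to be factors with the same levels
caret::confusionMatrix(data = factor(pdata$my_custom_predicted_class, levels = c(0, 1)),
                       reference = factor(iris$Y, levels = c(0, 1)), positive = "1")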

Confusion Matrix and Statistics

          Reference
Prediction  0  1
         0 94 19
         1  6 31

               Accuracy : 0.8333
                 95% CI : (0.7639, 0.8891)
    No Information Rate : 0.6667
    P-Value [Acc > NIR] : 3.661e-06

                  Kappa : 0.5989
 Mcnemar's Test P-Value : 0.0164

            Sensitivity : 0.6200
            Specificity : 0.9400
         Pos Pred Value : 0.8378
         Neg Pred Value : 0.8319
             Prevalence : 0.3333
         Detection Rate : 0.2067
   Detection Prevalence : 0.2467
      Balanced Accuracy : 0.7800

       'Positive' Class : 1
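As a sanity check, most of the statistics above follow directly from the four cell counts. A minimal sketch with the counts copied from that output (rows = prediction, columns = reference); the variable names are illustrative:

```r
# Counts from the caret output; matrix() fills column by column
cm <- matrix(c(94, 6, 19, 31), nrow = 2,
             dimnames = list(Prediction = c("0", "1"), Reference = c("0", "1")))

tp <- cm["1", "1"]  # predicted 1, actually 1
fn <- cm["0", "1"]  # predicted 0, actually 1
tn <- cm["0", "0"]  # predicted 0, actually 0
fp <- cm["1", "0"]  # predicted 1, actually 0

sens    <- tp / (tp + fn)       # sensitivity (recall for class "1")
spec    <- tn / (tn + fp)       # specificity
ppv     <- tp / (tp + fp)       # positive predictive value
npv     <- tn / (tn + fn)       # negative predictive value
acc     <- (tp + tn) / sum(cm)  # accuracy
bal_acc <- (sens + spec) / 2    # balanced accuracy

print(round(c(sens = sens, spec = spec, ppv = ppv, npv = npv,
              acc = acc, bal_acc = bal_acc), 4))
```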

(Answer 1: accepted, score 0, 2017-09-04 19:31:14)
