
Set a threshold for the probability result from a decision tree

I tried to calculate the confusion matrix after fitting a decision tree model:

library(rpart)
library(caret)
# tree model (train and test are my own data frames)
tree <- rpart(LoanStatus_B ~ ., data = train, method = "class")
# confusion matrix
pdata <- predict(tree, newdata = test, type = "class")
confusionMatrix(data = pdata, reference = test$LoanStatus_B, positive = "1")

How can I set the threshold for my confusion matrix? For example, I might want any predicted probability above 0.2 to be classified as a default, which is the positive class of the binary outcome.

Several things to note here. Firstly, make sure you're getting class probabilities when you do your predictions. With prediction type = "class" you were only getting discrete classes, so what you wanted would have been impossible. So you'll want to make it "p" (short for "prob"), like mine below.

library(rpart)
data(iris)

iris$Y <- ifelse(iris$Species=="setosa",1,0)

# tree model
tree <- rpart(Y ~Sepal.Width,data=iris, method='class')

# predictions
pdata <- as.data.frame(predict(tree, newdata = iris, type = "p"))
head(pdata)

# confusion matrix
table(iris$Y, pdata$`1` > .5)

Next, note that .5 here is just an arbitrary value; you can change it to whatever you want.
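As a rough sketch of changing the cutoff, the snippet below rebuilds the same iris model and tabulates the results at 0.2 (the value from the question) and at 0.5. The helper name confusion_at and the copy iris_df are my own, introduced only for illustration; they are not part of rpart or caret.

```r
library(rpart)

# Same toy setup as above, on a copy of iris (iris_df is an illustrative name)
iris_df <- iris
iris_df$Y <- ifelse(iris_df$Species == "setosa", 1, 0)
tree <- rpart(Y ~ Sepal.Width, data = iris_df, method = "class")

# Predicted probability of class "1" for every row
prob_1 <- predict(tree, newdata = iris_df, type = "prob")[, "1"]

# Hypothetical helper: 2x2 confusion table at a given cutoff.
# Fixing the factor levels keeps both columns even if one class is never predicted.
confusion_at <- function(prob, actual, cutoff) {
  predicted <- factor(as.integer(prob > cutoff), levels = c(0, 1))
  table(actual = factor(actual, levels = c(0, 1)), predicted = predicted)
}

confusion_at(prob_1, iris_df$Y, 0.2)  # cutoff from the question
confusion_at(prob_1, iris_df$Y, 0.5)  # cutoff used above
```

Lowering the cutoff can only move rows from predicted 0 to predicted 1, so the 0.2 table should show at least as many predicted positives as the 0.5 table.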

I don't see a reason to use the confusionMatrix function when a confusion matrix can be created simply this way, which also lets you achieve your goal of easily changing the cutoff.

Having said that, if you do want to use the confusionMatrix function for your confusion matrix, then just create a discrete class prediction first, based on your custom cutoff, like this:

pdata$my_custom_predicted_class <- ifelse(pdata$`1` > .5, 1, 0)

Where, again, .5 is your custom chosen cutoff and can be anything you want it to be.

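# recent caret versions require data and reference to be factors with the same levels
caret::confusionMatrix(data = factor(pdata$my_custom_predicted_class, levels = c(0, 1)),
                  reference = factor(iris$Y, levels = c(0, 1)), positive = "1")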
Confusion Matrix and Statistics

          Reference
Prediction  0  1
         0 94 19
         1  6 31

               Accuracy : 0.8333
                 95% CI : (0.7639, 0.8891)
    No Information Rate : 0.6667
    P-Value [Acc > NIR] : 3.661e-06

                  Kappa : 0.5989
 Mcnemar's Test P-Value : 0.0164

            Sensitivity : 0.6200
            Specificity : 0.9400
         Pos Pred Value : 0.8378
         Neg Pred Value : 0.8319
             Prevalence : 0.3333
         Detection Rate : 0.2067
   Detection Prevalence : 0.2467
      Balanced Accuracy : 0.7800

       'Positive' Class : 1
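If it helps when picking a cutoff, here is a minimal sketch that computes sensitivity and specificity for a handful of cutoffs on the same iris example, using base R only. The cutoff values and the names iris_df and cutoff_summary are arbitrary choices of mine for illustration.

```r
library(rpart)

# Same toy model as above, on a copy of iris (iris_df is an illustrative name)
iris_df <- iris
iris_df$Y <- ifelse(iris_df$Species == "setosa", 1, 0)
tree <- rpart(Y ~ Sepal.Width, data = iris_df, method = "class")
prob_1 <- predict(tree, newdata = iris_df, type = "prob")[, "1"]

# Arbitrary example cutoffs; the positive class is "1"
cutoffs <- c(0.2, 0.35, 0.5, 0.65, 0.8)

cutoff_summary <- data.frame(
  cutoff = cutoffs,
  # share of actual positives predicted positive
  sensitivity = sapply(cutoffs, function(k) mean(prob_1[iris_df$Y == 1] > k)),
  # share of actual negatives predicted negative
  specificity = sapply(cutoffs, function(k) mean(prob_1[iris_df$Y == 0] <= k))
)
cutoff_summary
```

Because a single tree produces only a few distinct probabilities (one per leaf), several neighbouring cutoffs may give identical rows.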


 