
Find a threshold value for confusion matrix in R

I was doing a logistic regression and made a table that contains the predicted probability, the actual class, and the predicted class. If the predicted probability is greater than 0.5, I classify the observation as 1, so the predicted class becomes 1. But I want to change the threshold from 0.5 to another value.

I am looking for a threshold that maximizes both the true positive rate and the true negative rate. Here is a simple data frame df to demonstrate what I want to do.

df<-data.frame(actual_class=c(0,1,0,0,1,1,1,0,0,1),
               predicted_probability=c(0.51,0.3,0.2,0.35,0.78,0.69,0.81,0.31,0.59,0.12),
               predicted_class=c(1,0,0,0,1,1,1,0,1,0))

If I can find such a threshold, I will classify using that value instead of 0.5. I don't know how to find a threshold that maximizes both the true positive rate and the true negative rate.

You can check a range of values pretty easily:

probs <- seq(0, 1, by=.05)
names(probs) <- probs
results <- sapply(probs, function(x) df$actual_class == as.integer(df$predicted_probability > x))

results is a 10-row by 21-column logical matrix showing when the predicted class equals the actual class:

colSums(results)   # Number of correct predictions
#    0 0.05  0.1 0.15  0.2 0.25  0.3 0.35  0.4 0.45  0.5 0.55  0.6 0.65  0.7 0.75  0.8 0.85  0.9 0.95    1
#    5    5    5    4    5    5    4    6    6    6    6    7    8    8    7    7    6    5    5    5    5
predict <- as.integer(df$predicted_probability > .6)
xtabs(~df$actual_class+predict)
#                predict
# df$actual_class 0 1
#               0 5 0
#               1 2 3

You can see that thresholds of 0.6 and 0.65 each give 8 correct predictions out of 10. This conclusion is based on the same data you used in the analysis, so it probably overestimates how successful you would be with new data.
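
The counts above measure overall accuracy. If the goal is to balance the true positive rate (TPR) and the true negative rate (TNR), as the question asks, one common summary is Youden's J statistic, TPR + TNR - 1. The following is a minimal sketch, assuming the question's df and the same grid of thresholds; the names thresholds, rates, and best are introduced here only for illustration, and Youden's J is one possible criterion rather than the only reasonable one.

```r
# Toy data from the question
df <- data.frame(
  actual_class = c(0, 1, 0, 0, 1, 1, 1, 0, 0, 1),
  predicted_probability = c(0.51, 0.3, 0.2, 0.35, 0.78, 0.69, 0.81, 0.31, 0.59, 0.12)
)

# Same grid of candidate thresholds as in the answer
thresholds <- seq(0, 1, by = 0.05)

# For each threshold, classify and compute TPR, TNR, and Youden's J
rates <- t(sapply(thresholds, function(x) {
  pred <- as.integer(df$predicted_probability > x)
  tpr <- sum(pred == 1 & df$actual_class == 1) / sum(df$actual_class == 1)
  tnr <- sum(pred == 0 & df$actual_class == 0) / sum(df$actual_class == 0)
  c(threshold = x, tpr = tpr, tnr = tnr, youden_j = tpr + tnr - 1)
}))
rates <- as.data.frame(rates)

# Keep every threshold that attains the maximum J (ties are possible)
best <- rates[rates$youden_j == max(rates$youden_j), ]
print(best)
```

On this toy data the sketch should pick the same two thresholds, 0.6 and 0.65, where TPR is 0.6 and TNR is 1. With only 10 observations the result is very sensitive to individual points, so the same caveat about new data applies.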
