Optimize threshold to always be a particular value of the sensitivity/true positive rate
How can I write R code so that the threshold of a predictive model is chosen automatically, such that the sensitivity equals a particular proportion on every run of the model?
For example, given the following scenarios:
How do I write R code that automatically picks the threshold giving a sensitivity of 0.8, i.e. scenario 2 above? For context, I'm using the caret modelling framework.
These links on threshold optimization did not help much:
http://topepo.github.io/caret/using-your-own-model-in-train.html#Illustration5
(1)
Say you have data with scores (value) and true labels (truth), here 5 negatives (0) and 5 positives (1):
df <- data.frame(value = c(1,2,3,5,8,4,6,7,9,10),
                 truth = c(rep(0,5), rep(1,5)))
Counting a case as positive when its value is at or above the threshold: at threshold 9, the values 9 and 10 are detected as true positives, so sensitivity = 2/5 = 40%. At threshold 6 (or anything between 5 and 6), 6, 7, 9 and 10 are detected, so sensitivity = 4/5 = 80%.
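The sensitivities above can be checked directly in base R. This is a minimal sketch: `sens_at` is a hypothetical helper name, and it assumes a case is predicted positive when `value >= threshold`.

```r
df <- data.frame(value = c(1, 2, 3, 5, 8, 4, 6, 7, 9, 10),
                 truth = c(rep(0, 5), rep(1, 5)))

# Hypothetical helper: sensitivity (true positive rate) when a case is
# predicted positive if its value is at or above the threshold
sens_at <- function(threshold, value, truth) {
  predicted_pos <- value >= threshold
  sum(predicted_pos & truth == 1) / sum(truth == 1)
}

sens_at(9, df$value, df$truth)    # 0.4
sens_at(6, df$value, df$truth)    # 0.8
sens_at(5.5, df$value, df$truth)  # 0.8
```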
To see the ROC curve, you can use the pROC package:
library(pROC)
roc.demo <- roc(truth ~ value, data = df)
par(pty = "s") # make it square
plot(roc.demo) # plot ROC curve
If you prefer percentages, build the ROC object with percent = TRUE
roc.demo <- roc(truth ~ value, data = df, percent = TRUE)
and replace 0.8 with 80 in the code below.
You can get the thresholds that give a particular sensitivity from the roc object:
roc.demo$thresholds[roc.demo$sensitivities == 0.8]
This should return 4.5 and 5.5 (pROC places thresholds midway between consecutive observed values).
Because == on floating-point numbers can miss values that are only approximately 0.8, you may instead select with a tolerance: roc.demo$sensitivities > 0.79 & roc.demo$sensitivities < 0.81
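To pick one threshold automatically on every run, the lookup can be wrapped in a function. This is a minimal sketch under assumptions: `threshold_for_sensitivity` is a hypothetical name; it returns, among thresholds whose sensitivity is at least the target, the one with the highest specificity; and it only reads the `thresholds`, `sensitivities` and `specificities` fields of a pROC roc object built on the 0 to 1 scale (i.e. without percent = TRUE).

```r
library(pROC)

df <- data.frame(value = c(1, 2, 3, 5, 8, 4, 6, 7, 9, 10),
                 truth = c(rep(0, 5), rep(1, 5)))
roc.demo <- roc(truth ~ value, data = df)

# Hypothetical helper: among ROC thresholds with sensitivity >= target,
# return the one with the highest specificity.
# tol absorbs floating-point error in the stored sensitivities.
threshold_for_sensitivity <- function(roc_obj, target, tol = 1e-8) {
  candidates <- which(roc_obj$sensitivities >= target - tol)
  best <- candidates[which.max(roc_obj$specificities[candidates])]
  roc_obj$thresholds[best]
}

threshold_for_sensitivity(roc.demo, 0.8)  # 5.5
```

Both 4.5 and 5.5 give 80% sensitivity here, but 5.5 has the higher specificity (4/5 controls below it versus 3/5), so it is the one returned. With caret, the same helper could be applied to an ROC built from held-out class probabilities (for example from predict(fit, newdata, type = "prob")), so the threshold is recomputed on each run; how you obtain those probabilities is an assumption left to your setup.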
(2)
Alternatively, if you just want a threshold and don't care about the specificity, you can apply the quantile function to the values of the positive cases:
quantile(df$value[df$truth == 1],
         probs = c(0.00, 0.10, 0.20, 0.30), type = 1)  # type = 1 returns observed values, no interpolation
probs = 0.20 corresponds to 80% sensitivity: 80% of the positive cases have values above the 20th percentile.
0% 10% 20% 30%
4 4 4 6
Any threshold strictly between 4 and 6 gives 80% sensitivity here. You can change probs as needed.
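The same idea can take the target sensitivity as a parameter. This is a minimal base-R sketch (no pROC needed): `threshold_by_rank` is a hypothetical name, and it assumes higher values mean "more likely positive" and that a case is predicted positive when value >= threshold. Rather than calling quantile() with probs = 1 - target, it takes the k-th largest positive value directly, which sidesteps floating-point surprises in probs.

```r
df <- data.frame(value = c(1, 2, 3, 5, 8, 4, 6, 7, 9, 10),
                 truth = c(rep(0, 5), rep(1, 5)))

# Hypothetical helper: the largest threshold thr such that at least
# target_sens of the positive cases have value >= thr
threshold_by_rank <- function(value, truth, target_sens) {
  pos <- sort(value[truth == 1], decreasing = TRUE)
  # number of positives that must be at or above the threshold;
  # the small offset guards against floating-point error in the product
  k <- ceiling(target_sens * length(pos) - 1e-9)
  pos[max(k, 1)]
}

thr <- threshold_by_rank(df$value, df$truth, 0.8)
thr                                   # 6
mean(df$value[df$truth == 1] >= thr)  # 0.8
```

It returns 6, matching the "threshold 6" case in (1); asking for 0.4 returns 9.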
Hopefully, this helps.