R studio - I need confidence intervals for the sensitivity, specificity, and positive and negative predictive values of a confusion matrix

Question

I am writing a paper about the validity of a billing code in hospitalized children. I am a very novice RStudio user. I need confidence intervals for the sensitivity, specificity, and positive and negative predictive values, but I can't figure out how to get them.

My data has 3 columns: ID, true value, and billing value.

Here is my code:

library(caret)

confusionMatrix(table(finalcodedataset$billing_value, finalcodedataset$true_value),
                positive = "1", boot = TRUE, boot_samples = 4669, alpha = 0.05)

Here is the output:

Confusion Matrix and Statistics

       0    1
  0 4477  162
  1   10   20

               Accuracy : 0.9632          
                 95% CI : (0.9574, 0.9684)
    No Information Rate : 0.961           
    P-Value [Acc > NIR] : 0.238           

                  Kappa : 0.1796          
 Mcnemar's Test P-Value : <2e-16          

            Sensitivity : 0.109890        
            Specificity : 0.997771        
         Pos Pred Value : 0.666667        
         Neg Pred Value : 0.965079        
             Prevalence : 0.038981        
         Detection Rate : 0.004284        
   Detection Prevalence : 0.006425        
      Balanced Accuracy : 0.553831        

       'Positive' Class : 1

Answer 1

Caret and other packages use the Clopper-Pearson (exact binomial) interval method to calculate the confidence interval for accuracy.
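As a quick check, here is a minimal sketch assuming the counts from the question (4477 + 20 = 4497 correctly classified out of 4669). Base R's `binom.test()` reports the Clopper-Pearson interval, so it should reproduce the accuracy CI that caret printed, (0.9574, 0.9684).

```r
# Counts taken from the question's confusion matrix
n_correct <- 4477 + 20               # true negatives + true positives
n_total   <- 4477 + 162 + 10 + 20    # all 4669 records

# binom.test() returns the exact (Clopper-Pearson) interval for n_correct / n_total
acc_ci <- binom.test(n_correct, n_total, conf.level = 0.95)$conf.int
round(as.numeric(acc_ci), 4)
```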

I treat your 2x2 table as reversed, since the TP (true positive) cell is at the bottom right. If the TP cell were at the top left, the variables (A, B, C, D) would be switched.
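To avoid mixing up which cell is A, B, C or D, the cells can be pulled out by their row and column names rather than by position. This is a sketch, assuming (as in the question's `table()` call) that rows are the billing value and columns the true value, both coded "0"/"1". The matrix is re-typed from the printed output, not built from the real `finalcodedataset`, and the dimnames `billing`/`truth` are illustrative.

```r
# Re-create the question's 2x2 table: rows = billing value, columns = true value
tab <- as.table(matrix(c(4477, 162,
                         10,   20),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(billing = c("0", "1"),
                                       truth   = c("0", "1"))))

# Index by name so the result does not depend on which cell is top-left
A <- tab["1", "1"]  # true positives:  billed 1, truly 1
B <- tab["1", "0"]  # false positives: billed 1, truly 0
C <- tab["0", "1"]  # false negatives: billed 0, truly 1
D <- tab["0", "0"]  # true negatives:  billed 0, truly 0
c(A = A, B = B, C = C, D = D)
```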

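# Cells of the 2x2 table (TP is the bottom-right cell)
D = 4477   # true negatives
C = 162    # false negatives
B = 10     # false positives
A = 20     # true positives

Acc = (A+D)/(A+B+C+D)
Sensitivity = A / (A + C)
Specificity = D / (D + B)
P = (A+C)/(A+B+C+D)   # prevalence
PPV = (Sensitivity*P)/((Sensitivity*P)+((1-Specificity)*(1-P)))
NPV = (Specificity*(1-P))/(((1 - Sensitivity)*P)+((Specificity)*(1-P)))

n = A+B+C+D
x = n - (A+D)   # number of misclassified cases
alpha = 0.05

# Clopper-Pearson bounds for the error rate x/n via the F distribution,
# subtracted from 1 to give the interval for accuracy
ub = 1 - ((1 + (n - x + 1)/ (x * qf(alpha *.5, 2*x, 2*(n - x + 1))))^-1)
lb = 1 - ((1 + (n - x) / ((x + 1)* qf(1-(alpha*.5), 2*(x+1), 2*(n-x))))^-1)
CI = c(lb,ub)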

> Acc
[1] 0.9631613
> CI
[1] 0.9573536 0.9683800
> Sensitivity
[1] 0.1098901
> Specificity
[1] 0.9977713
> PPV
[1] 0.6666667
> NPV
[1] 0.9650787
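Because P here is the prevalence estimated from the same sample, the Bayes-rule expressions for PPV and NPV above reduce to the direct ratios A / (A + B) and D / (C + D). A small check with the question's counts (this simplification assumes P is the sample prevalence; with an externally supplied prevalence the Bayes form is the one to use):

```r
A <- 20; B <- 10; C <- 162; D <- 4477   # TP, FP, FN, TN
n <- A + B + C + D

sens <- A / (A + C)
spec <- D / (D + B)
P    <- (A + C) / n   # prevalence estimated from the same sample

# Bayes-rule form used in the answer
ppv_bayes <- (sens * P) / (sens * P + (1 - spec) * (1 - P))
npv_bayes <- (spec * (1 - P)) / ((1 - sens) * P + spec * (1 - P))

# Direct ratios from the table
ppv_direct <- A / (A + B)   # 20 / 30
npv_direct <- D / (C + D)   # 4477 / 4639

all.equal(c(ppv_bayes, npv_bayes), c(ppv_direct, npv_direct))
```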

Here is also a good resource for where these formulas come from.
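The same Clopper-Pearson approach extends to the four statistics the question asks about, since each is a simple proportion with its own denominator: sensitivity = A/(A+C), specificity = D/(B+D), PPV = A/(A+B), NPV = D/(C+D). A sketch using base R's `binom.test()` with the A-D counts from this answer; `exact_ci` is a hypothetical helper name. The PPV and NPV intervals take the sample's own prevalence as given.

```r
A <- 20; B <- 10; C <- 162; D <- 4477   # TP, FP, FN, TN from the question

# Hypothetical helper: point estimate plus exact (Clopper-Pearson) CI
exact_ci <- function(successes, trials, conf.level = 0.95) {
  ci <- binom.test(successes, trials, conf.level = conf.level)$conf.int
  c(estimate = successes / trials, lower = ci[1], upper = ci[2])
}

results <- rbind(
  Sensitivity = exact_ci(A, A + C),
  Specificity = exact_ci(D, D + B),
  PPV         = exact_ci(A, A + B),
  NPV         = exact_ci(D, C + D)
)
round(results, 4)
```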

Answer 2

You can use the epiR package for this purpose.

Example:

library(epiR)
data <- as.table(matrix(c(670,202,74,640), nrow = 2, byrow = TRUE))
rval <- epi.tests(data, conf.level = 0.95)
print(rval)

          Outcome +    Outcome -      Total
Test +          670          202        872
Test -           74          640        714
Total           744          842       1586

Point estimates and 95 % CIs:
---------------------------------------------------------
Apparent prevalence                    0.55 (0.52, 0.57)
True prevalence                        0.47 (0.44, 0.49)
Sensitivity                            0.90 (0.88, 0.92)
Specificity                            0.76 (0.73, 0.79)
Positive predictive value              0.77 (0.74, 0.80)
Negative predictive value              0.90 (0.87, 0.92)
Positive likelihood ratio              3.75 (3.32, 4.24)
Negative likelihood ratio              0.13 (0.11, 0.16)
---------------------------------------------------------
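The example table has test-positive in the first row and outcome-positive in the first column, while the question's table has the "0" level first in both. A sketch of reordering the question's counts into that layout before calling `epi.tests()`, assuming the orientation shown in the output above; the dimnames are illustrative, and the epiR call only runs if the package is installed.

```r
# The question's table: rows = billing value, columns = true value
tab <- as.table(matrix(c(4477, 162,
                         10,   20),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(billing = c("0", "1"),
                                       truth   = c("0", "1"))))

# Put the "1" level first in both dimensions: row 1 = test +, column 1 = outcome +
tab_epi <- as.table(tab[c("1", "0"), c("1", "0")])
tab_epi

# Only run the epiR part if the package is installed
if (requireNamespace("epiR", quietly = TRUE)) {
  print(epiR::epi.tests(tab_epi, conf.level = 0.95))
}
```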

Answer 3

The following reproducible example is partly inspired by "ROC curve from training data in caret".

library(caret)
library(mlbench)  # provides the Sonar data set

data(Sonar)
ctrl <- trainControl(method = "cv", summaryFunction = twoClassSummary, classProbs = TRUE, savePredictions = TRUE)
set.seed(42)
fit1 <- train(Class ~ ., data = Sonar,method = "rf",trControl = ctrl)


bestmodel <- merge(fit1$bestTune, fit1$pred)
mtx <- confusionMatrix(table(bestmodel$pred, bestmodel$obs))$table

 #     M   R
 # M 104  23
 # R   7  74

# 95% confidence intervals (Wald / normal approximation)

## Sensitivity
sens_errors <- sqrt(sensitivity(mtx) * (1 - sensitivity(mtx)) / sum(mtx[,1]))
sensLower <- sensitivity(mtx) - 1.96 * sens_errors
sensUpper <- sensitivity(mtx) + 1.96 * sens_errors


## Specificity
spec_errors <- sqrt(specificity(mtx) * (1 - specificity(mtx)) / sum(mtx[,2]))
specLower <- specificity(mtx) - 1.96 * spec_errors
specUpper <- specificity(mtx) + 1.96 * spec_errors

## Positive Predictive Values
ppv_errors <- sqrt(posPredValue(mtx) * (1 - posPredValue(mtx)) / sum(mtx[1,]))
ppvLower <- posPredValue(mtx) - 1.96 * ppv_errors
ppvUpper <- posPredValue(mtx) + 1.96 * ppv_errors


## Negative Predictive Values
npv_errors <- sqrt(negPredValue(mtx) * (1 - negPredValue(mtx)) / sum(mtx[2,]))
npvLower <- negPredValue(mtx) - 1.96 * npv_errors
npvUpper <- negPredValue(mtx) + 1.96 * npv_errors
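These are Wald (normal-approximation) intervals. They are simple, but in general they can extend past 0 or 1, and their coverage can fall below the nominal 95% when a proportion is close to 0 or 1 or the denominator is small, which is the situation for the question's sensitivity (20 of 182) and specificity (4477 of 4487). A sketch comparing the Wald interval with the exact interval from `binom.test()` on those two proportions; `wald_ci` and `exact_ci` are hypothetical helpers, and `wald_ci` mirrors the formulas above.

```r
# Hypothetical helper reproducing the Wald formula used above
wald_ci <- function(successes, trials, z = qnorm(0.975)) {
  p  <- successes / trials
  se <- sqrt(p * (1 - p) / trials)
  c(lower = p - z * se, upper = p + z * se)
}

# Exact (Clopper-Pearson) interval for comparison
exact_ci <- function(successes, trials) {
  ci <- binom.test(successes, trials)$conf.int
  c(lower = ci[1], upper = ci[2])
}

# Sensitivity (20 / 182) and specificity (4477 / 4487) from the question
comparison <- rbind(
  sens_wald  = wald_ci(20, 182),
  sens_exact = exact_ci(20, 182),
  spec_wald  = wald_ci(4477, 4487),
  spec_exact = exact_ci(4477, 4487)
)
round(comparison, 4)
```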

3 answers

Answer 1: 1 vote, 2020-12-02 22:02:55
Answer 2: 1 vote, 2021-04-11 18:40:01
Answer 3: 0 votes, 2020-07-27 09:25:29