
Caret confusionMatrix measures are wrong?

I made a function to compute sensitivity and specificity from a confusion matrix, and only later found out that the caret package has one: confusionMatrix() . When I tried it, things got very confusing, because it appears caret is using the wrong formulae?

Example data:

dat <- data.frame(real = as.factor(c(1,1,1,0,0,1,1,1,1)),
                  pred = as.factor(c(1,1,0,1,0,1,1,1,0)))
cm <- table(dat$real, dat$pred)
cm
    0 1
  0 1 1
  1 2 5

My function:

model_metrics <- function(cm){
  acc <- (cm[1] + cm[4]) / sum(cm[1:4])
  # accuracy = ratio of the correctly labeled subjects to the whole pool of subjects = (TP+TN)/(TP+FP+FN+TN)
  sens <- cm[4] / (cm[4] + cm[3])
  # sensitivity/recall = ratio of the correctly +ve labeled to all who are +ve in reality = TP/(TP+FN)
  spec <- cm[1] / (cm[1] + cm[2])
  # specificity = ratio of the correctly -ve labeled cases to all who are -ve in reality = TN/(TN+FP)
  err <- (cm[2] + cm[3]) / sum(cm[1:4]) #(all incorrect / all)
  metrics <- data.frame(Accuracy = acc, Sensitivity = sens, Specificity = spec, Error = err)
  return(metrics)
}

Now compare the results of confusionMatrix() to those of my function:

library(caret)
c_cm <- confusionMatrix(dat$real, dat$pred)
c_cm
          Reference
Prediction 0 1
         0 1 1
         1 2 5
c_cm$byClass
Sensitivity          Specificity       Pos Pred Value       Neg Pred Value            Precision               Recall 
  0.3333333            0.8333333            0.5000000            0.7142857            0.5000000            0.3333333

model_metrics(cm)
  Accuracy Sensitivity Specificity     Error
1 0.6666667   0.8333333   0.3333333 0.3333333

Sensitivity and specificity seem to be swapped between my function and confusionMatrix() . I assumed I had used the wrong formulae, but I double-checked on Wikipedia and they were right. I also double-checked that I was reading the right cells from the confusion matrix, and I'm pretty sure I am. The caret documentation also suggests it uses the correct formulae, so I have no idea what's going on.

Is the caret function wrong, or (more likely) have I made some embarrassingly obvious mistake?

The caret function isn't wrong.

First, consider how you construct a table: table(first, second) produces a table with first in the rows and second in the columns.

Also, when subsetting a table with a single index, the cells are counted column-wise, because R stores matrices in column-major order. For example, in your function the correct way to calculate the sensitivity is

 sens <- cm[4] / (cm[4] + cm[2])
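To see why, here is a small sketch that rebuilds the same 2x2 table from the question as a matrix and walks through what each single-index subscript refers to:

```r
# The 2x2 table from the question, rebuilt as a matrix
# (rows = real, columns = pred):
cm <- matrix(c(1, 2, 1, 5), nrow = 2,
             dimnames = list(real = c("0", "1"), pred = c("0", "1")))

# Single-index subsetting walks the cells column by column:
cm[1]  # real = 0, pred = 0 -> TN = 1
cm[2]  # real = 1, pred = 0 -> FN = 2
cm[3]  # real = 0, pred = 1 -> FP = 1
cm[4]  # real = 1, pred = 1 -> TP = 5

cm[4] / (cm[4] + cm[2])  # sensitivity = TP / (TP + FN) = 5/7
cm[1] / (cm[1] + cm[3])  # specificity = TN / (TN + FP) = 1/2
```

So in the original function, cm[2] and cm[3] (FN and FP) were swapped, which is exactly what swaps sensitivity and specificity.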

Finally, it is always a good idea to read the help page of a function that doesn't give the results you expect; ?confusionMatrix will open it.

There you will find that you can specify which factor level is to be treated as the positive result (via the positive argument).

Also, be careful with how you call the function. To avoid confusion, I would recommend using named arguments instead of relying on positional matching.

The first argument, data, is a factor of predicted classes; the second argument, reference, is a factor of observed classes ( dat$real in your case). Note that your call passed them the other way around.

To get the results you want:

confusionMatrix(data = dat$pred, reference = dat$real, positive = "1")

Confusion Matrix and Statistics

          Reference
Prediction 0 1
         0 1 2
         1 1 5
                                          
               Accuracy : 0.6667          
                 95% CI : (0.2993, 0.9251)
    No Information Rate : 0.7778          
    P-Value [Acc > NIR] : 0.8822          
                                          
                  Kappa : 0.1818          
                                          
 Mcnemar's Test P-Value : 1.0000          
                                          
            Sensitivity : 0.7143          
            Specificity : 0.5000          
         Pos Pred Value : 0.8333          
         Neg Pred Value : 0.3333          
             Prevalence : 0.7778          
         Detection Rate : 0.5556          
   Detection Prevalence : 0.6667          
      Balanced Accuracy : 0.6071          
                                          
       'Positive' Class : 1 
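For completeness, the "swap" in the original output also has a second cause besides the reversed arguments: when positive is not supplied, caret treats the first factor level ("0" here) as the positive class, so its "Sensitivity" is your specificity for class "1". A minimal sketch of both the default behaviour and an alternative fix via relevel():

```r
library(caret)

dat <- data.frame(real = as.factor(c(1,1,1,0,0,1,1,1,1)),
                  pred = as.factor(c(1,1,0,1,0,1,1,1,0)))

# Default: the positive class is the first factor level, "0",
# so "Sensitivity" here is computed for class "0":
confusionMatrix(data = dat$pred, reference = dat$real)$positive
# "0"
confusionMatrix(data = dat$pred, reference = dat$real)$byClass[["Sensitivity"]]
# 0.5

# Alternative to passing positive = "1": relevel the factors so "1" comes first.
dat2 <- data.frame(real = relevel(dat$real, ref = "1"),
                   pred = relevel(dat$pred, ref = "1"))
confusionMatrix(data = dat2$pred, reference = dat2$real)$byClass[["Sensitivity"]]
# 0.7142857
```

Releveling is worth knowing about because many modelling functions, not just confusionMatrix(), key off the first factor level.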
