I made a function to compute sensitivity and specificity from a confusion matrix, and only later found out that the caret package has one, confusionMatrix(). When I tried it, things got very confusing, as it appears caret is using the wrong formulae?
Example data:
dat <- data.frame(real = as.factor(c(1,1,1,0,0,1,1,1,1)),
pred = as.factor(c(1,1,0,1,0,1,1,1,0)))
cm <- table(dat$real, dat$pred)
cm
    0 1
  0 1 1
  1 2 5
My function:
model_metrics <- function(cm){
  acc <- (cm[1] + cm[4]) / sum(cm[1:4])
  # accuracy = ratio of the correctly labeled subjects to the whole pool of subjects = (TP+TN)/(TP+FP+FN+TN)
  sens <- cm[4] / (cm[4] + cm[3])
  # sensitivity/recall = ratio of the correctly +ve labeled to all who are +ve in reality = TP/(TP+FN)
  spec <- cm[1] / (cm[1] + cm[2])
  # specificity = ratio of the correctly -ve labeled cases to all who are -ve in reality = TN/(TN+FP)
  err <- (cm[2] + cm[3]) / sum(cm[1:4]) # (all incorrect / all)
  metrics <- data.frame(Accuracy = acc, Sensitivity = sens, Specificity = spec, Error = err)
  return(metrics)
}
Now compare the results of confusionMatrix() to those of my function:
library(caret)
c_cm <- confusionMatrix(dat$real, dat$pred)
c_cm
          Reference
Prediction 0 1
         0 1 1
         1 2 5
c_cm$byClass
   Sensitivity    Specificity Pos Pred Value Neg Pred Value      Precision         Recall
     0.3333333      0.8333333      0.5000000      0.7142857      0.5000000      0.3333333
model_metrics(cm)
   Accuracy Sensitivity Specificity     Error
1 0.6666667   0.8333333   0.3333333 0.3333333
Sensitivity and specificity seem to be swapped around between my function and confusionMatrix(). I assumed I had used the wrong formulae, but I double-checked on Wikipedia and I was right. I also double-checked that I was calling the right values from the confusion matrix, and I'm pretty sure I am. The caret documentation also suggests it's using the correct formulae, so I have no idea what's going on.
Is the caret function wrong, or (more likely) have I made some embarrassingly obvious mistake?
The caret function isn't wrong.
First, consider how you construct a table. table(first, second) will result in a table with first in the rows and second in the columns.
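A small sketch makes the orientation visible. The dnn labels and the indexing by level name are added here for illustration only; they are not part of the original code.

```r
# Same data as in the question
dat <- data.frame(real = as.factor(c(1,1,1,0,0,1,1,1,1)),
                  pred = as.factor(c(1,1,0,1,0,1,1,1,0)))

# First argument goes to the rows, second to the columns;
# dnn only labels the two dimensions (illustrative names)
cm <- table(dat$real, dat$pred, dnn = c("real", "pred"))
cm

# Indexing by level name avoids having to remember the cell order:
# row "1" (real = 1), column "0" (predicted = 0)
cm["1", "0"]
```

With the rows and columns labelled, it is easier to see which cell is which before writing any formula against the table.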
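Also, when subsetting a table with a single index, the cells are counted column-wise, because R stores matrices in column-major order. With real in the rows and pred in the columns, cm[2] is the false-negative cell (real = 1, predicted = 0) and cm[3] is the false-positive cell (real = 0, predicted = 1). So in your function the correct way to calculate the sensitivity is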
sens <- cm[4] / (cm[4] + cm[2])
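One possible corrected version of the whole function is sketched below. It assumes the table was built as table(real, pred) and that "1" is the positive level; the name model_metrics_fixed, its positive/negative parameters, and the use of named indexing are choices made for this sketch, not something from the original post.

```r
# Hypothetical corrected variant of model_metrics().
# cm is assumed to have the real classes in rows and the predictions in columns.
model_metrics_fixed <- function(cm, positive = "1", negative = "0") {
  TP <- cm[positive, positive]
  TN <- cm[negative, negative]
  FN <- cm[positive, negative]  # real positive, predicted negative
  FP <- cm[negative, positive]  # real negative, predicted positive
  data.frame(Accuracy    = (TP + TN) / sum(cm),
             Sensitivity = TP / (TP + FN),
             Specificity = TN / (TN + FP),
             Error       = (FP + FN) / sum(cm))
}

dat <- data.frame(real = as.factor(c(1,1,1,0,0,1,1,1,1)),
                  pred = as.factor(c(1,1,0,1,0,1,1,1,0)))
cm <- table(dat$real, dat$pred)

as.vector(cm)            # column-wise cell order: TN, FN, FP, TP
model_metrics_fixed(cm)  # Sensitivity = 5/7, Specificity = 1/2
```

Indexing by level name rather than by position keeps the formulae readable and does not depend on remembering the column-wise cell order.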
Finally, it is always a good idea to read the help page of a function that doesn't give you the results you expected. ?confusionMatrix will give you the help page.
There you will find that you can specify which factor level is to be considered the positive result (with the positive argument). If positive is not given and there are only two levels, the first level ("0" here) is used, which is what happened in your call.
Also, be careful with how you use the function. To avoid confusion, I would recommend using named arguments instead of relying on positional matching.
The first argument is data (a factor of predicted classes); the second argument, reference, is a factor of observed classes (dat$real in your case). In your call the two were passed the other way round.
To get the results you want:
confusionMatrix(data = dat$pred, reference = dat$real, positive = "1")
Confusion Matrix and Statistics

          Reference
Prediction 0 1
         0 1 2
         1 1 5

               Accuracy : 0.6667
                 95% CI : (0.2993, 0.9251)
    No Information Rate : 0.7778
    P-Value [Acc > NIR] : 0.8822

                  Kappa : 0.1818
 Mcnemar's Test P-Value : 1.0000

            Sensitivity : 0.7143
            Specificity : 0.5000
         Pos Pred Value : 0.8333
         Neg Pred Value : 0.3333
             Prevalence : 0.7778
         Detection Rate : 0.5556
   Detection Prevalence : 0.6667
      Balanced Accuracy : 0.6071

       'Positive' Class : 1
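To see where the numbers in the question came from, both sets of values can be reproduced by hand in base R. This is only a sketch: sens_spec is a hypothetical helper written for this illustration, not a caret function, and it simply applies the usual definitions with a chosen positive level.

```r
dat <- data.frame(real = as.factor(c(1,1,1,0,0,1,1,1,1)),
                  pred = as.factor(c(1,1,0,1,0,1,1,1,0)))

# Hypothetical helper: sensitivity and specificity from two factors,
# treating `positive` as the positive level
sens_spec <- function(data, reference, positive) {
  tp <- sum(data == positive & reference == positive)
  fn <- sum(data != positive & reference == positive)
  tn <- sum(data != positive & reference != positive)
  fp <- sum(data == positive & reference != positive)
  c(Sensitivity = tp / (tp + fn), Specificity = tn / (tn + fp))
}

# The call from the question: arguments swapped, first level "0" as positive
swapped <- sens_spec(data = dat$real, reference = dat$pred, positive = "0")
swapped   # Sensitivity = 1/3, Specificity = 5/6

# The intended call: predictions as data, observations as reference, "1" positive
intended <- sens_spec(data = dat$pred, reference = dat$real, positive = "1")
intended  # Sensitivity = 5/7, Specificity = 1/2
```

The first result matches the c_cm$byClass values in the question, and the second matches the output above, so the difference comes from the argument order and the positive level rather than from the formulae.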