
How to obtain a confusion matrix using the caret package?

I was trying to analyse the example that the caret package provides for confusionMatrix, i.e.:

library(caret)

lvs <- c("normal", "abnormal")
truth <- factor(rep(lvs, times = c(86, 258)),
                levels = rev(lvs))
pred <- factor(
  c(
    rep(lvs, times = c(54, 32)),
    rep(lvs, times = c(27, 231))),
  levels = rev(lvs))

xtab <- table(pred, truth)

confusionMatrix(xtab)

However, I don't quite understand it. Take this very simple model, for example:

set.seed(42)
x <- sample(0:1, 100, replace = TRUE)
y <- rnorm(100)
glm(x ~ y, family = binomial('logit'))

I don't know how to build the analogous confusion matrix for this glm model. How can it be done?

EDIT

I tried to run an example provided in comments:

train <- data.frame(LoanStatus_B = as.numeric(rnorm(100)>0.5), b= rnorm(100), c = rnorm(100), d = rnorm(100))
logitMod <- glm(LoanStatus_B ~ ., data=train, family=binomial(link="logit"))
library(caret)
# Use your model to make predictions, in this example newdata = training set, but replace with your test set    
pdata <- predict(logitMod, newdata = train, type = "response")

confusionMatrix(data = as.numeric(pdata>0.5), reference = train$LoanStatus_B)

but I get the error: `data` and `reference` should be factors with the same levels

Am I doing something incorrectly?

As the error message says, confusionMatrix expects factors, so you just need to convert both vectors:

confusionMatrix(data = as.factor(as.numeric(pdata>0.5)), 
                reference = as.factor(train$LoanStatus_B))
# Confusion Matrix and Statistics
# 
#           Reference
# Prediction  0  1
#          0 61 31
#          1  2  6
# 
# Accuracy : 0.67            
# 95% CI : (0.5688, 0.7608)
# No Information Rate : 0.63            
# P-Value [Acc > NIR] : 0.2357          
# 
# Kappa : 0.1556          
# 
# Mcnemar's Test P-Value : 1.093e-06       
#                                           
#             Sensitivity : 0.9683          
#             Specificity : 0.1622          
#          Pos Pred Value : 0.6630          
#          Neg Pred Value : 0.7500          
#              Prevalence : 0.6300          
#          Detection Rate : 0.6100          
#    Detection Prevalence : 0.9200          
#       Balanced Accuracy : 0.5652          
#                                           
#        'Positive' Class : 0               
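
The same idea should carry over to the x ~ y model from the original question. Below is a minimal sketch, not a definitive recipe: the 0.5 cutoff, the names fit, prob, pred and truth, and the hand-computed statistics are my own assumptions for illustration. The levels are set explicitly so that both factors share them even if the thresholded predictions happen to contain only one class. The caret call is guarded, so the base R part runs on its own.

```r
set.seed(42)
x <- sample(0:1, 100, replace = TRUE)
y <- rnorm(100)
fit <- glm(x ~ y, family = binomial("logit"))

# Fitted probabilities for the training data (no newdata supplied)
prob <- predict(fit, type = "response")

# Threshold at 0.5 (an assumed cutoff) and fix the levels explicitly
# so that predictions and truth are factors with identical levels
pred  <- factor(as.integer(prob > 0.5), levels = c(0, 1))
truth <- factor(x, levels = c(0, 1))

# Cross-tabulation: rows are predictions, columns are the reference
xtab <- table(Prediction = pred, Reference = truth)
print(xtab)

# Statistics computed by hand, treating the first level ("0") as the
# positive class, as in the output printed above
accuracy    <- sum(diag(xtab)) / sum(xtab)
sensitivity <- xtab["0", "0"] / sum(xtab[, "0"])
specificity <- xtab["1", "1"] / sum(xtab[, "1"])

# Full report from caret, if the package is installed
if (requireNamespace("caret", quietly = TRUE)) {
  print(caret::confusionMatrix(data = pred, reference = truth))
}
```

Applying the same formulas to the table printed above reproduces the reported values: accuracy is (61 + 6) / 100 = 0.67, sensitivity is 61 / (61 + 2) = 0.9683, and specificity is 6 / (31 + 6) = 0.1622.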
                              
