
caret confusionMatrix arguments levels that overlap issue

Problem: confusionMatrix() (caret package, R) fails with an error saying the data must contain levels that overlap the reference.

Goal: Create a confusion matrix in order to obtain 'Accuracy', 'Sensitivity', and 'Specificity' from the resulting confusion matrix object.

I have a working contingency table of the predictions:

    > loans_prediction_table
       model_prediction
         Bad Good
      0  120  710
      1   81 2976
    > 

Error received:

    Error in confusionMatrix.default(df_loans_train_data$statusRank, 
    loans_predict.predicted,  : 
      The data must contain some levels that overlap the reference.
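
A quick way to see what this message is complaining about is to compare the level labels of the two factors. The sketch below uses small made-up vectors (`reference` and `predicted` are illustrative names, not the objects from the question) that mimic the "0"/"1" versus "Bad"/"Good" labelling shown in the data further down:

```r
# Toy stand-ins for the two vectors in the question (values are illustrative).
reference <- factor(c(1, 1, 0, 0, 1), levels = c(0, 1))
predicted <- factor(c("Good", NA, "Bad", "Good", "Good"),
                    levels = c("Bad", "Good"))

# Labels present in both factors; confusionMatrix() stops when there are none.
shared_levels <- intersect(levels(reference), levels(predicted))
print(levels(reference))
print(levels(predicted))
print(length(shared_levels))
```

If the last value is 0, the two factors describe the classes with different labels, which appears to be the situation here.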

An alternative attempt with as.factor(), which gives the same error:

    model_prediction_cm <- 
    confusionMatrix(as.factor(df_loans_train_data$statusRank), 
    as.factor(loans_predict.predicted), positive = "Good")

Another attempt, recoding the predictions to 0/1 and wrapping both arguments in as.factor(), which generates a 'same length' error instead:

    loans_predict.predicted <- factor(ifelse(loans_predict < 0.5, 0, 1))
    model_prediction_cm <- 
    confusionMatrix(as.factor(loans_predict.predicted), 
    as.factor(df_loans_train_data$statusRank))

    ## result error:
    > model_prediction_cm <- 
    confusionMatrix(as.factor(loans_predict.predicted), 
    as.factor(df_loans_train_data$statusRank))
    Error in table(data, reference, dnn = dnn, ...) : 
      all arguments must have the same length
    > 
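
The 'same length' error comes from table(), which needs one prediction per reference value. A minimal sketch of how the two vectors could be checked and aligned is below. It assumes both vectors carry row names that identify the same observations, which is only a guess about the real data; all names and values are hypothetical.

```r
# Hypothetical toy data: five labelled rows, predictions for only four of them.
actual <- factor(c(1, 0, 1, 1, 0), levels = c(0, 1))
names(actual) <- c("11", "12", "13", "14", "15")
predicted <- factor(c(1, 0, NA, 1), levels = c(0, 1))
names(predicted) <- c("11", "12", "13", "15")

# Different lengths are what make table() stop.
print(length(actual) == length(predicted))

# Keep only rows present in both vectors, then drop missing predictions.
common <- intersect(names(actual), names(predicted))
actual_aligned <- actual[common]
predicted_aligned <- predicted[common]
keep <- !is.na(predicted_aligned)

aligned_table <- table(predicted = predicted_aligned[keep],
                       actual = actual_aligned[keep])
print(aligned_table)
```

If the real vectors do not share row names, the safer route is probably to predict on exactly the data frame whose labels are used as the reference.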

Data used:

    > head(df_loans_train_data$statusRank, 10)
     [1] 1 1 0 0 1 1 1 0 1 0
    Levels: 0 1
    > str(df_loans_train_data$statusRank)
     Factor w/ 2 levels "0","1": 2 2 1 1 2 2 2 1 2 1 ...
    > head(loans_predict.predicted)
    11413  2561 25337  1643 14264 24191 
      Bad  <NA>   Bad   Bad   Bad   Bad 
    Levels: Bad Good
    > str(loans_predict.predicted)
     Factor w/ 2 levels "Bad","Good": 1 NA 1 1 1 1 1 1 1 1 ...
     - attr(*, "names")= chr [1:4158] "11413" "2561" "25337" "1643" ... 
    > 
    loans_train_data = na.omit(loans_train_data)
    df_loans_train_data <- as.data.frame(loans_train_data)
    loans_predict.predicted <- factor(ifelse(loans_predict < 0.5, "Good", 
    "Bad"))

    ## problem code: confusionMatrix()
    model_prediction_cm <- confusionMatrix(df_loans_train_data$statusRank, 
    loans_predict.predicted, positive = "Good")

    model_prediction_cm$overall['Accuracy']
    model_prediction_cm$overall['Sensitivity']
    model_prediction_cm$overall['Specificity']
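
For comparison, here is a hedged sketch of a call in which both factors share the same levels. It uses toy vectors rather than the real data, and the mapping 0 = "Bad", 1 = "Good" is an assumption that would need to be checked against the actual coding of statusRank. It also passes the predictions as `data` and the observed classes as `reference`, and reads sensitivity and specificity from `byClass`, since `overall` holds accuracy-type statistics only.

```r
# Toy stand-ins for the question's objects (assumed coding: 0 = "Bad", 1 = "Good").
status_rank <- factor(c(1, 1, 0, 0, 1, 1, 1, 0, 1, 0), levels = c(0, 1))
predicted <- factor(c("Good", "Good", "Bad", "Good", "Good",
                      "Bad", "Good", "Bad", "Good", "Good"),
                    levels = c("Bad", "Good"))

# Relabel the reference so both factors use the levels "Bad" and "Good".
reference <- factor(status_rank, levels = c("0", "1"),
                    labels = c("Bad", "Good"))
stopifnot(identical(levels(reference), levels(predicted)))

# Only run the caret part when the package is installed.
if (requireNamespace("caret", quietly = TRUE)) {
  # data = predictions, reference = observed classes.
  cm <- caret::confusionMatrix(data = predicted, reference = reference,
                               positive = "Good")
  print(cm$overall["Accuracy"])
  # Sensitivity and specificity are stored in byClass, not in overall.
  print(cm$byClass[c("Sensitivity", "Specificity")])
}
```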

Debug sample data, an excerpt of the output of dput(loans_predict.predicted):

    `33258` = 2L, `7249` = 2L, `4681` = 2L, `7040` = 2L, `5378` = 2L, 
    `13420` = 2L, `14028` = 2L, `23267` = 2L, `32953` = 2L, `26529` = 2L, 
    `30617` = 2L, `32348` = NA, `10303` = 2L, `20425` = 2L, `23817` = 2L, 
    `9459` = 2L, `33474` = 2L, `993` = 2L, `33870` = 2L, `33751` = 2L, 
    `26626` = 2L, `8784` = 2L, `32525` = 2L, `29272` = 2L, `5600` = 2L, 
    `33324` = 2L, `25767` = 2L, `25290` = 2L, `29297` = 2L, `27529` = NA, 
    `21944` = 2L, `27563` = 2L, `644` = 2L, `1348` = NA, `30568` = NA, 
    `26078` = 1L, `24222` = 2L, `28581` = 2L, `8299` = 2L, `16639` = 2L, 
    `33609` = 2L, `14870` = 2L, `33056` = 2L, `33162` = 2L, `4609` = 2L, 
    `28794` = 2L, `30851` = NA, `10850` = 2L, `16848` = 2L, `33720` = 1L, 
    `11570` = 2L, `16509` = 2L, `19207` = 2L, `29265` = 2L, `24578` = 2L, 
    `10129` = 2L, `27090` = 1L, `27485` = 2L, `28897` = 2L, `10176` = 2L, 
    `20959` = 2L, `4982` = 2L, `8021` = 2L, `1428` = 2L, `24250` = 2L, 
    `2929` = 2L, `14207` = 2L, `20656` = 2L, `23423` = 2L, `31682` = 2L, 
    `31989` = 1L, `13545` = 2L, `8453` = NA, `5468` = 2L, `15002` = 2L, 
    `29944` = 2L, `27050` = 2L, `32108` = 2L, `27711` = NA, `6610` = 2L, 
    `26874` = 2L, `27817` = 2L, `29768` = 2L, `16522` = 2L, `16917` = NA, 
    `14174` = 2L, `34318` = 2L, `16784` = 2L, `5040` = 2L, `18617` = 2L, 
    `32843` = 1L, `18461` = 2L, `10857` = 2L, `24549` = 2L, `12866` = 2L, 
    `14067` = 2L, `16067` = 2L, `18493` = 2L, `8966` = 2L, `8509` = 2L,

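Debug output for the confusion matrix object, which does not exist because the confusionMatrix() call failed:

    dput(model_prediction_cm)
    Error in dput(model_prediction_cm) : 
      object 'model_prediction_cm' not found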

I solved the "The data must contain some levels that overlap the reference" error from confusionMatrix() by splitting my data into training and test sets with the following code:

    library(ggplot2)
    library(dplyr)
    library(magrittr)
    set.seed(123)
    data <- Dados3 %>%
      mutate(set = ifelse(runif(nrow(.)) > 0.75, "teste", "treino"))
    treino <- data %>%
      filter(set == "treino") %>%
      select(-set)
    teste <- data %>%
      filter(set == "teste") %>%
      select(-set)
    glimpse(data)

Where Dados3 is my dataset, treino (Portuguese) means train (English), and teste (Portuguese) means test (English).
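
One possible reading of why a split like this helps is that the predictions and the reference labels then come from the same rows and can be given the same levels. The sketch below illustrates that idea in base R with simulated data; `dados`, `x`, `status`, and the logistic model are hypothetical stand-ins, not the original Dados3 or its model.

```r
set.seed(123)
# Hypothetical stand-in for Dados3: one predictor and a binary outcome.
dados <- data.frame(x = rnorm(200))
dados$status <- factor(ifelse(runif(200) < plogis(dados$x), "Good", "Bad"),
                       levels = c("Bad", "Good"))

# Same split rule as above: roughly 75% train (treino), 25% test (teste).
is_test <- runif(nrow(dados)) > 0.75
treino <- dados[!is_test, ]
teste <- dados[is_test, ]

# Fit on the training rows only; glm models the probability of the
# second factor level, here "Good".
fit <- glm(status ~ x, data = treino, family = binomial)

# Predict on the test rows and build a factor with the reference's levels.
prob_good <- predict(fit, newdata = teste, type = "response")
predicted <- factor(ifelse(prob_good >= 0.5, "Good", "Bad"),
                    levels = levels(teste$status))

# Both vectors now have the same length and the same levels.
print(table(predicted = predicted, actual = teste$status))
```

With vectors built this way, `predicted` and `teste$status` could be passed to confusionMatrix() as `data` and `reference` respectively.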

I hope this is useful.

Good luck!
