Problem: confusionMatrix() from the caret package (R) fails with an error about levels that do not overlap the reference.
Goal: Create a confusion matrix and read 'Accuracy', 'Sensitivity' and 'Specificity' from the resulting object.
A plain contingency table of the predictions already works:
> loans_prediction_table
   model_prediction
    Bad Good
  0 120  710
  1  81 2976
>
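For reference, the three metrics can also be computed by hand from the table above, which gives numbers to compare against once confusionMatrix() runs. This is a minimal sketch in base R. It assumes that the rows are the actual statusRank values, that 0 means "Bad" and 1 means "Good", and that "Good" is the positive class; none of that is confirmed in the post.

```r
# Rebuild the contingency table shown above
# (rows: statusRank, columns: model prediction).
loans_prediction_table <- matrix(
  c(120, 81, 710, 2976), nrow = 2,
  dimnames = list(statusRank = c("0", "1"),
                  model_prediction = c("Bad", "Good"))
)

# Assumption: 1 = "Good" is the positive class, 0 = "Bad" the negative class.
tp <- loans_prediction_table["1", "Good"]  # actual Good, predicted Good
tn <- loans_prediction_table["0", "Bad"]   # actual Bad, predicted Bad
fp <- loans_prediction_table["0", "Good"]  # actual Bad, predicted Good
fn <- loans_prediction_table["1", "Bad"]   # actual Good, predicted Bad

accuracy    <- (tp + tn) / sum(loans_prediction_table)
sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)

print(round(c(Accuracy = accuracy,
              Sensitivity = sensitivity,
              Specificity = specificity), 4))
```

If the coding is the other way round (0 = "Good"), the same cells apply with the labels swapped.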
Error received:
Error in confusionMatrix.default(df_loans_train_data$statusRank,
loans_predict.predicted, :
The data must contain some levels that overlap the reference.
Attempted fix: wrapping both arguments in as.factor() gives the same error:
model_prediction_cm <-
confusionMatrix(as.factor(df_loans_train_data$statusRank),
as.factor(loans_predict.predicted), positive = "Good")
Second attempt: recoding the predictions as 0/1 and swapping the argument order gives a 'same length' error instead:
loans_predict.predicted <- factor(ifelse(loans_predict < 0.5, 0, 1))
model_prediction_cm <-
confusionMatrix(as.factor(loans_predict.predicted),
as.factor(df_loans_train_data$statusRank))
## result error:
> model_prediction_cm <-
confusionMatrix(as.factor(loans_predict.predicted),
as.factor(df_loans_train_data$statusRank))
Error in table(data, reference, dnn = dnn, ...) :
all arguments must have the same length
>
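The 'same length' error comes from table(), which needs one prediction per reference value. One possible cause, not confirmed by the post, is that the predictions cover a different set of rows than df_loans_train_data. The sketch below uses toy data and assumes the names on the prediction vector are row names of the data frame; df and probs are hypothetical stand-ins.

```r
# Toy stand-in for the training data; row names mimic the IDs seen
# in the prediction vector.
df <- data.frame(
  statusRank = factor(c(1, 0, 1, 0, 1)),
  row.names = c("11413", "2561", "25337", "1643", "14264")
)

# Hypothetical predicted probabilities for only a subset of the rows.
probs <- c(`11413` = 0.9, `25337` = 0.7, `1643` = 0.2)

# A quick length check exposes the mismatch before table() complains.
print(length(probs) == nrow(df))

# Recode to 0/1 with explicit levels, then pick the matching reference
# rows by name so both vectors have the same length and order.
predicted <- factor(ifelse(probs < 0.5, 0, 1), levels = c(0, 1))
reference <- df[names(probs), "statusRank"]

print(table(predicted, reference))
```

Comparing length() of both vectors on the real data shows whether this is what is happening.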
Data used:
> head(df_loans_train_data$statusRank, 10)
[1] 1 1 0 0 1 1 1 0 1 0
Levels: 0 1
> str(df_loans_train_data$statusRank)
Factor w/ 2 levels "0","1": 2 2 1 1 2 2 2 1 2 1 ...
> head(loans_predict.predicted)
11413 2561 25337 1643 14264 24191
Bad <NA> Bad Bad Bad Bad
Levels: Bad Good
> str(loans_predict.predicted)
Factor w/ 2 levels "Bad","Good": 1 NA 1 1 1 1 1 1 1 1 ...
- attr(*, "names")= chr [1:4158] "11413" "2561" "25337" "1643" ...
>
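The output above shows the likely cause of the "levels that overlap" error: statusRank has levels "0" and "1", while the predictions have levels "Bad" and "Good", so the two factors share no level. A minimal sketch of one way to line them up, using toy vectors and assuming 0 means "Bad" and 1 means "Good" (the post does not say which is which):

```r
# Toy stand-ins for the two vectors shown above (values are illustrative).
reference <- factor(c(1, 1, 0, 0, 1, 1), levels = c(0, 1))
predicted <- factor(c("Good", NA, "Bad", "Bad", "Good", "Good"),
                    levels = c("Bad", "Good"))

# No level label is shared, which is what confusionMatrix() complains about.
print(intersect(levels(reference), levels(predicted)))

# Assumption: 0 = "Bad", 1 = "Good". Relabel the reference to match.
reference_labelled <- factor(reference, levels = c("0", "1"),
                             labels = c("Bad", "Good"))

# Drop positions where either value is missing.
keep <- !is.na(predicted) & !is.na(reference_labelled)

# Only runs if caret is installed.
if (requireNamespace("caret", quietly = TRUE)) {
  cm <- caret::confusionMatrix(data = predicted[keep],
                               reference = reference_labelled[keep],
                               positive = "Good")
  print(cm$overall["Accuracy"])
  print(cm$byClass[c("Sensitivity", "Specificity")])
}
```

Note that confusionMatrix() takes the predictions first (data) and the true values second (reference).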
loans_train_data = na.omit(loans_train_data)
df_loans_train_data <- as.data.frame(loans_train_data)
loans_predict.predicted <- factor(ifelse(loans_predict < 0.5, "Good",
"Bad"))
## problem code: confusionMatrix()
model_prediction_cm <- confusionMatrix(df_loans_train_data$statusRank,
loans_predict.predicted, positive = "Good")
model_prediction_cm$overall['Accuracy']
## Sensitivity and Specificity are stored in $byClass, not $overall
model_prediction_cm$byClass['Sensitivity']
model_prediction_cm$byClass['Specificity']
Debug sample data: dput(loans_predict.predicted)
`33258` = 2L, `7249` = 2L, `4681` = 2L, `7040` = 2L, `5378` = 2L,
`13420` = 2L, `14028` = 2L, `23267` = 2L, `32953` = 2L, `26529` = 2L,
`30617` = 2L, `32348` = NA, `10303` = 2L, `20425` = 2L, `23817` = 2L,
`9459` = 2L, `33474` = 2L, `993` = 2L, `33870` = 2L, `33751` = 2L,
`26626` = 2L, `8784` = 2L, `32525` = 2L, `29272` = 2L, `5600` = 2L,
`33324` = 2L, `25767` = 2L, `25290` = 2L, `29297` = 2L, `27529` = NA,
`21944` = 2L, `27563` = 2L, `644` = 2L, `1348` = NA, `30568` = NA,
`26078` = 1L, `24222` = 2L, `28581` = 2L, `8299` = 2L, `16639` = 2L,
`33609` = 2L, `14870` = 2L, `33056` = 2L, `33162` = 2L, `4609` = 2L,
`28794` = 2L, `30851` = NA, `10850` = 2L, `16848` = 2L, `33720` = 1L,
`11570` = 2L, `16509` = 2L, `19207` = 2L, `29265` = 2L, `24578` = 2L,
`10129` = 2L, `27090` = 1L, `27485` = 2L, `28897` = 2L, `10176` = 2L,
`20959` = 2L, `4982` = 2L, `8021` = 2L, `1428` = 2L, `24250` = 2L,
`2929` = 2L, `14207` = 2L, `20656` = 2L, `23423` = 2L, `31682` = 2L,
`31989` = 1L, `13545` = 2L, `8453` = NA, `5468` = 2L, `15002` = 2L,
`29944` = 2L, `27050` = 2L, `32108` = 2L, `27711` = NA, `6610` = 2L,
`26874` = 2L, `27817` = 2L, `29768` = 2L, `16522` = 2L, `16917` = NA,
`14174` = 2L, `34318` = 2L, `16784` = 2L, `5040` = 2L, `18617` = 2L,
`32843` = 1L, `18461` = 2L, `10857` = 2L, `24549` = 2L, `12866` = 2L,
`14067` = 2L, `16067` = 2L, `18493` = 2L, `8966` = 2L, `8509` = 2L,
Debug
dput(model_prediction_cm)
Error in dput(model_prediction_cm) :
object 'model_prediction_cm' not found
I solved the "The data must contain some levels that overlap the reference" error from confusionMatrix() by splitting the data into training and test sets with the following commands:
library(ggplot2)
library(dplyr)
library(magrittr)
set.seed(123)
data <- Dados3 %>% mutate(set = ifelse(runif(nrow(.)) > 0.75, "teste", "treino"))
treino <- data %>% filter(set == "treino") %>% select(-set)
teste <- data %>% filter(set == "teste") %>% select(-set)
glimpse(data)
Where:
Dados3 is my dataset
treino (Portuguese) = train (English)
teste (Portuguese) = test (English)
I hope it can be useful.
Good luck!