简体   繁体   中英

Why isnt my logistic regression model output a factor of 2 levels? (Error: `data` and `reference` should be factors with the same levels.)

From reading similar questions I know that the problem is that the yhat.logisticReg isnt a factor of 2 levels while training.prepped$TARGET_FLAG is. I assume the issue could be fixed by changing my model or in the prediction so that yhat.logisticReg is a factor of 2 levels. How can I do this?

logisticReg = glm(TARGET_FLAG ~ .,
                  data = training.prepped,
                  family = binomial())
yhat.logisticReg = predict(logisticReg, training.prepped, type = "response")
confusionMatrix(yhat.logisticReg, training.prepped$TARGET_FLAG)

Error: `data` and `reference` should be factors with the same levels.
str(training.prepped$TARGET_FLAG)
Factor w/ 2 levels "0","1": 1 1 1 1 1 2 1 2 2 1 ...

str(yhat.logisticReg)
 Named num [1:8161] 0.1656 0.2792 0.3717 0.0894 0.272 ...
 - attr(*, "names")= chr [1:8161] "1" "2" "3" "4" ...

You may need to choose a threshold first, and then convert your real-valued data into binary values, eg

a <- c(0.2, 0.7, 0.4)
threshold <- 0.5
binary_a <- factor(as.numeric(a>threshold))

str(binary_a)
Factor w/ 2 levels "0","1": 1 2 1

The library caret have the method confusionMatrix that have several metrics implemented. Calling overall you can get the accuracy. If you want another metric, you can check if they have it implemented and just call it.

library(caret)
acc = c()
for(value in yhat.logisticReg)
{
  predictions <- ifelse(yhat.logisticReg <= value, 0, 1)
  confusion_matrix = confusionMatrix(predictions, yhat.logisticReg)
  acc = c(acc,confusion_matrix$overall["Accuracy"])
}

best_acc = max(acc)
best_threshold  = yhat.logisticReg[which.max(acc)]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM