From reading similar questions, I know the problem is that `yhat.logisticReg` isn't a factor with 2 levels, while `training.prepped$TARGET_FLAG` is. I assume the issue could be fixed by changing either the model or the prediction step so that `yhat.logisticReg` becomes a factor with 2 levels. How can I do this?
```r
logisticReg = glm(TARGET_FLAG ~ .,
                  data = training.prepped,
                  family = binomial())
yhat.logisticReg = predict(logisticReg, training.prepped, type = "response")
confusionMatrix(yhat.logisticReg, training.prepped$TARGET_FLAG)
```

```
Error: `data` and `reference` should be factors with the same levels.
```

```
> str(training.prepped$TARGET_FLAG)
 Factor w/ 2 levels "0","1": 1 1 1 1 1 2 1 2 2 1 ...
> str(yhat.logisticReg)
 Named num [1:8161] 0.1656 0.2792 0.3717 0.0894 0.272 ...
 - attr(*, "names")= chr [1:8161] "1" "2" "3" "4" ...
```
With `type = "response"`, `predict()` returns probabilities, not class labels. You need to choose a threshold first and then convert the predicted probabilities into binary class labels, e.g.:
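```r
a <- c(0.2, 0.7, 0.4)
threshold <- 0.5
# levels = c(0, 1) keeps both levels even if every value falls on one side of the threshold
binary_a <- factor(as.numeric(a > threshold), levels = c(0, 1))
str(binary_a)
```

```
 Factor w/ 2 levels "0","1": 1 2 1
```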
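Applied to the question, the same idea means building the predicted factor with the levels of the reference. Below is a minimal base-R sketch; `prob_to_class()` is a hypothetical helper, and the made-up vectors `prob` and `truth` stand in for `yhat.logisticReg` and `training.prepped$TARGET_FLAG`:

```r
# Hypothetical helper: convert predicted probabilities into a factor whose
# levels match the reference labels. levels[2] is predicted when prob > threshold.
prob_to_class <- function(prob, threshold = 0.5, levels = c("0", "1")) {
  factor(ifelse(prob > threshold, levels[2], levels[1]), levels = levels)
}

# Made-up values standing in for yhat.logisticReg and training.prepped$TARGET_FLAG
prob  <- c(0.2, 0.7, 0.4, 0.9)
truth <- factor(c("0", "1", "1", "1"), levels = c("0", "1"))

pred <- prob_to_class(prob, threshold = 0.5, levels = levels(truth))
identical(levels(pred), levels(truth))   # TRUE: both factors have levels "0", "1"

# Even when every prediction is below the cutoff, the factor keeps both levels
levels(prob_to_class(c(0.1, 0.2)))

# Base-R cross-tabulation of predictions against the true labels
table(Predicted = pred, Reference = truth)
```

With the question's objects this would be `prob_to_class(yhat.logisticReg, 0.5, levels(training.prepped$TARGET_FLAG))`, and the resulting factor can be passed to `confusionMatrix()` together with `training.prepped$TARGET_FLAG`.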
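The caret package's `confusionMatrix()` function computes several metrics at once. The `overall` element of its result holds the accuracy (`$overall["Accuracy"]`). If you want another metric, check whether caret already implements it (per-class metrics such as sensitivity and specificity are in `$byClass`) and read it from the result.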
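A minimal sketch of reading metrics off the object that `confusionMatrix()` returns, assuming caret is installed and using made-up labels. Note the `positive` argument: with two levels, caret treats the first level (here `"0"`) as the positive class unless told otherwise, which changes sensitivity, specificity and F1 but not accuracy.

```r
library(caret)

# Made-up predictions and true labels, both factors with levels "0" and "1"
pred  <- factor(c("0", "1", "0", "1"), levels = c("0", "1"))
truth <- factor(c("0", "1", "1", "1"), levels = c("0", "1"))

# positive = "1" makes "1" the event class for the per-class metrics
cm <- confusionMatrix(pred, truth, positive = "1")

cm$overall["Accuracy"]                              # overall metrics
cm$byClass[c("Sensitivity", "Specificity", "F1")]   # per-class metrics
names(cm$byClass)                                   # everything else available
```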
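To pick the threshold, you can try every distinct predicted probability as a cutoff and keep the one with the highest accuracy:

```r
library(caret)

# Candidate cutoffs: every distinct predicted probability
thresholds <- sort(unique(yhat.logisticReg))
acc <- numeric(length(thresholds))
for (i in seq_along(thresholds)) {
  # Predict "1" when the probability exceeds the cutoff; fixing the levels keeps
  # both classes even when one of them is never predicted
  predictions <- factor(ifelse(yhat.logisticReg <= thresholds[i], "0", "1"),
                        levels = levels(training.prepped$TARGET_FLAG))
  # Compare against the true labels, not the predicted probabilities
  confusion_matrix <- confusionMatrix(predictions, training.prepped$TARGET_FLAG)
  acc[i] <- confusion_matrix$overall["Accuracy"]
}
best_acc <- max(acc)
best_threshold <- thresholds[which.max(acc)]
```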
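If caret isn't available, the same search can be written in base R; computing only the accuracy avoids building a full caret object for each cutoff. This is a sketch assuming the labels are a factor with levels `"0"` and `"1"`, as in the question; `accuracy_by_threshold()` is a hypothetical name and the data are made up:

```r
# Hypothetical helper (base R only): accuracy at each candidate cutoff,
# predicting "1" when prob > threshold. Assumes truth has levels "0" and "1".
accuracy_by_threshold <- function(prob, truth, thresholds = sort(unique(prob))) {
  truth01 <- as.integer(as.character(truth))
  acc <- vapply(thresholds, function(t) mean(as.integer(prob > t) == truth01),
                numeric(1))
  data.frame(threshold = thresholds, accuracy = acc)
}

# Made-up values for illustration
prob  <- c(0.1, 0.35, 0.4, 0.8, 0.65, 0.2)
truth <- factor(c("0", "0", "1", "1", "1", "0"), levels = c("0", "1"))

res  <- accuracy_by_threshold(prob, truth)
best <- res[which.max(res$accuracy), ]   # first cutoff with the highest accuracy
best
```

With the question's objects the call would be `accuracy_by_threshold(yhat.logisticReg, training.prepped$TARGET_FLAG)`.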