Why isn't my logistic regression model output a factor with 2 levels? (Error: `data` and `reference` should be factors with the same levels.)
From reading similar questions, I know that the problem is that yhat.logisticReg isn't a factor with 2 levels, while training.prepped$TARGET_FLAG is. I assume the issue could be fixed by changing my model or the prediction step so that yhat.logisticReg is a factor with 2 levels. How can I do this?
logisticReg = glm(TARGET_FLAG ~ .,
data = training.prepped,
family = binomial())
yhat.logisticReg = predict(logisticReg, training.prepped, type = "response")
confusionMatrix(yhat.logisticReg, training.prepped$TARGET_FLAG)
Error: `data` and `reference` should be factors with the same levels.
str(training.prepped$TARGET_FLAG)
Factor w/ 2 levels "0","1": 1 1 1 1 1 2 1 2 2 1 ...
str(yhat.logisticReg)
Named num [1:8161] 0.1656 0.2792 0.3717 0.0894 0.272 ...
- attr(*, "names")= chr [1:8161] "1" "2" "3" "4" ...
You may need to choose a threshold first and then convert your real-valued predictions into binary values, e.g.:
a <- c(0.2, 0.7, 0.4)
threshold <- 0.5
binary_a <- factor(as.numeric(a>threshold))
str(binary_a)
Factor w/ 2 levels "0","1": 1 2 1
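Applied to the question's setting, the same idea might look like the sketch below. It uses base R only, and the vectors truth and probs are made-up stand-ins for training.prepped$TARGET_FLAG and yhat.logisticReg; the 0.5 cutoff is likewise an assumption. The levels are set explicitly so that the predicted factor still has both levels even if one class is never predicted.

```r
# Hypothetical stand-ins for training.prepped$TARGET_FLAG and yhat.logisticReg
truth <- factor(c(0, 0, 1, 1, 0, 1), levels = c("0", "1"))
probs <- c(0.17, 0.28, 0.62, 0.45, 0.09, 0.81)

threshold <- 0.5  # assumed cutoff; choose one that suits your problem

# Convert probabilities to classes and force the same levels as the reference
pred <- factor(as.integer(probs > threshold),
               levels = c(0, 1),
               labels = levels(truth))

# Both factors now share the levels "0" and "1"
stopifnot(identical(levels(pred), levels(truth)))

# Base-R confusion table and accuracy
conf_tab <- table(Prediction = pred, Reference = truth)
accuracy <- sum(diag(conf_tab)) / sum(conf_tab)
print(conf_tab)
print(accuracy)
```

With caret loaded, pred and truth built this way can be passed as confusionMatrix(data = pred, reference = truth), which should avoid the error from the question.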
The caret library provides the confusionMatrix function, which implements several metrics. Its overall element gives you the accuracy. If you want another metric, you can check whether it is implemented and use that instead.
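library(caret)
truth = training.prepped$TARGET_FLAG
acc = c()
# Try every predicted probability as a candidate threshold
for(value in yhat.logisticReg)
{
# Convert probabilities to classes, using the same levels as the true labels
predictions <- factor(ifelse(yhat.logisticReg <= value, 0, 1), levels = levels(truth))
# Compare the predicted classes with the true labels, not with the probabilities
confusion_matrix = confusionMatrix(predictions, truth)
acc = c(acc,confusion_matrix$overall["Accuracy"])
}
best_acc = max(acc)
best_threshold = yhat.logisticReg[which.max(acc)]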
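Looping over every predicted probability calls confusionMatrix once per observation, which can be slow on 8161 rows. A lighter variant, sketched here in base R under the assumption that a coarse grid of cutoffs is good enough, computes accuracy directly. The vectors truth and probs are made-up stand-ins for the question's data, and accuracy_at is a hypothetical helper.

```r
# Hypothetical stand-ins for training.prepped$TARGET_FLAG and yhat.logisticReg
truth <- factor(c(0, 0, 1, 1, 0, 1), levels = c("0", "1"))
probs <- c(0.17, 0.28, 0.62, 0.45, 0.09, 0.81)

# Assumed grid of candidate thresholds
thresholds <- seq(0.05, 0.95, by = 0.05)

# Accuracy of the classification obtained at a given threshold
accuracy_at <- function(th) {
  pred <- factor(as.integer(probs > th),
                 levels = c(0, 1),
                 labels = levels(truth))
  mean(pred == truth)
}

acc <- vapply(thresholds, accuracy_at, numeric(1))
best_acc <- max(acc)
best_threshold <- thresholds[which.max(acc)]  # first threshold reaching best_acc
print(c(best_threshold = best_threshold, best_acc = best_acc))
```

Note that picking the threshold on the training data, as both versions do, tends to give an optimistic accuracy; a held-out set is usually a safer basis for the choice.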