
confusionMatrix for logistic regression in R

I want to calculate two confusion matrices for my logistic regression, one using my training data and one using my testing data:

logitMod <- glm(LoanStatus_B ~ ., data=train, family=binomial(link="logit"))

I set the threshold for the predicted probability at 0.5:

confusionMatrix(table(predict(logitMod, type="response") >= 0.5,
                      train$LoanStatus_B == 1))

The code above works well for my training set. However, when I use the test set:

confusionMatrix(table(predict(logitMod, type="response") >= 0.5,
                      test$LoanStatus_B == 1))

it gives me the following error:

Error in table(predict(logitMod, type = "response") >= 0.5, test$LoanStatus_B == : all arguments must have the same length

Why is this? How can I fix it? Thank you!

I think the problem is in the use of predict(): you forgot to provide the new data. Without a newdata argument, predict() returns the fitted values for the data the model was trained on, so the prediction vector has as many elements as train has rows, not test, and table() fails because its two arguments differ in length. Also, you can use the function confusionMatrix() from the caret package to compute and display confusion matrices; you don't need to call table() on your results before that call.
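To make the length mismatch concrete, here is a minimal sketch on simulated data. The data frames train_demo and test_demo, the helper make_data(), and the row counts are made up for illustration and are not from the question.

```r
# Hypothetical simulated data; names and sizes are assumptions for illustration only
set.seed(1)
make_data <- function(n) {
  data.frame(LoanStatus_B = rbinom(n, 1, 0.4), b = rnorm(n), c = rnorm(n))
}
train_demo <- make_data(100)
test_demo  <- make_data(40)

fit <- glm(LoanStatus_B ~ ., data = train_demo, family = binomial(link = "logit"))

# Without newdata: one fitted probability per training row
n_default <- length(predict(fit, type = "response"))

# With newdata: one predicted probability per test row
n_test <- length(predict(fit, newdata = test_demo, type = "response"))

print(c(default = n_default, with_newdata = n_test))
```

Only the second vector lines up with test_demo$LoanStatus_B, which is why the original call works on the training set and fails on the test set.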

Here, I created a toy dataset that includes a representative binary target variable, and then I trained a model similar to yours.

train <- data.frame(LoanStatus_B = as.numeric(rnorm(100)>0.5), b= rnorm(100), c = rnorm(100), d = rnorm(100))
logitMod <- glm(LoanStatus_B ~ ., data=train, family=binomial(link="logit"))

Now you can predict on your data (for example, your training set) and then call confusionMatrix(), which takes two arguments:

  • your predictions
  • the observed classes

library(caret)
# Use your model to make predictions; in this example newdata = training set, but replace it with your test set
pdata <- predict(logitMod, newdata = train, type = "response")

# Use caret to compute a confusion matrix.
# Recent versions of caret expect both arguments to be factors with the same levels.
confusionMatrix(data = factor(as.numeric(pdata > 0.5), levels = c(0, 1)),
                reference = factor(train$LoanStatus_B, levels = c(0, 1)))

Here are the results:

Confusion Matrix and Statistics

          Reference
Prediction  0  1
         0 66 33
         1  0  1

               Accuracy : 0.67            
                 95% CI : (0.5688, 0.7608)
    No Information Rate : 0.66            
    P-Value [Acc > NIR] : 0.4625          
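For the test set, the same steps apply with newdata pointing at the held-out rows. The following is a rough sketch using only base R, so it does not depend on the caret version installed; the simulated data, the 150/50 split, and the names sim, train_demo and test_demo are assumptions for illustration, not the asker's data.

```r
# Hypothetical simulated train/test split; replace with your own data frames
set.seed(42)
n <- 200
x <- rnorm(n)
sim <- data.frame(LoanStatus_B = rbinom(n, 1, plogis(1.5 * x)), b = x, c = rnorm(n))
train_demo <- sim[1:150, ]
test_demo  <- sim[151:200, ]

fit <- glm(LoanStatus_B ~ ., data = train_demo, family = binomial(link = "logit"))

# Predict on the held-out rows by passing them as newdata
p_test <- predict(fit, newdata = test_demo, type = "response")

# Fix the levels so both classes appear even if one is never predicted
pred_class <- factor(as.numeric(p_test >= 0.5), levels = c(0, 1))
obs_class  <- factor(test_demo$LoanStatus_B, levels = c(0, 1))

# Rows are predictions, columns are observed classes
cm <- table(Prediction = pred_class, Reference = obs_class)
print(cm)

# Accuracy is the share of test rows on the diagonal
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```

With caret loaded, the same two factors can be passed directly as confusionMatrix(data = pred_class, reference = obs_class) to get the extended statistics.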
