
R: how to plot ROC for logistic regression model with missing values

I have a logistic regression model and I'd like to plot a ROC curve. All variables have some missing data. Here's the summary:

X<-cbind(outcome, var1, var2)
summary(X)
#    outcome            var1             var2      
# Min.   :0.0000   Min.   : 0.100   Min.   : 65.1  
# 1st Qu.:0.0000   1st Qu.: 0.600   1st Qu.: 91.9  
# Median :0.0000   Median : 1.000   Median :101.0  
# Mean   :0.2643   Mean   : 2.421   Mean   :110.3  
# 3rd Qu.:1.0000   3rd Qu.: 2.200   3rd Qu.:114.5  
# Max.   :1.0000   Max.   :34.800   Max.   :388.4  
# NA's   :165      NA's   :80       NA's   :30    

The model seems to work:

model <- glm(outcome~var1+var2,family=binomial)
summary(model)
# Call:
# glm(formula = outcome ~ var1 + var2, family = binomial)
# 
# Deviance Residuals: 
#      Min        1Q    Median        3Q       Max  
# -1.63470  -0.67079  -0.56255   0.01727   2.07577  
# 
# Coefficients:
#              Estimate Std. Error z value Pr(>|z|)    
# (Intercept) -3.652208   0.973013  -3.754 0.000174 ***
# var1         0.386811   0.147054   2.630 0.008528 ** 
# var2         0.016165   0.008075   2.002 0.045316 *  
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# (Dispersion parameter for binomial family taken to be 1)
# 
#     Null deviance: 135.91  on 117  degrees of freedom
# Residual deviance: 108.84  on 115  degrees of freedom
#   (187 observations deleted due to missingness)
# AIC: 114.84
# 
# Number of Fisher Scoring iterations: 6

But when I try to calculate the ROC curve, there is an error:

library(pROC)
roc(model)
# Error in roc.default(model) : No valid data provided.

I thought it could be due to the missing data, so I tried adding the na.action = na.exclude option, but the problem still persists:

model2 <- glm(outcome~var1+var2,family=binomial, na.action = na.exclude)
roc(model2)
# Error in roc.default(model2) : No valid data provided.

I also tried lrm instead of glm, but it still doesn't work:

model.lrm<-lrm(outcome~var1+var2, options(na.action="na.delete"), x=TRUE, y=TRUE)
model.lrm
# Frequencies of Missing Values Due to Each Variable
# outcome    var1    var2 
#     165      80      30 
# 
# Logistic Regression Model
#  
#  lrm(formula = outcome ~ var1 + var2, data = options(na.action = "na.delete"), 
#      x = TRUE, y = TRUE)
#  
#  
#                         Model Likelihood    Discrimination    Rank Discrim.    
#                               Ratio Test           Indexes          Indexes    
#  Obs           118    LR chi2      27.07    R2       0.300    C       0.782    
#   0             87    d.f.             2    g        1.377    Dxy     0.565    
#   1             31    Pr(> chi2) <0.0001    gr       3.964    gamma   0.565    
#  max |deriv| 7e-05                          gp       0.189    tau-a   0.221    
#                                             Brier    0.150                     
 
#            Coef    S.E.   Wald Z Pr(>|Z|)
#  Intercept -3.6522 0.9730 -3.75  0.0002  
#  var1       0.3868 0.1471  2.63  0.0085  
#  var2       0.0162 0.0081  2.00  0.0453  
#  
roc(model.lrm)
# Error in roc.default(model.lrm) : No valid data provided.

Here are the first 20 observations:

> dput(head (dati[, c(2,3,4)], 20))
structure(list(outcome = c(NA, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 
0, 0, 1, 0, 1, 1, 0, NA, 0), var1 = c(NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, 0.3, 0.5, 1.5, 4.5, 2, 2.2, 0.7, NA, NA, 0.3), 
    var2 = c(117, 84, NA, 90, 91, 113, 88, NA, 108, 178, 100, 
    86, 86, 95, 92, 111, 103, 81, NA, 95)), row.names = c(NA, 
-20L), class = c("tbl_df", "tbl", "data.frame"))

What is the problem?

A ROC curve isn't built on a model, but on predictions derived from the model. Therefore you need to use the predict function to obtain predictions on the data. It looks like this:

predictions <- predict(model)

And then you can call the roc function with those:

roc(outcome, predictions)

The missing values will be ignored automatically.
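As a minimal sketch of these two steps, here is a self-contained version on simulated data. The data frame `dat` and its values are made up for illustration and are not the asker's data. One caveat worth checking: with R's default `na.action` (`na.omit`), `predict()` returns values only for the rows actually used in the fit, so the prediction vector can be shorter than `outcome`. Fitting with `na.action = na.exclude`, as assumed here, pads the predictions with `NA` so both vectors line up row by row.

```r
# Simulated data standing in for the question's variables (hypothetical values).
set.seed(1)
n <- 300
dat <- data.frame(var1 = rexp(n), var2 = rnorm(n, mean = 110, sd = 30))
dat$outcome <- rbinom(n, 1, plogis(-3 + 0.8 * dat$var1 + 0.01 * dat$var2))

# Knock out some values in every column, as in the question.
dat$outcome[sample(n, 60)] <- NA
dat$var1[sample(n, 40)] <- NA
dat$var2[sample(n, 20)] <- NA

# na.exclude drops incomplete rows for fitting but pads predictions with NA.
fit <- glm(outcome ~ var1 + var2, family = binomial, data = dat,
           na.action = na.exclude)
pred <- predict(fit, type = "response")

# One prediction per original row, NA wherever any variable was missing.
stopifnot(length(pred) == nrow(dat))
stopifnot(all(is.na(pred) == !complete.cases(dat)))

# pROC is only needed for the curve itself; skip quietly if it is not installed.
if (requireNamespace("pROC", quietly = TRUE)) {
  r <- pROC::roc(dat$outcome, pred, quiet = TRUE)
  print(pROC::auc(r))
}
```

Passing `plot = TRUE` to `roc()` should draw the curve directly, as in the asker's own solution.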

The same approach carries over almost unchanged if you are using a test set:

test_predictions <- predict(model, newdata = test_data)
roc(test_data$outcome, test_predictions)
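A hedged sketch of the test-set variant, again on simulated data: the 70/30 split and the names `train_data`, `in_train` and `test_pred` are illustrative assumptions, not part of the original answer. When `newdata` is supplied, `predict()` returns one value per row of `newdata`, with `NA` where a predictor is missing, so the result already lines up with `test_data$outcome`.

```r
# Hypothetical simulated data; the split and names are illustrative only.
set.seed(2)
n <- 400
dat <- data.frame(var1 = rexp(n), var2 = rnorm(n, mean = 110, sd = 30))
dat$outcome <- rbinom(n, 1, plogis(-3 + 0.8 * dat$var1 + 0.01 * dat$var2))
dat$var1[sample(n, 50)] <- NA

# Random 70/30 split into training and test rows.
in_train <- sample(n, size = 0.7 * n)
train_data <- dat[in_train, ]
test_data <- dat[-in_train, ]

# Fit on the training rows only (incomplete rows are dropped by default).
fit <- glm(outcome ~ var1 + var2, family = binomial, data = train_data)

# With newdata, predict() keeps one value per test row (NA where var1 is missing).
test_pred <- predict(fit, newdata = test_data, type = "response")
stopifnot(length(test_pred) == nrow(test_data))

# Compute the test-set AUC if pROC is available.
if (requireNamespace("pROC", quietly = TRUE)) {
  test_roc <- pROC::roc(test_data$outcome, test_pred, quiet = TRUE)
  print(pROC::auc(test_roc))
}
```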

I found a solution by modifying the code as follows:

roc(outcome, as.vector(fitted.values(model)),plot=TRUE)
