
R: how to plot ROC for logistic regression model with missing values

I have a logistic regression model and I want to plot the ROC curve. All variables have some missing data. Here is the summary:

X<-cbind(outcome, var1, var2)
summary(X)
#    outcome            var1             var2      
# Min.   :0.0000   Min.   : 0.100   Min.   : 65.1  
# 1st Qu.:0.0000   1st Qu.: 0.600   1st Qu.: 91.9  
# Median :0.0000   Median : 1.000   Median :101.0  
# Mean   :0.2643   Mean   : 2.421   Mean   :110.3  
# 3rd Qu.:1.0000   3rd Qu.: 2.200   3rd Qu.:114.5  
# Max.   :1.0000   Max.   :34.800   Max.   :388.4  
# NA's   :165      NA's   :80       NA's   :30    

The model seems to work:

model <- glm(outcome~var1+var2,family=binomial)
summary(model)
# Call:
# glm(formula = outcome ~ var1 + var2, family = binomial)
# 
# Deviance Residuals: 
#      Min        1Q    Median        3Q       Max  
# -1.63470  -0.67079  -0.56255   0.01727   2.07577  
# 
# Coefficients:
#              Estimate Std. Error z value Pr(>|z|)    
# (Intercept) -3.652208   0.973013  -3.754 0.000174 ***
# var1         0.386811   0.147054   2.630 0.008528 ** 
# var2         0.016165   0.008075   2.002 0.045316 *  
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# (Dispersion parameter for binomial family taken to be 1)
# 
#     Null deviance: 135.91  on 117  degrees of freedom
# Residual deviance: 108.84  on 115  degrees of freedom
#   (187 observations deleted due to missingness)
# AIC: 114.84
# 
# Number of Fisher Scoring iterations: 6

But when I try to compute the ROC curve, I get an error:

library(pROC)
roc(model)
# Error in roc.default(model) : No valid data provided.

I thought this might be due to the missing data, so I tried adding the na.action = na.exclude option, but the problem persists:

model2 <- glm(outcome~var1+var2,family=binomial, na.action = na.exclude)
roc(model2)
# Error in roc.default(model2) : No valid data provided.

I also tried using lrm instead of glm, but it still doesn't work:

model.lrm<-lrm(outcome~var1+var2, options(na.action="na.delete"), x=TRUE, y=TRUE)
model.lrm
# Frequencies of Missing Values Due to Each Variable
# outcome    var1    var2 
#     165      80      30 
# 
# Logistic Regression Model
#  
#  lrm(formula = outcome ~ var1 + var2, data = options(na.action = "na.delete"), 
#      x = TRUE, y = TRUE)
#  
#  
#                         Model Likelihood    Discrimination    Rank Discrim.    
#                               Ratio Test           Indexes          Indexes    
#  Obs           118    LR chi2      27.07    R2       0.300    C       0.782    
#   0             87    d.f.             2    g        1.377    Dxy     0.565    
#   1             31    Pr(> chi2) <0.0001    gr       3.964    gamma   0.565    
#  max |deriv| 7e-05                          gp       0.189    tau-a   0.221    
#                                             Brier    0.150                     
 
#            Coef    S.E.   Wald Z Pr(>|Z|)
#  Intercept -3.6522 0.9730 -3.75  0.0002  
#  var1       0.3868 0.1471  2.63  0.0085  
#  var2       0.0162 0.0081  2.00  0.0453  
#  
roc(model.lrm)
# Error in roc.default(model.lrm) : No valid data provided.

Here are the first 20 observations:

> dput(head (dati[, c(2,3,4)], 20))
structure(list(outcome = c(NA, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 
0, 0, 1, 0, 1, 1, 0, NA, 0), var1 = c(NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, 0.3, 0.5, 1.5, 4.5, 2, 2.2, 0.7, NA, NA, 0.3), 
    var2 = c(117, 84, NA, 90, 91, 113, 88, NA, 108, 178, 100, 
    86, 86, 95, 92, 111, 103, 81, NA, 95)), row.names = c(NA, 
-20L), class = c("tbl_df", "tbl", "data.frame"))

What is the problem?

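A ROC curve is not built from a model, but from the predictions derived from that model. You therefore need to use the predict function to obtain predictions for your data. It looks like this: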

predictions <- predict(model)

Then you can call the roc function with:

roc(outcome, predictions)

Missing values will be ignored automatically.
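One thing worth checking here: if the model was fitted with the default na.action (na.omit), predict(model) returns values only for the complete cases, so it may not have the same length as outcome. Below is a minimal sketch of one way to keep the two aligned. It assumes simulated data (the data frame d and the coefficients used to generate it are made up for illustration) and fits with na.action = na.exclude, so that predictions are padded with NA for the dropped rows:

```r
library(pROC)

# Simulated data (hypothetical), with NAs scattered across all three columns
set.seed(1)
n <- 300
d <- data.frame(var1 = rexp(n), var2 = rnorm(n, mean = 110, sd = 30))
d$outcome <- rbinom(n, 1, plogis(-3.5 + 0.4 * d$var1 + 0.016 * d$var2))
d$outcome[sample(n, 50)] <- NA
d$var1[sample(n, 40)] <- NA
d$var2[sample(n, 20)] <- NA

# na.exclude drops incomplete rows for fitting but pads predictions with NA,
# so predict() returns one value per original row
fit <- glm(outcome ~ var1 + var2, family = binomial, data = d,
           na.action = na.exclude)
pred <- predict(fit, type = "response")
length(pred) == nrow(d)

# roc() drops the pairs where either the response or the predictor is NA
r <- roc(d$outcome, pred, levels = c(0, 1), direction = "<")
auc(r)
plot(r)
```

Setting levels and direction explicitly is optional; it only avoids the informational messages that roc prints when it has to guess them.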

If you are using a test set, it is just as simple and very similar:

test_predictions <- predict(model, newdata = test_data)
roc(test_data$outcome, test_predictions)
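For the test-set case, here is a rough sketch on simulated data (the data frame d and the 300/100 split are assumptions for illustration only). When newdata is supplied, predict keeps every row and returns NA where a predictor is missing, so the predictions stay aligned with test_data$outcome:

```r
library(pROC)

# Simulated data (hypothetical) with missing values in var1
set.seed(3)
n <- 400
d <- data.frame(var1 = rexp(n), var2 = rnorm(n, mean = 110, sd = 30))
d$outcome <- rbinom(n, 1, plogis(-3.5 + 0.4 * d$var1 + 0.016 * d$var2))
d$var1[sample(n, 80)] <- NA

# Hypothetical split: first 300 rows for training, the rest for testing
train_data <- d[1:300, ]
test_data  <- d[301:400, ]

fit <- glm(outcome ~ var1 + var2, family = binomial, data = train_data)

# One prediction per test row; NA where var1 is missing
test_predictions <- predict(fit, newdata = test_data, type = "response")
length(test_predictions) == nrow(test_data)

roc(test_data$outcome, test_predictions, levels = c(0, 1), direction = "<")
```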

I found a solution by modifying the code as follows:

roc(outcome, as.vector(fitted.values(model)),plot=TRUE)
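A closely related variant, sketched here on simulated data (the data frame d is hypothetical), takes both the response and the fitted values from the glm object itself. Its y and fitted.values components cover exactly the rows used in the fit, so their lengths match whichever na.action was in effect:

```r
library(pROC)

# Simulated data (hypothetical) with missing values in var1
set.seed(2)
n <- 200
d <- data.frame(var1 = rexp(n), var2 = rnorm(n, mean = 110, sd = 30))
d$outcome <- rbinom(n, 1, plogis(-3.5 + 0.4 * d$var1 + 0.016 * d$var2))
d$var1[sample(n, 60)] <- NA

# Default na.action (na.omit): incomplete rows are dropped before fitting
fit <- glm(outcome ~ var1 + var2, family = binomial, data = d)

# Both components hold only the rows that entered the fit
stopifnot(length(fit$y) == length(fit$fitted.values))

r <- roc(fit$y, fit$fitted.values, levels = c(0, 1), direction = "<",
         plot = TRUE)
```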

