How to plot the results of a logistic regression model using base R and ggplot

**Create a new data frame and add a binary column called surv24**

leukemia.data <- data.frame(wbc = leuk$wbc, ag = leuk$ag, time = leuk$time, surv24 =ifelse(leuk$time>=24, 1,0)) 


     wbc       ag  time  surv24
1   2300  present    65       1
2    750  present   156       1
3   4300  present   100       1
4   2600  present   134       1
5   6000  present    16       0
6  10500  present   108       1
7   4000   absent    17       0




**Logistic regression model**

logistic.model <- glm(surv24 ~ log.wbc + ag, family='binomial', data=leukemia.data)
summary(logistic.model)

fit <- predict(logistic.model, type='response')

leuki <- data.frame (cbind(leukemia.data, fit))

ag_present <- subset(leuki[leuki$ag=='present',])
ag_absent <- subset(leuki[leuki$ag=='absent',])


plot(surv24 ~ wbc, data = leukemia.data, main = "Survival Probability vs Number of White Blood Cells",  xlab = "Number of White Blood Cells", ylab = "Survival Probability")
lines(ag_present$wbc, ag_present$fit, col='red')
lines(ag_absent$wbc, ag_absent$fit, col='green')
legend(0.8,85000, legend =c("Simple Linear Regression Model Predictions","Quadratic Regression Model Predictions"), col = c("green","red"), lty = 1:2, cex=0.7)

This is the plot I get as output:
[image]

This is an example of the graph I want to build:
[image]

Note that I want to build this plot using both base R and ggplot.

Without more sample data than this, it is not possible to give a working example as an answer. I will therefore try to reverse-engineer your data set:

set.seed(123)

wbc     <- rnorm(100, 9)
logodds <- c(8 - wbc[1:50], 10 - wbc[51:100])
probs   <- exp(logodds)/(1 + exp(logodds))
surv24  <- rbinom(100, 1, probs)
ag      <- rep(c("absent", "present"), each = 50)
wbc     <- 50 * round(wbc * 20)
leukemia.data <- data.frame(wbc, ag, time = sample(200), surv24)[sample(100),]

head(leukemia.data)
#>     wbc      ag time surv24
#> 24 8250  absent  133      1
#> 39 8700  absent   63      0
#> 47 8600  absent   76      0
#> 60 9200 present   23      0
#> 63 8650 present   39      1
#> 37 9550  absent  104      0

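That looks fairly close. Now we will create your model. It is not clear why you want to log-transform the white cell count: that would distort your final result and is not necessary, so we will run the model using the untransformed white cell count: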

logistic.model <- glm(surv24 ~ wbc + ag, family = 'binomial', data = leukemia.data)

summary(logistic.model)
#> 
#> Call:
#> glm(formula = surv24 ~ wbc + ag, family = "binomial", data = leukemia.data)
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -2.0564  -0.8820   0.4460   0.8258   1.9888  
#> 
#> Coefficients:
#>               Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  5.7107385  2.4790234   2.304  0.02124 *  
#> wbc         -0.0007356  0.0002790  -2.636  0.00839 ** 
#> agpresent    2.1593304  0.4922130   4.387 1.15e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 138.47  on 99  degrees of freedom
#> Residual deviance: 110.56  on 97  degrees of freedom
#> AIC: 116.56
#> 
#> Number of Fisher Scoring iterations: 4
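
If a log scale is wanted after all, the transformation can be written directly in the formula. The question's model refers to `log.wbc`, which is not a column of `leukemia.data`, so that call only works if a variable of that name happens to exist in the workspace. The following is a minimal sketch on simulated stand-in data (the object names `sim`, `fit_raw`, `fit_log` and `cmp` are made up for illustration) that fits both versions and compares them by AIC:

```r
# Simulated stand-in for the question's data, following the recipe above
set.seed(123)
wbc     <- rnorm(100, 9)
logodds <- c(8 - wbc[1:50], 10 - wbc[51:100])
surv24  <- rbinom(100, 1, plogis(logodds))   # plogis(x) is exp(x) / (1 + exp(x))
ag      <- rep(c("absent", "present"), each = 50)
wbc     <- 50 * round(wbc * 20)
sim     <- data.frame(wbc, ag, surv24)

# Same model on the raw and on the log scale; log() is applied inside the formula
fit_raw <- glm(surv24 ~ wbc + ag,      family = binomial, data = sim)
fit_log <- glm(surv24 ~ log(wbc) + ag, family = binomial, data = sim)

# Lower AIC indicates the better trade-off between fit and complexity
cmp <- AIC(fit_raw, fit_log)
cmp
```

Because `log(wbc)` is part of the formula, `predict()` applies the same transformation to any `newdata` automatically, so the prediction grid can stay on the original `wbc` scale.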

To produce smooth logistic curves, we need to create a dummy data set containing samples across the range of x values we want to plot:

dummy_df <- data.frame(ag = rep(c("present", "absent"), each = 151),
                       wbc = rep(seq(0, 15000, 100), 2))

dummy_df$surv24 <- predict(logistic.model, newdata = dummy_df, type = "response")
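
To see what `type = "response"` does, and as one possible way to add uncertainty bands, the prediction can also be made on the link (log-odds) scale and transformed back with `plogis()`. This is only a sketch on hypothetical simulated data; the names `d`, `m` and `grid` and the coefficients used to simulate are assumptions, and the 1.96 multiplier gives approximate 95% Wald intervals:

```r
# Hypothetical stand-in data with the same columns as leukemia.data
set.seed(1)
d <- data.frame(wbc = runif(200, 1000, 15000),
                ag  = rep(c("absent", "present"), 100))
d$surv24 <- rbinom(200, 1, plogis(2 - 0.0003 * d$wbc + 1.5 * (d$ag == "present")))
m <- glm(surv24 ~ wbc + ag, family = binomial, data = d)

# Every combination of wbc value and ag level
grid <- expand.grid(wbc = seq(0, 15000, 100), ag = c("absent", "present"))

# Predict log-odds with standard errors, then map back to probabilities
link <- predict(m, newdata = grid, type = "link", se.fit = TRUE)
grid$fit   <- as.numeric(plogis(link$fit))
grid$lower <- as.numeric(plogis(link$fit - 1.96 * link$se.fit))
grid$upper <- as.numeric(plogis(link$fit + 1.96 * link$se.fit))

# The back-transformed fit matches what type = "response" returns
resp <- as.numeric(predict(m, newdata = grid, type = "response"))
isTRUE(all.equal(grid$fit, resp))
```

Building the interval on the link scale and transforming afterwards keeps the bounds inside [0, 1], which is not guaranteed when the standard error is added on the probability scale.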

Now we can draw the plot. This is actually much simpler with ggplot:

library(ggplot2)

ggplot(leukemia.data, aes(wbc, surv24, color = ag)) + 
  geom_point() +
  geom_line(data = dummy_df) +
  lims(x = c(0, 15000))
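
As an alternative sketch, `geom_smooth()` can fit the logistic curves itself, which avoids the dummy data frame. Note that this is not the same model: with `color = ag` in the aesthetics, ggplot fits a separate `surv24 ~ wbc` logistic regression within each `ag` group rather than the additive `wbc + ag` model above. The data here are hypothetical simulated values (the names `d` and `p` are made up):

```r
library(ggplot2)

# Hypothetical stand-in data with the same columns as leukemia.data
set.seed(1)
d <- data.frame(wbc = round(runif(100, 1000, 15000)),
                ag  = rep(c("absent", "present"), each = 50))
d$surv24 <- rbinom(100, 1, plogis(2 - 0.0003 * d$wbc + 1.5 * (d$ag == "present")))

p <- ggplot(d, aes(wbc, surv24, color = ag)) +
  geom_point() +
  # One logistic fit per colour group; fullrange extends the curves to the axis limits
  geom_smooth(method = "glm", method.args = list(family = "binomial"),
              formula = y ~ x, se = FALSE, fullrange = TRUE) +
  lims(x = c(0, 15000))
p
```

If the curves must come from the additive model, predicting onto a grid and using `geom_line()` as shown above remains the more direct route.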

However, to recreate the target plot in base R graphics, you can do the following:

# Rows 1:151 of dummy_df are ag == "present"; rows 152:302 are ag == "absent"
# The fitted value is P(surv24 = 1), i.e. the probability of surviving at least 24 weeks
plot(dummy_df$wbc[1:151]/1000, dummy_df$surv24[1:151], 
     type = "l", col = "green", ylim = c(0, 1),
     ylab = "Probability of surviving at least 24 weeks",
     xlab = "WBC count (thousands)", bty = "L")
lines(dummy_df$wbc[152:302]/1000, dummy_df$surv24[152:302], col = "black")
# Observed outcomes, coloured to match the fitted line of each group
with(leukemia.data, points(wbc[ag == "present"]/1000, 
                           surv24[ag == "present"], col = "green"))
with(leukemia.data, points(wbc[ag == "absent"]/1000, 
                           surv24[ag == "absent"], col = "black"))
legend("topright", legend = c("absent", "present"), title = "AG test",
       lty = c(1, 1), col = c("black", "green"), pch = c(1, 1), bty = "o")
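
The approach in the question, drawing lines through the fitted values at the observed `wbc` values, can also work, but `lines()` joins points in the order they are given, so unsorted data will likely produce a zigzag. A minimal sketch of that fix, on hypothetical simulated data (the names `d`, `d_sorted` and `cols` are made up for illustration):

```r
# Hypothetical stand-in data with the same columns as the question's leuki
set.seed(42)
d <- data.frame(wbc = round(runif(60, 1000, 15000)),
                ag  = rep(c("absent", "present"), each = 30))
d$surv24 <- rbinom(60, 1, plogis(2 - 0.0003 * d$wbc + 1.5 * (d$ag == "present")))
m <- glm(surv24 ~ wbc + ag, family = binomial, data = d)
d$fit <- fitted(m)              # fitted probabilities at the observed wbc values

d_sorted <- d[order(d$wbc), ]   # sort once so each line runs left to right
cols <- c(absent = "black", present = "green")

plot(surv24 ~ wbc, data = d, col = cols[d$ag],
     xlab = "Number of white blood cells", ylab = "Survival probability")
for (g in names(cols)) {
  grp <- d_sorted[d_sorted$ag == g, ]
  lines(grp$wbc, grp$fit, col = cols[g])
}
legend("topright", legend = names(cols), col = cols, lty = 1, title = "AG test")
```

With few observations the resulting curve is made of straight segments between data points, which is why predicting onto a dense grid usually looks smoother.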

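Now the main caveat: since you already have the raw survival times, you should probably run this as a survival analysis rather than a logistic regression, because converting to a binary outcome throws away a lot of statistical power.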
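
As a rough sketch of what that could look like, the following fits a Cox proportional hazards model with the `survival` package. It assumes that the `leuk` object in the question is the `leuk` data set shipped with the MASS package, and that every recorded time is an observed death (no censoring), which is what `Surv(time)` without a status argument implies. The name `cox_fit` is made up for illustration:

```r
library(survival)

# Assumption: the question's leuk is MASS::leuk (columns wbc, ag, time)
data(leuk, package = "MASS")

# Surv(time) with no status argument treats every time as an observed event
cox_fit <- coxph(Surv(time) ~ ag + log(wbc), data = leuk)
summary(cox_fit)
```

This uses the full survival time for each patient instead of only whether it reached 24 weeks.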
