How to plot the results of a logistic regression model using base R and ggplot
**Create a new data frame and add a binary column called surv24**
leukemia.data <- data.frame(wbc = leuk$wbc, ag = leuk$ag, time = leuk$time, surv24 = ifelse(leuk$time >= 24, 1, 0))
    wbc      ag time surv24
1  2300 present   65      1
2   750 present  156      1
3  4300 present  100      1
4  2600 present  134      1
5  6000 present   16      0
6 10500 present  108      1
7  4000  absent   17      0
**Logistic regression model**
logistic.model <- glm(surv24 ~ log(wbc) + ag, family = 'binomial', data = leukemia.data)
summary(logistic.model)
fit <- predict(logistic.model, type='response')
leuki <- data.frame (cbind(leukemia.data, fit))
ag_present <- subset(leuki[leuki$ag=='present',])
ag_absent <- subset(leuki[leuki$ag=='absent',])
plot(surv24 ~ wbc, data = leukemia.data, main = "Survival Probability vs Number of White Blood Cells", xlab = "Number of White Blood Cells", ylab = "Survival Probability")
lines(ag_present$wbc, ag_present$fit, col='red')
lines(ag_absent$wbc, ag_absent$fit, col='green')
legend(0.8,85000, legend =c("Simple Linear Regression Model Predictions","Quadratic Regression Model Predictions"), col = c("green","red"), lty = 1:2, cex=0.7)
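Please note that I want to build this plot using both base R and ggplot.
Without more sample data than this, it isn't possible to give a working example as an answer, so I will try to reverse-engineer your data set: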
set.seed(123)
wbc <- rnorm(100, 9)
logodds <- c(8 - wbc[1:50], 10 - wbc[51:100])
probs <- exp(logodds)/(1 + exp(logodds))
surv24 <- rbinom(100, 1, probs)
ag <- rep(c("absent", "present"), each = 50)
wbc <- 50 * round(wbc * 20)
leukemia.data <- data.frame(wbc, ag, time = sample(200), surv24)[sample(100),]
head(leukemia.data)
#>     wbc      ag time surv24
#> 24 8250  absent  133      1
#> 39 8700  absent   63      0
#> 47 8600  absent   76      0
#> 60 9200 present   23      0
#> 63 8650 present   39      1
#> 37 9550  absent  104      0
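This looks reasonably close. Now we will create your model. It isn't clear why you want to take the log of the white cell count: this will distort your final result and isn't necessary, so we will run the model using the raw white cell count: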
logistic.model <- glm(surv24 ~ wbc + ag, family = 'binomial', data = leukemia.data)
summary(logistic.model)
#>
#> Call:
#> glm(formula = surv24 ~ wbc + ag, family = "binomial", data = leukemia.data)
#>
#> Deviance Residuals:
#>     Min       1Q   Median       3Q      Max
#> -2.0564  -0.8820   0.4460   0.8258   1.9888
#>
#> Coefficients:
#>               Estimate Std. Error z value Pr(>|z|)
#> (Intercept)  5.7107385  2.4790234   2.304  0.02124 *
#> wbc         -0.0007356  0.0002790  -2.636  0.00839 **
#> agpresent    2.1593304  0.4922130   4.387 1.15e-05 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 138.47 on 99 degrees of freedom
#> Residual deviance: 110.56 on 97 degrees of freedom
#> AIC: 116.56
#>
#> Number of Fisher Scoring iterations: 4
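As a rough aid to reading the summary above, the coefficients can be turned into odds ratios and a predicted probability by hand. This is only a sketch: the estimates are copied from the printed output (which comes from the simulated data, not the real data set), and the 1000-unit step and the `wbc = 8000` example patient are arbitrary choices made for illustration.

```r
# Coefficient estimates copied from the summary output above
coefs <- c(intercept = 5.7107385, wbc = -0.0007356, agpresent = 2.1593304)

# Odds ratio for AG present versus absent, holding wbc fixed
or_ag <- exp(coefs[["agpresent"]])

# Odds ratio for a 1000-unit increase in wbc (the step size is an arbitrary choice)
or_wbc_1000 <- exp(1000 * coefs[["wbc"]])

# Predicted probability of surv24 == 1 for a hypothetical AG-present
# patient with wbc = 8000; plogis() is the inverse logit
p_8000 <- plogis(coefs[["intercept"]] + coefs[["wbc"]] * 8000 + coefs[["agpresent"]])

round(c(or_ag = or_ag, or_wbc_1000 = or_wbc_1000, p_8000 = p_8000), 3)
```

An odds ratio below 1 for `wbc` means the odds of surviving to 24 weeks fall as the white cell count rises, which matches the negative sign in the table.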
To get smooth logistic curves, we need to create a dummy data set containing samples across the range of x values we want to plot:
dummy_df <- data.frame(ag = rep(c("present", "absent"), each = 151),
wbc = rep(seq(0, 15000, 100), 2))
dummy_df$surv24 <- predict(logistic.model, newdata = dummy_df, type = "response")
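If uncertainty bands are also wanted, one possible extension of the dummy data set is to predict on the link (log-odds) scale with standard errors and back-transform. The sketch below rebuilds the simulated data so that it runs on its own; `grid_df`, `lower`, and `upper` are names introduced here for illustration, and the 1.96 multiplier assumes an approximate 95% Wald interval.

```r
# Rebuild the simulated data and model so this block runs on its own
# (the time column and row shuffling are left out; they do not affect the fit)
set.seed(123)
wbc <- rnorm(100, 9)
logodds <- c(8 - wbc[1:50], 10 - wbc[51:100])
surv24 <- rbinom(100, 1, plogis(logodds))  # plogis(x) equals exp(x)/(1 + exp(x))
ag <- rep(c("absent", "present"), each = 50)
wbc <- 50 * round(wbc * 20)
leukemia.data <- data.frame(wbc, ag, surv24)
logistic.model <- glm(surv24 ~ wbc + ag, family = "binomial", data = leukemia.data)

# Prediction grid: every combination of wbc value and ag level
grid_df <- expand.grid(wbc = seq(0, 15000, 100), ag = c("present", "absent"),
                       stringsAsFactors = FALSE)

# Predict on the link (log-odds) scale, with standard errors
pred <- predict(logistic.model, newdata = grid_df, type = "link", se.fit = TRUE)

# Back-transform the fit and an approximate 95% Wald interval to probabilities
grid_df$fit   <- plogis(pred$fit)
grid_df$lower <- plogis(pred$fit - 1.96 * pred$se.fit)
grid_df$upper <- plogis(pred$fit + 1.96 * pred$se.fit)

head(grid_df)
```

Building the interval on the link scale and then applying `plogis()` keeps the bounds inside the 0 to 1 range, which adding and subtracting standard errors directly on the probability scale would not guarantee.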
Now we can draw the plot. It is actually much simpler with ggplot:
library(ggplot2)
ggplot(leukemia.data, aes(wbc, surv24, color = ag)) +
geom_point() +
geom_line(data = dummy_df) +
lims(x = c(0, 15000))
However, to recreate your target plot in base R graphics, you could do something like this:
# Fitted curve for AG-present patients (rows 1:151 of dummy_df)
plot(dummy_df$wbc[1:151]/1000, dummy_df$surv24[1:151],
     type = "l", col = "green", ylim = c(0, 1),
     ylab = "Probability of surviving at least 24 weeks",
     xlab = "WBC count (thousands)", bty = "L")
# Fitted curve for AG-absent patients (rows 152:302 of dummy_df)
lines(dummy_df$wbc[152:302]/1000, dummy_df$surv24[152:302], col = "black")
# Observed outcomes, colored to match each group's curve
with(leukemia.data, points(wbc[ag == "present"]/1000,
                           surv24[ag == "present"], col = "green"))
with(leukemia.data, points(wbc[ag == "absent"]/1000,
                           surv24[ag == "absent"], col = "black"))
legend("topright", legend = c("absent", "present"), title = "AG test",
       lty = c(1, 1), col = c("black", "green"), pch = c(1, 1), bty = "o")
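Now the main caveat: since you already have the raw survival times, you should probably run this as a survival analysis rather than a logistic regression, because converting to a binary outcome throws away a lot of statistical power.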
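A minimal sketch of what that could look like is below. It rests on two assumptions that the question does not confirm: that `leuk` is the `leuk` data set from the MASS package (columns `wbc`, `ag`, `time`), and that every recorded time is an observed death with no censoring. The Cox model is one common choice for survival data, not the only option.

```r
library(survival)  # recommended package included with standard R installs

# ASSUMPTION: `leuk` in the question is MASS::leuk (columns wbc, ag, time)
data(leuk, package = "MASS")

# Cox proportional hazards model on the raw survival times.
# Surv(time) with no status argument treats every time as an observed
# event, i.e. no censoring (an assumption, see above).
cox.model <- coxph(Surv(time) ~ ag + wbc, data = leuk)

summary(cox.model)
```

The coefficients here are log hazard ratios, so their signs run the opposite way from the logistic model: a positive coefficient means a higher hazard of death, not a higher chance of surviving.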