How can I apply grouped data to grouped models using broom and dplyr?
Annotate grouped linear models with dplyr, broom, and ggplot
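I'm starting to dig into broom to visualize simple statistical analyses with dplyr and ggplot. By piping grouped data through broom::augment, I worked out how to fit a linear model per group.
I have three questions:
do has now been superseded by across(), but I'm having trouble figuring out how to rewrite do(fit_carb = augment(lm(drat ~ mpg, data = .))) using across(). How should I do that?
#// library and data prep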
library(tidyverse)
library(broom)
data <- mtcars
data$carb <- as.factor(data$carb)
#// generate scatter plot
plot <-
  ggplot() +
  geom_point(data = data, aes(x = mpg, y = drat, color = carb))
#// use lm function to generate linear regression model
fit <- lm(formula = drat ~ mpg, data = data)
#// tie results back into dataframe
lm_data <- augment(fit)
#// add fitted points and line
plot +
  ggtitle("scatter plot with fitted points and line") +
  #// add geom_point and geom_line with lm_data
  geom_point(data = lm_data, aes(x = mpg, y = .fitted), color = "red") +
  geom_line(data = lm_data, aes(x = mpg, y = .fitted), color = "red")
#// linear model by group
lm_data <- data %>%
  #// group by factor
  group_by(carb) %>%
  #// `.` notation means that object gets piped into that place
  do(fit_carb = augment(lm(drat ~ mpg, data = .))) %>%
  #// unnest table by the augment results
  unnest(fit_carb)
#// add fitted points and line grouped by carb
plot +
  ggtitle("scatter plot with fitted points and line") +
  #// add geom_point and geom_line with lm_data
  geom_point(data = lm_data, aes(x = mpg, y = .fitted, group = carb), color = "red") +
  geom_line(data = lm_data, aes(x = mpg, y = .fitted, group = carb, color = carb))
You can skip the do dplyr verb and go for mutate or summarise instead. Judging by your plot, wouldn't you prefer broom::glance?
data %>%
  group_by(carb) %>%
  mutate(glance(lm(mpg ~ drat))) %>%
  dplyr::select(mpg:carb, adj.r.squared, p.value)
# A tibble: 32 x 13
# Groups:   carb [6]
#      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear carb  adj.r.squared p.value
#    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>         <dbl>   <dbl>
#  1  21       6   160   110  3.9   2.62  16.5     0     1     4 4             0.539 0.00943
#  2  21       6   160   110  3.9   2.88  17.0     0     1     4 4             0.539 0.00943
#  3  22.8     4   108    93  3.85  2.32  18.6     1     1     4 1             0.643 0.0185
#  4  21.4     6   258   110  3.08  3.22  19.4     1     0     3 1             0.643 0.0185
# ...
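As for rewriting the do() call from the question itself, one option that avoids do() is dplyr::group_modify(), which applies a function to each group's data frame and binds the results back together. This is only a sketch, assuming dplyr 0.8.1 or later; the name lm_by_group is mine and does not appear in the question.

```r
library(dplyr)
library(broom)

data <- mtcars
data$carb <- as.factor(data$carb)

# .x is the data frame for the current group (without the carb column).
# augment() returns one row per observation with .fitted, .resid, etc.
lm_by_group <- data %>%
  group_by(carb) %>%
  group_modify(~ augment(lm(drat ~ mpg, data = .x))) %>%
  ungroup()
```

The result carries carb plus the augment() columns, so it can be passed to geom_point() and geom_line() the same way as the unnested do() output. Groups containing a single car may produce warnings from lm() or augment().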
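As for the plotting, I know this isn't quite what you were expecting, but if your main goal is the plot itself, the simplest approach in my opinion is to use ggpubr::stat_regline_equation: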
library(ggpubr)
ggplot(data = data, aes(x = mpg, y = drat, color = carb)) +
  ggtitle("Scatter plot with fitted points and line") +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  stat_regline_equation(label.x = with(data, tapply(mpg, carb, quantile, .6)),
                        label.y = with(data, tapply(drat, carb, max) - 0.2),
                        aes(label = ..adj.rr.label..),
                        show.legend = FALSE)
You can tune the regression by passing additional arguments to geom_smooth. If you need the equation as well, you can do something like label = paste(..eq.label.., ..adj.rr.label.., sep = "~~~").
For simple cases, it's often easier to just specify label.x and label.y manually, but for more complex cases you might use base R tapply to calculate the positions dynamically. stat_regline_equation has a position = argument, but I've never gotten it to work.
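If you would rather not depend on ggpubr, the two ideas above (glance for the statistics, per-group label positions) can be combined by hand. The following is a rough sketch, not a drop-in replacement for stat_regline_equation: the names labels, label_x, label_y and p are made up for illustration, and the 0.6 quantile and 0.2 offset simply mirror the tapply calls above.

```r
library(dplyr)
library(broom)
library(ggplot2)

data <- mtcars
data$carb <- as.factor(data$carb)

# One row per carb level: adjusted R^2 from glance() plus a label position.
labels <- data %>%
  group_by(carb) %>%
  group_modify(~ {
    fit <- glance(lm(drat ~ mpg, data = .x))
    tibble(
      adj.r.squared = fit$adj.r.squared,
      label_x = quantile(.x$mpg, 0.6, names = FALSE),
      label_y = max(.x$drat) - 0.2
    )
  }) %>%
  ungroup() %>%
  mutate(label = sprintf("adj. R2 = %.2f", adj.r.squared))

# geom_text() takes its data from `labels` and inherits color = carb.
p <- ggplot(data, aes(x = mpg, y = drat, color = carb)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  geom_text(data = labels,
            aes(x = label_x, y = label_y, label = label),
            show.legend = FALSE)
```

Printing p draws the plot. Because the labels live in an ordinary data frame, their positions and text can be adjusted with regular dplyr verbs before plotting.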