Series of regressions where the dependent variable is each level of a categorical variable
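I want to test the effect of being female on the day of discharge. To do this, I want to run a series of regressions in which the dependent variable = 1 if Monday is the day of discharge and = 0 otherwise. Next, the dependent variable would be = 1 if Tuesday and = 0 otherwise, and so on. The day of the week is currently stored in a categorical variable called wkday.
How would I do this quickly, for example in a for-loop or with tidymodels? Here is what I have so far...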
# libraries:
library(tidyr)
library(dplyr)
# create dataset:
id <- seq(1:1000)
wkdays <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
wkday <- sample(wkdays,1000, replace=T)
female <- sample(0:1, 1000, replace = T)
dta <- data.frame(id=id, wkday=wkday, female=female)
dta$mon <- ifelse(dta$wkday=="Monday",1,0)
dta$tues <- ifelse(dta$wkday=="Tuesday",1,0)
dta$wed <- ifelse(dta$wkday=="Wednesday",1,0)
dta$thurs <- ifelse(dta$wkday=="Thursday",1,0)
dta$fri <- ifelse(dta$wkday=="Friday",1,0)
dta$sat <- ifelse(dta$wkday=="Saturday",1,0)
dta$sun <- ifelse(dta$wkday=="Sunday",1,0)
# Model:
mon <- glm(mon ~ female, data=dta, family = "binomial")
tues <- glm(tues ~ female, data=dta, family = "binomial")
.
.
.
summary(mon)
summary(tues)
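Maybe something like the following answers the question.
First, there is no need to create the dummy variables by hand one at a time; model.matrix is made for exactly this.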
library(tidyr)
library(dplyr)
library(purrr)
library(broom)
# create dataset:
set.seed(2021)
id <- 1:1000
wkdays <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
wkday <- sample(wkdays,1000, replace=T)
female <- sample(0:1, 1000, replace = T)
dta <- data.frame(id=id, wkday=wkday, female=female)
tmp <- model.matrix(~0 + wkday, dta)
colnames(tmp) <- sub("wkday", "", colnames(tmp))
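To make the structure of the model.matrix result concrete, here is a minimal base-R sketch on a small toy data frame. The names toy and mm are hypothetical and not part of the answer. It only illustrates that ~0 + wkday yields one 0/1 column per weekday level present in the data, with exactly one 1 per row.

```r
# Toy data (hypothetical): 20 random weekdays
set.seed(1)
wkdays <- c("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")
toy <- data.frame(wkday = sample(wkdays, 20, replace = TRUE))

# ~0 + wkday drops the intercept, so every level present gets its own column
mm <- model.matrix(~0 + wkday, toy)

# Column names come out as "wkdayFriday", ...; strip the variable-name prefix
colnames(mm) <- sub("^wkday", "", colnames(mm))

head(mm, 3)

# Each row carries a single 1, in the column of that row's weekday
all(rowSums(mm) == 1)
```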
Now the models.
cbind(dta, tmp) %>%
  select(-wkday) %>%
  pivot_longer(
    cols = -c(id, female),
    names_to = "wkday",
    values_to = "dummy"
  ) %>%
  group_by(wkday) %>%
  do(tidy(lm(dummy ~ female, data = .)))
## A tibble: 14 x 6
## Groups: wkday [7]
# wkday term estimate std.error statistic p.value
# <chr> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 Friday (Intercept) 0.128 0.0155 8.26 4.55e-16
# 2 Friday female 0.0226 0.0219 1.03 3.03e- 1
# 3 Monday (Intercept) 0.152 0.0163 9.30 8.52e-20
# 4 Monday female 0.0126 0.0231 0.547 5.84e- 1
# 5 Saturday (Intercept) 0.170 0.0163 10.4 3.78e-24
# 6 Saturday female -0.0234 0.0231 -1.01 3.12e- 1
# 7 Sunday (Intercept) 0.138 0.0156 8.82 4.88e-18
# 8 Sunday female 0.00857 0.0221 0.388 6.98e- 1
# 9 Thursday (Intercept) 0.128 0.0157 8.14 1.13e-15
#10 Thursday female 0.0326 0.0222 1.47 1.43e- 1
#11 Tuesday (Intercept) 0.138 0.0145 9.49 1.63e-20
#12 Tuesday female -0.0355 0.0205 -1.73 8.41e- 2
#13 Wednesday (Intercept) 0.148 0.0155 9.55 9.66e-21
#14 Wednesday female -0.0174 0.0219 -0.797 4.26e- 1
Without the intercepts in the final output (the models are still fit with an intercept):
cbind(dta, tmp) %>%
  select(-wkday) %>%
  pivot_longer(
    cols = -c(id, female),
    names_to = "wkday",
    values_to = "dummy"
  ) %>%
  group_by(wkday) %>%
  do(tidy(lm(dummy ~ female, data = .))) %>%
  filter(term != "(Intercept)")
## A tibble: 7 x 6
## Groups: wkday [7]
# wkday term estimate std.error statistic p.value
# <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#1 Friday female 0.0226 0.0219 1.03 0.303
#2 Monday female 0.0126 0.0231 0.547 0.584
#3 Saturday female -0.0234 0.0231 -1.01 0.312
#4 Sunday female 0.00857 0.0221 0.388 0.698
#5 Thursday female 0.0326 0.0222 1.47 0.143
#6 Tuesday female -0.0355 0.0205 -1.73 0.0841
#7 Wednesday female -0.0174 0.0219 -0.797 0.426
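The pipelines above fit linear probability models with lm, whereas the question used glm with family = "binomial". The following is a sketch of a logistic variant, not the answer's own code. It assumes the same simulated data, swaps in glm, and uses group_modify in place of do, which is marked as superseded in dplyr 1.0 and later. The names dummies and logit_results are hypothetical.

```r
library(dplyr)
library(tidyr)
library(broom)

# Same simulated data as in the answer
set.seed(2021)
wkdays <- c("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")
dta <- data.frame(
  id = 1:1000,
  wkday = sample(wkdays, 1000, replace = TRUE),
  female = sample(0:1, 1000, replace = TRUE)
)

# One 0/1 indicator column per weekday (wkdayFriday, wkdayMonday, ...)
dummies <- as.data.frame(model.matrix(~0 + wkday, dta))

logit_results <- dta %>%
  select(-wkday) %>%
  bind_cols(dummies) %>%
  pivot_longer(
    cols = starts_with("wkday"),
    names_to = "wkday",
    names_prefix = "wkday",   # strip the prefix while reshaping
    values_to = "dummy"
  ) %>%
  group_by(wkday) %>%
  # .x is the data frame of the current group; tidy() returns one row per term
  group_modify(~ tidy(glm(dummy ~ female, data = .x, family = binomial))) %>%
  ungroup() %>%
  filter(term == "female")

logit_results
```

The estimates here are on the log-odds scale, so they are not directly comparable to the lm coefficients shown above.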
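A simpler version: there is no need to create tmp separately, since it can be built inside the pipe. Filtering out the intercepts is optional.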
dta %>%
  bind_cols(
    model.matrix(~0 + wkday, dta) %>% as.data.frame()
  ) %>%
  select(-wkday) %>%
  pivot_longer(
    cols = -c(id, female),
    names_to = "wkday",
    values_to = "dummy"
  ) %>%
  mutate(wkday = sub("^wkday", "", wkday)) %>%
  group_by(wkday) %>%
  do(tidy(lm(dummy ~ female, data = .)))
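Since the question also asks about a for-loop, here is a base-R sketch that avoids reshaping altogether. It is one possible approach under the same simulated data, not the answer's method. The names outcome, fits, female_rows and results are hypothetical. It fits the logistic models from the question, one per weekday, and collects the female coefficient row from each.

```r
# Same simulated data as in the answer
set.seed(2021)
wkdays <- c("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")
dta <- data.frame(
  id = 1:1000,
  wkday = sample(wkdays, 1000, replace = TRUE),
  female = sample(0:1, 1000, replace = TRUE)
)

# Pre-allocate a named list with one slot per weekday
fits <- vector("list", length(wkdays))
names(fits) <- wkdays

for (day in wkdays) {
  # 0/1 outcome for the current weekday, then the logistic regression
  dta$outcome <- as.integer(dta$wkday == day)
  fits[[day]] <- glm(outcome ~ female, data = dta, family = binomial)
}

# Pull the coefficient row for `female` out of each model summary
female_rows <- lapply(fits, function(fit) coef(summary(fit))["female", ])

results <- data.frame(
  wkday = names(fits),
  do.call(rbind, female_rows),
  row.names = NULL,
  check.names = FALSE   # keep names such as "Std. Error" unchanged
)

results
```

Each fitted model stays available in fits, so summary(fits[["Monday"]]) gives the full output for a single day.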