
Series of regressions where the dependent variable is each level of a categorical variable

I would like to test how being female affects the day of hospital discharge. For this I would like to run a series of regressions where the dependent variable is 1 if Monday is the discharge day and 0 otherwise. The next model would use 1 if Tuesday and 0 otherwise, and so on. The days of the week are currently stored in a categorical variable called wkday.

How would I do this quickly using tidymodels, for example in a for-loop? Here is what I have so far:

# libraries:
library(tidyr)
library(dplyr)

# create dataset:
id <- 1:1000
wkdays <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
wkday <- sample(wkdays,1000, replace=T)
female <- sample(0:1, 1000, replace = T)

dta <- data.frame(id=id, wkday=wkday, female=female)

dta$mon <- ifelse(dta$wkday=="Monday",1,0)
dta$tues <- ifelse(dta$wkday=="Tuesday",1,0)
dta$wed <- ifelse(dta$wkday=="Wednesday",1,0)
dta$thurs <- ifelse(dta$wkday=="Thursday",1,0)
dta$fri <- ifelse(dta$wkday=="Friday",1,0)
dta$sat <- ifelse(dta$wkday=="Saturday",1,0)
dta$sun <- ifelse(dta$wkday=="Sunday",1,0)



# Model:
mon <- glm(mon ~ female, data=dta, family = "binomial")
tues <- glm(tues ~ female, data=dta, family = "binomial")

.
.
.

summary(mon)
summary(tues)

Maybe something like the following answers the question.
First of all, there is no need to create the dummies one by one by hand; model.matrix is meant for that.

library(tidyr)
library(dplyr)
library(purrr)
library(broom)

# create dataset:
set.seed(2021)
id <- 1:1000
wkdays <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
wkday <- sample(wkdays,1000, replace=T)
female <- sample(0:1, 1000, replace = T)

dta <- data.frame(id=id, wkday=wkday, female=female)

tmp <- model.matrix(~0 + wkday, dta)
colnames(tmp) <- sub("wkday", "", colnames(tmp))
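
To see what model.matrix(~0 + wkday, ...) produces, here is a minimal sketch on a toy data frame. The names toy and mm are hypothetical and used only for this illustration. Removing the intercept with ~0 gives every level its own 0/1 column, so each row has exactly one 1.

```r
# Toy data (hypothetical, for illustration only)
toy <- data.frame(wkday = c("Monday", "Tuesday", "Monday", "Sunday"))

# ~0 + wkday drops the intercept, so each level gets its own indicator column
mm <- model.matrix(~0 + wkday, toy)

# Strip the "wkday" prefix that model.matrix adds to the column names
colnames(mm) <- sub("^wkday", "", colnames(mm))

mm
```

The columns follow the sorted order of the levels actually present in the data, not calendar order.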

Now the models.

cbind(dta, tmp) %>% 
  select(-wkday) %>%
  pivot_longer(
    cols = -c(id, female),
    names_to = "wkday",
    values_to = "dummy"
  ) %>% 
  group_by(wkday) %>%
  do(tidy(lm(dummy ~ female, data = .)))
## A tibble: 14 x 6
## Groups:   wkday [7]
#   wkday     term        estimate std.error statistic  p.value
#   <chr>     <chr>          <dbl>     <dbl>     <dbl>    <dbl>
# 1 Friday    (Intercept)  0.128      0.0155     8.26  4.55e-16
# 2 Friday    female       0.0226     0.0219     1.03  3.03e- 1
# 3 Monday    (Intercept)  0.152      0.0163     9.30  8.52e-20
# 4 Monday    female       0.0126     0.0231     0.547 5.84e- 1
# 5 Saturday  (Intercept)  0.170      0.0163    10.4   3.78e-24
# 6 Saturday  female      -0.0234     0.0231    -1.01  3.12e- 1
# 7 Sunday    (Intercept)  0.138      0.0156     8.82  4.88e-18
# 8 Sunday    female       0.00857    0.0221     0.388 6.98e- 1
# 9 Thursday  (Intercept)  0.128      0.0157     8.14  1.13e-15
#10 Thursday  female       0.0326     0.0222     1.47  1.43e- 1
#11 Tuesday   (Intercept)  0.138      0.0145     9.49  1.63e-20
#12 Tuesday   female      -0.0355     0.0205    -1.73  8.41e- 2
#13 Wednesday (Intercept)  0.148      0.0155     9.55  9.66e-21
#14 Wednesday female      -0.0174     0.0219    -0.797 4.26e- 1

Without the intercepts in the final output (the models are still fitted with an intercept):

cbind(dta, tmp) %>% 
  select(-wkday) %>%
  pivot_longer(
    cols = -c(id, female),
    names_to = "wkday",
    values_to = "dummy"
  ) %>% 
  group_by(wkday) %>%
  do(tidy(lm(dummy ~ female, data = .))) %>%
  filter(term != "(Intercept)")
## A tibble: 7 x 6
## Groups:   wkday [7]
#  wkday     term   estimate std.error statistic p.value
#  <chr>     <chr>     <dbl>     <dbl>     <dbl>   <dbl>
#1 Friday    female  0.0226     0.0219     1.03   0.303 
#2 Monday    female  0.0126     0.0231     0.547  0.584 
#3 Saturday  female -0.0234     0.0231    -1.01   0.312 
#4 Sunday    female  0.00857    0.0221     0.388  0.698 
#5 Thursday  female  0.0326     0.0222     1.47   0.143 
#6 Tuesday   female -0.0355     0.0205    -1.73   0.0841
#7 Wednesday female -0.0174     0.0219    -0.797  0.426 

A simpler version: there is no need to create tmp beforehand, since the dummy matrix can be built inside the pipe. Filtering out the intercepts is left as an optional step.

dta %>%
  bind_cols(
    model.matrix(~0 + wkday, dta) %>% as.data.frame
  ) %>%
  select(-wkday) %>% 
  pivot_longer(
    cols = -c(id, female),
    names_to = "wkday",
    values_to = "dummy"
  ) %>% 
  mutate(wkday = sub("^wkday", "", wkday)) %>% 
  group_by(wkday) %>%
  do(tidy(lm(dummy ~ female, data = .)))
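
Note that the code above fits linear probability models with lm, whereas the question uses glm with family = "binomial". If logistic regressions are wanted, one possible sketch is shown below. It assumes the same simulated data as above; the helper fit_day and the object results are hypothetical names introduced here, not part of the original answer. The outcome is built on the fly as I(wkday == day), so no dummy columns are needed.

```r
library(dplyr)
library(purrr)
library(broom)

# Same simulated data as in the answer
set.seed(2021)
wkdays <- c("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")
dta <- data.frame(
  id     = 1:1000,
  wkday  = sample(wkdays, 1000, replace = TRUE),
  female = sample(0:1, 1000, replace = TRUE)
)

# Hypothetical helper: one logistic regression for a single weekday.
# The response is TRUE when the discharge fell on `day`, FALSE otherwise.
fit_day <- function(day) {
  fit <- glm(I(wkday == day) ~ female, data = dta, family = binomial)
  tidy(fit)
}

# Fit one model per weekday and stack the coefficient tables
results <- wkdays %>%
  set_names() %>%
  map(fit_day) %>%
  bind_rows(.id = "wkday") %>%
  filter(term != "(Intercept)")

results
```

The estimates here are on the log-odds scale, so they are not directly comparable to the lm coefficients shown earlier.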
