
Running linear models for groups within dataframe and storing outputs in dataframe in R

I am trying to run a separate linear model for each group in a very large dataset and store the outputs in a data frame. I have managed to get the estimates and p-values into a data frame (see below), but I also want to store the AIC for each model.

#example dataframe

dt = data.frame(x = rnorm(40, 5, 5),
                y = rnorm(40, 3, 4),
                group = rep(c("a","b"), 20))

library(dplyr)
library(broom)

# run lm() separately for each level of `group` and store the coefficient table
dt_lm <- dt %>%
  group_by(group) %>%  
  do(tidy(lm(y~x, data=.)))

Use glance() instead of tidy(). It returns one row of model-level statistics per fit, including the AIC:

dt_lm <- dt %>%
  group_by(group) %>%
  do(glance(lm(y~x, data=.))) %>%
  select(AIC)

which gives:

Adding missing grouping variables: `group`
# A tibble: 2 x 2
# Groups:   group [2]
  group   AIC
  <chr> <dbl>
1 a      119.
2 b      114.

If you want to keep the other model-level metrics as well as the AIC, just omit the select() step.
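Since the question asks for estimates, p-values and AIC together, one possible way to combine them is to join the tidy() and glance() results on the grouping column. This is only a sketch: the object names coefs and fits are illustrative, and set.seed() is added here (it is not in the original post) so the example is reproducible.

```r
library(dplyr)
library(broom)

set.seed(1)  # assumption: fixed seed, not part of the original example
dt <- data.frame(x = rnorm(40, 5, 5),
                 y = rnorm(40, 3, 4),
                 group = rep(c("a", "b"), 20))

# Coefficient-level output: one row per model term per group
coefs <- dt %>%
  group_by(group) %>%
  do(tidy(lm(y ~ x, data = .))) %>%
  ungroup()

# Model-level output: one row per group; keep only the AIC
fits <- dt %>%
  group_by(group) %>%
  do(glance(lm(y ~ x, data = .))) %>%
  ungroup() %>%
  select(group, AIC)

# Attach each group's AIC to its coefficient rows
dt_lm <- left_join(coefs, fits, by = "group")
```

Selecting only group and AIC before the join avoids a name clash, because tidy() and glance() both return columns called statistic and p.value.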

In newer versions of dplyr (>= 1.0), we can also use nest_by():

library(dplyr)
library(tidyr)
library(broom)
dt %>%
  nest_by(group) %>%
  transmute(out = list(glance(lm(y ~ x, data = data)))) %>%
  unnest(c(out)) %>%
  select(AIC)
# A tibble: 2 x 2
# Groups:   group [2]
#  group   AIC
#  <chr> <dbl>
#1 a      115.
#2 b      100.
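If only the AIC is needed, the nest_by() approach can also be sketched without broom or tidyr by keeping the fitted model in a list column and calling stats::AIC() on it row by row. The names mod and aic_tbl are illustrative, and set.seed() is an assumption added for reproducibility.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed, not part of the original example
dt <- data.frame(x = rnorm(40, 5, 5),
                 y = rnorm(40, 3, 4),
                 group = rep(c("a", "b"), 20))

aic_tbl <- dt %>%
  nest_by(group) %>%                              # one row per group, data in a list column
  mutate(mod = list(lm(y ~ x, data = data))) %>%  # fit one model per row
  summarise(AIC = AIC(mod), .groups = "drop")     # rowwise: `mod` is the single fitted model
```

Because nest_by() returns a row-wise data frame, mod inside summarise() refers to the individual lm object for that row rather than to the whole list column.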

