I have a large df that follows the structure below. For each of 200 groups, I want to fit a linear model to 30 years of data, then extract the slope and the R-squared.
count <- c(5, 10, 15, 20, 2, 4, 6, 8, 1, 2, 3, 4, 10, 20, 30, 40)
group <- c(0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
year <- c(2000, 2001, 2002, 2003, 2000, 2001, 2002, 2003, 2000, 2001, 2002, 2003, 2000, 2001, 2002, 2003)
df <- data.frame(count, group, year)
I am able to get the data I want for one group at a time like this:
library(dplyr)
group_1 <- filter(df, group == 1)
lm_group_1 <- lm(count ~ year, data = group_1)
lm_coef_1 <- lm_group_1$coefficients["year"]
lm_rsq_1 <- summary(lm_group_1)$adj.r.squared
df_group_1 <- data.frame(lm_coef_1, lm_rsq_1)
Doing this 200 times via copy-paste is not sensible, so I am trying to automate with a for loop.
for (i in df$group) {
group_data <- filter(df, group == i)
lm_g <- lm(count ~ year, data = group_data)
lm_coef <- lm_g$coefficients["year"]
lm_rsq <- summary(lm_g)$adj.r.squared
df_i <- data.frame(lm_coef, lm_rsq)
if (i == 199)
break
}
This runs and raises no errors. However, instead of producing 200 dfs, one for each group, it produces a single df, df_i.
I have tried naming all the variables with i (lm_i, lm_coef_i, lm_rsq_i, df_i), inserting %>% between statements, and looking for examples of similar problems. I have also tried the tidy() function from the broom package, but it seems to drop the R-squared, or at least I can't find where it puts it.
Primarily, I want to know why this loop refuses to iterate. However, I am open to and appreciate other suggestions for how to solve this problem more elegantly.
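Just putting i after a variable name won't automatically create a new object. R reads df_i as one literal name, so the loop does iterate, but every pass overwrites the same df_i and only the last group's result survives. I think what you want is a list with one element per group (200 in your case), each holding the individual data frame you want.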
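A minimal base-R illustration of that point, using a toy loop (the names x_i and x_list are hypothetical, chosen only for this demo): the name x_i is taken literally, so each pass overwrites it, while assigning into a list keeps every result.

```r
# `i` is NOT substituted into the name x_i; this is one object, reassigned 3 times
for (i in 1:3) {
  x_i <- i * 10
}
print(x_i)  # [1] 30 -- only the last iteration survives

# Pre-allocate a list and store each iteration's result in its own slot
x_list <- vector("list", 3)
for (i in 1:3) {
  x_list[[i]] <- i * 10
}
str(x_list)  # all three values are kept
```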
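library(dplyr)
# visit each group once (df$group repeats every group value once per year)
groups <- unique(df$group)
# make a list the size of your expected output, named by group value
my_df_list <- vector("list", length(groups))
names(my_df_list) <- groups
for (i in seq_along(groups)) {
  group_data <- filter(df, group == groups[i])
  lm_g <- lm(count ~ year, data = group_data)
  lm_coef <- lm_g$coefficients["year"]
  lm_rsq <- summary(lm_g)$adj.r.squared
  # R lists are 1-indexed, so store by position rather than by the 0-based group id
  my_df_list[[i]] <- data.frame(lm_coef, lm_rsq)
}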
This way each group's data frame is kept separately in the list instead of being overwritten on every iteration.
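If you ultimately want one table rather than 200 separate data frames, the per-group results can be stacked with do.call(rbind, ...). Below is a minimal base-R sketch that rebuilds the question's toy data, uses split() and lapply() instead of an explicit loop, and then stacks the rows; the names by_group, fits, result, slope and adj_r_sq are illustrative, not from the original. The toy data are exactly linear, so every adjusted R-squared comes out as 1 and summary() may warn about an essentially perfect fit.

```r
# Toy data from the question
count <- c(5, 10, 15, 20, 2, 4, 6, 8, 1, 2, 3, 4, 10, 20, 30, 40)
group <- c(0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
year  <- rep(2000:2003, times = 4)
df <- data.frame(count, group, year)

# One data frame per group, so each group is visited exactly once
by_group <- split(df, df$group)

# Fit one model per group and keep the slope and adjusted R-squared
fits <- lapply(by_group, function(d) {
  fit <- lm(count ~ year, data = d)
  data.frame(group    = d$group[1],
             slope    = coef(fit)[["year"]],
             adj_r_sq = summary(fit)$adj.r.squared)
})

# Stack the one-row data frames into a single result table
result <- do.call(rbind, fits)
rownames(result) <- NULL
result
```

Because split() names the list by group value and lapply() handles the indexing, this avoids both the repeated iterations over df$group and the problem of using a 0-based group id as a list index.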
You could also do:
library(tidyverse)
df %>%
  group_by(group) %>%
  summarise(d = list({
    a <- lm(count ~ year, data = pick(everything()))
    tibble(coef_year = coef(a)[["year"]],
           adjusted_r_sq = summary(a)$adj.r.squared)
  })) %>%
  unnest(d)
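We can use broom. Note that tidy() returns one row per coefficient (term, estimate, std.error, statistic, p.value), which is why the R-squared seems to disappear; glance() returns one row per model, and that is where r.squared and adj.r.squared live: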
library(dplyr)
library(broom)
library(tidyr)
df %>%
  group_by(group) %>%
  summarise(out = list(glance(lm(count ~ year, data = pick(everything()))) %>%
                         select(r.squared, adj.r.squared))) %>%
  unnest(out)
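Since the question asks for the slope as well as the R-squared, one way to get both in a single table is to combine tidy() (for the year coefficient) with glance() (for the model-level statistics) inside dplyr's group_modify(). This is a sketch assuming reasonably recent dplyr and broom installations; result and the column name slope are illustrative. As above, the toy data are exactly linear, so R-squared is 1 for every group.

```r
library(dplyr)
library(broom)

# Toy data from the question
count <- c(5, 10, 15, 20, 2, 4, 6, 8, 1, 2, 3, 4, 10, 20, 30, 40)
group <- c(0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
year  <- rep(2000:2003, times = 4)
df <- data.frame(count, group, year)

result <- df %>%
  group_by(group) %>%
  # group_modify() calls the function once per group with that group's rows
  group_modify(function(data, key) {
    fit <- lm(count ~ year, data = data)
    # tidy(): one row per coefficient; keep only the slope on year
    slope <- tidy(fit) %>%
      filter(term == "year") %>%
      select(slope = estimate)
    # glance(): one row per model, including the R-squared values
    rsq <- glance(fit) %>%
      select(r.squared, adj.r.squared)
    bind_cols(slope, rsq)
  }) %>%
  ungroup()

result
```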