for loop not iterating in R

I have a large df that follows the structure below. For each of 200 groups, I want to fit a linear model to 30 years of data, then extract the slope and the R squared.

   count <- c(5, 10, 15, 20, 2, 4, 6, 8, 1, 2, 3, 4, 10, 20, 30, 40)
   group <- c(0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
   year <- c(2000, 2001, 2002, 2003, 2000, 2001, 2002, 2003, 2000, 2001, 2002, 2003, 2000, 2001, 2002, 2003)
   df <- data.frame(count, group, year)

I am able to get the data I want for one group at a time like this:

   library(dplyr)  # provides filter()
   group_1 <- filter(df, group == 1)
   lm_group_1 <- lm(count ~ year, data = group_1)
   lm_coef_1 <- lm_group_1$coefficients["year"]
   lm_rsq_1 <- summary(lm_group_1)$adj.r.squared
   df_group_1 <- data.frame(lm_coef_1, lm_rsq_1)

Doing this 200 times via copy-paste is not sensible, so I am trying to automate it with a for loop.

    for (i in df$group) {

         group_data <- filter(df, group == i)

         lm_g <-  lm(count ~ year, data = group_data) 
           lm_coef <- lm_g$coefficients["year"] 
           lm_rsq <- summary(lm_g)$adj.r.squared  
           df_i <- data.frame(lm_coef, lm_rsq)

         if (i == 199)
         break
    }

This runs and raises no errors. However, instead of producing 200 dfs, one for each group, it produces one single df, df_i.

I have tried naming all the variables with i (lm_i, lm_coef_i, lm_rsq_i, df_i); inserting %>% between statements; and looking for examples of similar problems. I have also tried to apply the tidy function from the broom package, but it seems to drop the R squared, or at least I can't find where it puts it.

Primarily, I want to know why this loop refuses to iterate. However, I am open to and appreciate other suggestions for how to solve this problem more elegantly.

Just putting i after a variable name won't automatically create a new object. The loop does iterate, but every pass assigns to the same name, df_i, so each result overwrites the previous one and only the last survives. I think what you want to do is make a list with 200 indices containing the individual data frames that you want.

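library(dplyr)  # provides filter()

# Make a list the size of your expected output: one slot per distinct group
groups <- unique(df$group)
my_df_list <- vector("list", length(groups))
names(my_df_list) <- groups

# Loop over positions, not over df$group itself: df$group repeats each group
# once per row, and a group value of 0 is not a valid list index in R
for (k in seq_along(groups)) {
  group_data <- filter(df, group == groups[k])

  lm_g <- lm(count ~ year, data = group_data)
  lm_coef <- lm_g$coefficients["year"]
  lm_rsq <- summary(lm_g)$adj.r.squared
  my_df_list[[k]] <- data.frame(lm_coef, lm_rsq)
}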

This way you can keep each of your data frames separate while iterating through.
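If one table is easier to work with than a list of one-row data frames, the per-group results can also be stacked. The following is only a sketch in base R using the sample data from the question; the helper name `fit_one_group` and the result names `per_group` and `all_groups` are made up for illustration and are not part of the original answer.

```r
# Sample data from the question
count <- c(5, 10, 15, 20, 2, 4, 6, 8, 1, 2, 3, 4, 10, 20, 30, 40)
group <- c(0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
year  <- rep(2000:2003, times = 4)
df <- data.frame(count, group, year)

# Hypothetical helper: fit one group's model and return a one-row data frame
fit_one_group <- function(group_data) {
  fit <- lm(count ~ year, data = group_data)
  data.frame(
    group   = group_data$group[1],
    lm_coef = unname(coef(fit)["year"]),      # slope for year
    lm_rsq  = summary(fit)$adj.r.squared      # adjusted R squared
  )
}

# split() gives one data frame per group; lapply() fits each one
per_group <- lapply(split(df, df$group), fit_one_group)

# Stack the one-row data frames into a single table
all_groups <- do.call(rbind, per_group)
rownames(all_groups) <- NULL
```

Because `split()` works from the distinct group values, there is no need to know the number of groups in advance or to break out of the loop. The toy data is perfectly linear within each group, so `summary()` may warn about an essentially perfect fit; real data should not trigger that.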

You could also do:

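library(tidyverse)

# Note: cur_data() is deprecated as of dplyr 1.1.0 in favour of pick()
df %>%
  group_by(group) %>%
  summarise(d = list({
    a <- lm(count ~ year, data = cur_data())  # fit this group's model
    tibble(coef_year = coef(a)[["year"]],
           adjusted_r_sq = summary(a)$adj.r.squared)
  }))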

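We can use broom. Its tidy() function returns only the coefficient table; model-level statistics such as the R squared come from glance():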

library(dplyr)
library(broom)
library(tidyr)
df %>%
  group_by(group) %>%
  summarise(out = list(glance(lm(count ~ year, data = cur_data())) %>%
                         select(r.squared, adj.r.squared))) %>%
  unnest(out)
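The question asks for the slope as well as the R squared. One possible way to get both in one table is to combine `tidy()` for the coefficient with `glance()` for the fit statistics. This is a sketch only, using the sample data from the question; `group_modify()` is a swapped-in alternative to the `summarise()` approach above, and the names `results`, `slope` and `adj_r_squared` are chosen here for illustration.

```r
library(dplyr)
library(broom)

# Sample data from the question
count <- c(5, 10, 15, 20, 2, 4, 6, 8, 1, 2, 3, 4, 10, 20, 30, 40)
group <- c(0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
year  <- rep(2000:2003, times = 4)
df <- data.frame(count, group, year)

results <- df %>%
  group_by(group) %>%
  group_modify(~ {
    fit   <- lm(count ~ year, data = .x)  # .x holds the rows of one group
    coefs <- tidy(fit)                    # coefficient table: term, estimate, ...
    tibble(
      slope         = coefs$estimate[coefs$term == "year"],
      adj_r_squared = glance(fit)$adj.r.squared
    )
  }) %>%
  ungroup()
```

`results` then has one row per group with the columns `group`, `slope` and `adj_r_squared`. With the perfectly linear toy data, R may warn about an essentially perfect fit.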
