How to perform multiple regressions grouped by 2 factors and create a file containing N and R-squared?

I am running into some problems again and hope that someone can help me. I am doing research on the effect of ELI on ROS for firms, and on whether the pandemic affects this relationship. For this research, my thesis supervisor has asked me to run a regression analysis per year, grouped by industry (NAICS), and I am at a loss as to how to do this. I have firms in 46 different industries (NAICS) and 11 years of data per firm (2010-2020). I would like to run the regression ROS ~ ELI + ELI*Pandemic for every industry in each year, and then capture the resulting N (number of firms per industry) and R-squared in one file. The image below is an example of what I am trying to achieve:

[image not reproduced]

I hope that someone can help me because I am at an absolute loss and I can't seem to find a similar question/answer on SO.

Here is the dput(head()) as an example. NAICS is the industry.

df <- structure(list(NAICS = c(315, 315, 315, 315, 315, 315), 
        Year = c(2010, 2011, 2012, 2013, 2014, 2015), 
        Firm = c("A", "A", "A", "A", "A", "A"), 
        ROS = c(0.17, 0.19, 0.29, 0.3, 0.29, 0.25), 
        ELI = c(0.856264428748774, 0.723379402777553, 0.958341156943977, 0.680567730897854, 0.790480861209701, 0.827279134948296), 
        Pandemic = c(0, 0, 0, 0, 0, 0)), 
        row.names = c(NA, -6L), 
        class = c("tbl_df", "tbl", "data.frame"))

Update 02

I have made the necessary modifications to my solution after receiving the original data set, and I don't think there will be any other problems.

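library(dplyr)
library(tidyr)
library(broom)
library(purrr)

df %>% 
  group_by(NAICS, Year) %>% 
  # N = number of rows (firms) in each NAICS-Year group
  add_count(name = "N") %>%
  # one row per NAICS-Year group, with that group's rows in a list-column `data`
  nest(data = !c(NAICS, Year, N)) %>% 
  # fit one model per group, then extract model-level and coefficient-level summaries
  mutate(models = map(data, ~ lm(ROS ~ ELI + ELI * Pandemic, data = .)),
         glance = map(models, ~ glance(.x)),
         tidied = map(models, ~ tidy(.x))) %>%
  unnest(glance) %>%
  # keep the group keys, N, R-squared and the coefficient table
  select(NAICS:N, r.squared, tidied) %>%
  unnest(tidied)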


# A tibble: 2,024 x 9
# Groups:   NAICS, Year [506]
   NAICS  Year     N r.squared term         estimate std.error statistic     p.value
   <dbl> <dbl> <int>     <dbl> <chr>           <dbl>     <dbl>     <dbl>       <dbl>
 1   315  2010    12     0.122 (Intercept)    0.0959    0.0123     7.83   0.0000143 
 2   315  2010    12     0.122 ELI            0.0189    0.0160     1.18   0.266     
 3   315  2010    12     0.122 Pandemic      NA        NA         NA     NA         
 4   315  2010    12     0.122 ELI:Pandemic  NA        NA         NA     NA         
 5   315  2011    12     0.129 (Intercept)    0.0999    0.0115     8.70   0.00000559
 6   315  2011    12     0.129 ELI            0.0161    0.0132     1.22   0.251     
 7   315  2011    12     0.129 Pandemic      NA        NA         NA     NA         
 8   315  2011    12     0.129 ELI:Pandemic  NA        NA         NA     NA         
 9   315  2012    13     0.594 (Intercept)   -0.486     0.606     -0.802  0.439     
10   315  2012    13     0.594 ELI            2.11      0.526      4.01   0.00205   
# ... with 2,014 more rows
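The question also asks for N and R-squared collected in one file, which the pipeline above does not write out. The following is only a minimal sketch of that last step, not part of the original answer. It assumes one row per firm per year (so the row count per group equals the number of firms), and it uses a small simulated data set `toy`, a hypothetical output name `regression_summary.csv`, and `summary()` in place of `glance()` to read off R-squared.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical toy panel standing in for the real data:
# 2 industries x 3 years x 8 firms, one row per firm-year.
set.seed(1)
toy <- expand.grid(NAICS = c(315, 325), Year = 2018:2020, Firm = LETTERS[1:8],
                   stringsAsFactors = FALSE) %>%
  mutate(ELI = runif(n()),
         Pandemic = as.numeric(Year == 2020),
         ROS = 0.1 + 0.2 * ELI + rnorm(n(), sd = 0.05))

# One row per NAICS-Year group: N (rows in the group) and the model's R-squared.
summary_tbl <- toy %>%
  group_by(NAICS, Year) %>%
  nest() %>%
  mutate(N = map_int(data, nrow),
         model = map(data, ~ lm(ROS ~ ELI + ELI * Pandemic, data = .x)),
         r.squared = map_dbl(model, ~ summary(.x)$r.squared)) %>%
  ungroup() %>%
  select(NAICS, Year, N, r.squared)

# Write the summary to a CSV file (a temporary directory is used here; any path works).
out_file <- file.path(tempdir(), "regression_summary.csv")
write.csv(summary_tbl, out_file, row.names = FALSE)
```

A note on the NA rows in the output above: if Pandemic is a year-level indicator, it is constant within every NAICS-Year group, so `lm()` cannot estimate the Pandemic or ELI:Pandemic terms and reports them as NA. In that case the per-group R-squared reflects ELI alone.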
