Use one split regression instead of a loop to extract coefficients from 100 regressions?

Question

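I need to run more than 600 regressions, each on a different MECE (mutually exclusive, collectively exhaustive) group of the data (the group variable takes values in {1,2,...,623}). From each regression, I need to store the coefficient estimates for all independent variables. I can do this by looping over the regressions (see below); however, I find this slow, and I believe there is a better way: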

# loop prep
formula <- "dv ~ iv_1 + iv_2 + iv_3 | fe"
ols_stored_coef <- matrix(0, 623, 3)
ols_stored_coef <- as.data.frame(ols_stored_coef )

# loop
for(i in 1:623) {
 #run regression:
 ols <- feols(as.formula(formula), subset(df, group==i))
 # generate coefficients:
 ols_coef <- summary(ols)$coefficients
 ols_coef <- data.frame(as.list(ols_coef))
 # store coefficients:
 ols_stored_coef[i,1] = ols_coef[1,1]
 ols_stored_coef[i,2] = ols_coef[1,2]
 ols_stored_coef[i,3] = ols_coef[1,3]
}

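This works, but it takes about 10 minutes to run (there are about 6 million observations and 623 MECE groups). However, I know that the following command estimates all 623 regressions in about 1 minute: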

ols_split <- feols(as.formula(formula), df, split=~group)

The regression results are all stored in a "List of 623". I can extract the coefficients for each group as follows, where X is the group value.

ols_split $`sample.var: store; sample: X`$coefficients

Ideally, I could run this split feols() and then store the coefficients in a loop:

for(i in 1:623) {
  ols_coef <- ols_split $`sample.var: store; sample: i`$coefficients
  ols_coef <- data.frame(as.list(ols_coef))
  # store coefficients:
  ols_stored_coef[i,1] = ols_coef[1,1]
  ols_stored_coef[i,2] = ols_coef[1,2]
  ols_stored_coef[i,3] = ols_coef[1,3]
}

However, because the `i` is inside backticks, I believe it is read as literal text, so this does not work.

Is there a way to use the ols_split list of 623 regression results to extract the coefficients?

Answer 1

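I would suggest another approach, using tidyverse tools. I am using the gapminder dataset for reproducibility. If you have data you can share as an example, the same approach can be applied to it: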

library(gapminder)
library(purrr)
library(broom)
library(dplyr)
library(tidyr)

gapminder |> 
  group_by(country) |> 
  nest() |> 
  mutate(fit = map(data, ~ lm(lifeExp ~ gdpPercap, data = .)),
         coefs = map(fit, tidy)) |> 
  unnest(coefs)

# A tibble: 284 × 8
# Groups:   country [142]
   country     data              fit    term        estimate std.error statistic      p.value
   <fct>       <list>            <list> <chr>          <dbl>     <dbl>     <dbl>        <dbl>
 1 Afghanistan <tibble [12 × 5]> <lm>   (Intercept) 39.3     12.0          3.26  0.00857     
 2 Afghanistan <tibble [12 × 5]> <lm>   gdpPercap   -0.00224  0.0149      -0.151 0.883       
 3 Albania     <tibble [12 × 5]> <lm>   (Intercept) 54.0      3.16        17.1   0.0000000101
 4 Albania     <tibble [12 × 5]> <lm>   gdpPercap    0.00444  0.000917     4.84  0.000682    
 5 Algeria     <tibble [12 × 5]> <lm>   (Intercept) 27.4      4.90         5.60  0.000226    
 6 Algeria     <tibble [12 × 5]> <lm>   gdpPercap    0.00714  0.00106      6.71  0.0000533   
 7 Angola      <tibble [12 × 5]> <lm>   (Intercept) 41.6      3.91        10.6   0.000000899 
 8 Angola      <tibble [12 × 5]> <lm>   gdpPercap   -0.00103  0.00104     -0.998 0.342       
 9 Argentina   <tibble [12 × 5]> <lm>   (Intercept) 52.3      3.60        14.5   0.0000000479
10 Argentina   <tibble [12 × 5]> <lm>   gdpPercap    0.00187  0.000395     4.74  0.000797    
# … with 274 more rows
# ℹ Use `print(n = ...)` to see more rows

By using nest() and map() (the base R equivalent is lapply()), you can iterate over each group to fit a model and use broom::tidy() to extract the coefficients and other information.
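For reference, the base R counterpart mentioned above can be sketched with split() and lapply(). This is only an illustration: it uses the built-in mtcars data with cyl as a stand-in grouping variable and lm() rather than feols(), so the variable names are assumptions and not the asker's.

```r
# Illustration only: mtcars and the grouping variable cyl are stand-ins.
# Split the data into one data frame per group.
groups <- split(mtcars, mtcars$cyl)

# Fit one regression per group.
fits <- lapply(groups, function(d) lm(mpg ~ wt + hp, data = d))

# Stack the coefficient vectors: one row per group, one column per term.
coef_mat <- do.call(rbind, lapply(fits, coef))
coef_mat
```

Each row of coef_mat is named after its group, so a single group's estimates can be looked up with coef_mat["4", ] and so on.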

Answer 2

The fixest package you are using has some built-in functions to support this. Here is my example based on yours:

library(tibble)

df <- tibble(
  dv = rnorm(1000),
  iv_1 = rnorm(1000),
  iv_2 = rnorm(1000),
  iv_3 = rnorm(1000),
  fe   = 1,
  group = sample(LETTERS, 1000, replace  = TRUE)
)

formula <- "dv ~ iv_1 + iv_2 + iv_3 | fe"
ols_stored_coef <- matrix(0, 623, 3)
ols_stored_coef <- as.data.frame(ols_stored_coef )
ols_split <- fixest::feols(as.formula(formula), df, split=~group)

out <- fixest::coeftable(ols_split) 
head(out)

  id sample.var sample coefficient    Estimate Std. Error    t value  Pr(>|t|)
1  1      group      A        iv_1 -0.04816492  0.2019670 -0.2384791 0.8133102
2  1      group      A        iv_2 -0.18081949  0.1982410 -0.9121193 0.3697786
3  1      group      A        iv_3  0.04826683  0.1961902  0.2460206 0.8075269
4  2      group      B        iv_1 -0.15561382  0.1824392 -0.8529625 0.3993197
5  2      group      B        iv_2  0.06064802  0.2348541  0.2582370 0.7976946
6  2      group      B        iv_3 -0.07948869  0.1981408 -0.4011728 0.6906643

Of course, if this format is not what you want and you really want a plain matrix, some wrangling is needed from here. For example:

m <- matrix(out$Estimate, ncol = length(unique(out$coefficient)), byrow = TRUE)
colnames(m) <- unique(out$coefficient)
rownames(m) <- unique(out$sample)

head(m)
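Note that matrix(..., byrow = TRUE) relies on every group reporting the same coefficients in the same order. If a group might lack a coefficient (for instance, a regressor dropped for collinearity), pivoting by name is a safer sketch. The small data frame below is made up to mimic the `sample`, `coefficient`, and `Estimate` columns of the coeftable() output above; the names out_demo and m_demo and all the values are hypothetical.

```r
# Hypothetical long table mimicking the `sample`, `coefficient`, and
# `Estimate` columns shown above; the numbers are made up.
out_demo <- data.frame(
  sample      = c("A", "A", "A", "B", "B"),
  coefficient = c("iv_1", "iv_2", "iv_3", "iv_1", "iv_3"),
  Estimate    = c(0.10, 0.20, 0.30, 0.40, 0.60)
)

# Pivot by name: rows are groups, columns are coefficients.
# A missing group/coefficient pair becomes NA instead of shifting values.
m_demo <- tapply(out_demo$Estimate,
                 list(out_demo$sample, out_demo$coefficient),
                 function(x) x[1])
m_demo
```

Here group B has no iv_2 row, so that cell is left as NA rather than being filled with the next available estimate.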

2 answers

Answer 1
2 2022-11-16 02:00:19

Answer 2
2 Accepted 2022-11-16 02:00:46
