简体   繁体   English

如何使用 lfe::felm 创建具有多个变量和固定效应的回归?

[英]How to create a regression with multiple variables and fixed effects using lfe::felm?

I have a large data set and want to regress variable X on variables A, B, C, D, and E and also include fixed effects for the year W, Y, Z. I want to use the natural log of the variable C as well (plus one).我有一个大数据集,想对变量 A、B、C、D 和 E 回归变量 X,还包括 W、Y、Z 年的固定效应。我想使用变量 C 的自然对数作为好(加一)。

How can I go about this?我怎么能go一下这个?

My intuition was to use felm我的直觉是使用 felm

# install.packages("lfe")
library(lfe)
regress <- felm(formula= X ~ A, B, C, D, E + W + Y + Z)
regress

Looks to me you have data with year dummies like this:在我看来,你有像这样的年份假人数据:

head(dat, 3)
#   id year2020 year2021 year2022         y         x1       x2
# 1  1        1        0        0 107.42623  32.903003 298.8692
# 2  2        1        0        0  32.90695 -13.552756 187.4316
# 3  3        1        0        0 123.78364   8.715082 507.0717

You will need to create a factor variable from the year dummies like so:您将需要从年份虚拟变量创建一个因子变量,如下所示:

ycols <- c('year2020', 'year2021', 'year2022')
dat$year <- gsub('year', '', ycols)[apply(dat[ycols], 1, which.max)]
dat <- dat[setdiff(names(dat), ycols)]

head(dat, 3)
#   id         y         x1       x2 year
# 1  1 107.42623  32.903003 298.8692 2020
# 2  2  32.90695 -13.552756 187.4316 2020
# 3  3 123.78364   8.715082 507.0717 2020

Then add fixed effects in the formula this way (see help('felm') ), where you also may use log with any addition directly.然后以这种方式在公式中添加固定效果(请参阅help('felm') ),您也可以在其中直接使用log和任何添加。

library(lfe)

est1 <- felm(y ~ x1 + log(x2 + .001) | id + year, dat)
summary(est1)
# Call:
#   felm(formula = y ~ x1 + log(x2 + 0.001) | id + year, data = dat) 
# 
# Residuals:
#   Min       1Q   Median       3Q      Max 
# -104.680  -14.638   -0.788   13.477  150.973 
# 
# Coefficients:
#                 Estimate Std. Error t value Pr(>|t|)    
# x1               1.03560    0.09927   10.43   <2e-16 ***
# log(x2 + 0.001) 71.01762    2.88983   24.57   <2e-16 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 33.47 on 196 degrees of freedom
# Multiple R-squared(full model): 0.8559   Adjusted R-squared: 0.7802 
# Multiple R-squared(proj model): 0.7894   Adjusted R-squared: 0.6787 
# F-statistic(full model): 11.3 on 103 and 196 DF, p-value: < 2.2e-16 
# F-statistic(proj model): 367.3 on 2 and 196 DF, p-value: < 2.2e-16 

You may confirm the results with LSDV:你可以用 LSDV 确认结果:

est2 <- lm(y ~ 0 + x1 + log(x2 + 0.001) + factor(id) + factor(year), dat)
summary(est2)$coe[1:2, ] |> signif(5)
#                  Estimate Std. Error t value   Pr(>|t|)
# x1                1.0356   0.099272  10.432 1.5098e-20
# log(x2 + 0.001)  71.0180   2.889800  24.575 9.0691e-62

Data:数据:

nid <- 100; nyr <- 3
set.seed(42)
dat <- expand.grid(id=factor(seq_len(nid)), year=factor(2019+seq_len(nyr)))
dat <- within(dat, {
  x1 <- rnorm(nid*nyr, 0, 24)
  x2 <- rgamma(nid*nyr, scale=200, shape=2)
  y <- x1 + .25*x2 + rnorm(nlevels(id)) + rnorm(nlevels(year)) +
    rnorm(nid*nyr, 0, 12)
})

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM