How to create a regression with multiple variables and fixed effects using lfe::felm?
I have a large data set and want to regress variable X on variables A, B, C, D, and E, and also include fixed effects for the years W, Y, Z. I also want to use the natural log of variable C plus one, i.e. log(C + 1).
How can I go about this?
My intuition was to use felm:
# install.packages("lfe")
library(lfe)
regress <- felm(formula= X ~ A, B, C, D, E + W + Y + Z)
regress
It looks to me like you have data with year dummies like this:
head(dat, 3)
# id year2020 year2021 year2022 y x1 x2
# 1 1 1 0 0 107.42623 32.903003 298.8692
# 2 2 1 0 0 32.90695 -13.552756 187.4316
# 3 3 1 0 0 123.78364 8.715082 507.0717
You will need to create a single year variable from the year dummies like so:
ycols <- c('year2020', 'year2021', 'year2022')
dat$year <- gsub('year', '', ycols)[apply(dat[ycols], 1, which.max)]
dat <- dat[setdiff(names(dat), ycols)]
head(dat, 3)
# id y x1 x2 year
# 1 1 107.42623 32.903003 298.8692 2020
# 2 2 32.90695 -13.552756 187.4316 2020
# 3 3 123.78364 8.715082 507.0717 2020
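One caveat: which.max returns 1 for a row in which every dummy is 0, so such a row would silently be labeled 2020. Below is a minimal sketch of a sanity check to run before collapsing the dummies. The data frame toy and its values are made up for illustration and are not the dat used in this answer.

```r
# Hypothetical data: the third row has no year dummy set
toy <- data.frame(year2020 = c(1, 0, 0),
                  year2021 = c(0, 1, 0),
                  year2022 = c(0, 0, 0))
ycols <- c("year2020", "year2021", "year2022")

# Each row should have exactly one dummy equal to 1
ok <- rowSums(toy[ycols]) == 1
which(!ok)  # rows that need attention before collapsing

# Collapse only the valid rows; the others stay NA
toy$year <- NA_character_
toy$year[ok] <- sub("year", "", ycols)[
  max.col(as.matrix(toy[ok, ycols]), ties.method = "first")
]
toy$year <- factor(toy$year)
```

Rows flagged by which(!ok) can then be inspected or dropped instead of ending up in the wrong year.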
Then put the fixed effects after the | in the formula (see help('felm')). You may also use log, with any added constant, directly in the formula.
library(lfe)
est1 <- felm(y ~ x1 + log(x2 + .001) | id + year, dat)
summary(est1)
# Call:
# felm(formula = y ~ x1 + log(x2 + 0.001) | id + year, data = dat)
#
# Residuals:
# Min 1Q Median 3Q Max
# -104.680 -14.638 -0.788 13.477 150.973
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# x1 1.03560 0.09927 10.43 <2e-16 ***
# log(x2 + 0.001) 71.01762 2.88983 24.57 <2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 33.47 on 196 degrees of freedom
# Multiple R-squared(full model): 0.8559 Adjusted R-squared: 0.7802
# Multiple R-squared(proj model): 0.7894 Adjusted R-squared: 0.6787
# F-statistic(full model): 11.3 on 103 and 196 DF, p-value: < 2.2e-16
# F-statistic(proj model): 367.3 on 2 and 196 DF, p-value: < 2.2e-16
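Applied to the variable names in the question, the same pattern might look like the sketch below. The data frame df and all of its values are simulated purely for illustration. The sketch also assumes the year information has already been collapsed into a single factor column, here called year, with the question's W, Y, Z as its levels.

```r
library(lfe)

set.seed(1)
n <- 300
# Hypothetical data using the question's variable names (values are simulated)
df <- data.frame(
  A = rnorm(n), B = rnorm(n), C = rexp(n), D = rnorm(n), E = rnorm(n),
  year = factor(sample(c("W", "Y", "Z"), n, replace = TRUE))
)
df$X <- 1 + 0.5 * df$A + 2 * log(df$C + 1) + as.integer(df$year) + rnorm(n)

# Regressors are joined with '+'; fixed effects go after the '|'
fit <- felm(X ~ A + B + log(C + 1) + D + E | year, data = df)
summary(fit)
```

Separating the regressors with commas, as in the question's attempt, passes them to felm as extra arguments instead of as part of the formula. Writing log1p(C) in place of log(C + 1) should give the same fit.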
You may confirm the results with LSDV (least squares dummy variables):
est2 <- lm(y ~ 0 + x1 + log(x2 + 0.001) + factor(id) + factor(year), dat)
summary(est2)$coe[1:2, ] |> signif(5)
# Estimate Std. Error t value Pr(>|t|)
# x1 1.0356 0.099272 10.432 1.5098e-20
# log(x2 + 0.001) 71.0180 2.889800 24.575 9.0691e-62
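The two approaches agree because absorbing the fixed effects and including the dummies explicitly give the same slope estimates. Here is a minimal base R sketch of that equivalence on a made-up balanced panel. The data frame toy and the helper demean2 are hypothetical, and the simple double-demeaning formula used here is only valid for balanced panels; it is not a description of how felm works internally.

```r
set.seed(1)
# Hypothetical balanced panel: 20 units observed in each of 3 years
toy <- expand.grid(id = factor(1:20), year = factor(2020:2022))
toy$x <- rnorm(nrow(toy))
toy$y <- 2 * toy$x + as.integer(toy$id) / 5 + as.integer(toy$year) +
  rnorm(nrow(toy))

# Two-way within transformation: subtract unit and year means, add back the
# grand mean (this shortcut assumes a balanced panel)
demean2 <- function(v, g1, g2) v - ave(v, g1) - ave(v, g2) + mean(v)

toy$y_w <- demean2(toy$y, toy$id, toy$year)
toy$x_w <- demean2(toy$x, toy$id, toy$year)

# Slope from the demeaned regression vs. slope from the dummy-variable model
b_within <- coef(lm(y_w ~ 0 + x_w, toy))[["x_w"]]
b_lsdv   <- coef(lm(y ~ x + id + year, toy))[["x"]]
stopifnot(isTRUE(all.equal(b_within, b_lsdv)))
```

The point estimates match, but the standard errors from the naive demeaned lm do not account for the absorbed degrees of freedom, which is one reason to prefer felm or the full LSDV model for inference.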
Data:
nid <- 100; nyr <- 3
set.seed(42)
dat <- expand.grid(id=factor(seq_len(nid)), year=factor(2019+seq_len(nyr)))
dat <- within(dat, {
  x1 <- rnorm(nid*nyr, 0, 24)
  x2 <- rgamma(nid*nyr, scale=200, shape=2)
  y <- x1 + .25*x2 + rnorm(nlevels(id)) + rnorm(nlevels(year)) +
    rnorm(nid*nyr, 0, 12)
})