plm: using fixef() to manually calculate fitted values for a fixed effects twoways model

Question

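Please note: I am trying to get the code to work with both time and individual fixed effects, and with an unbalanced data set. The sample code below works with a balanced data set.

See also the edit below.

I am trying to manually calculate the fitted values of a fixed effects model (with both individual and time effects) using the plm package. This is more of an exercise to confirm that I understand the mechanics of the model and the package. I know I can get the fitted values themselves from the plm object, as shown in two related questions (here and here).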

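From the plm vignette (p. 2), the underlying model is:

y_{it} = \alpha + \beta^\top x_{it} + (\mu_i + \lambda_t + \epsilon_{it})

where \mu_i is the individual component of the error term (a.k.a. the "individual effect") and \lambda_t is the "time effect".

The fixed effects can be extracted with fixef(), and I thought I could use them (together with the independent variables) to calculate the fitted values of the model, in this way (with two independent variables):

fitted_{it} = \alpha + \beta_1 x_{1,it} + \beta_2 x_{2,it} + \mu_i + \lambda_t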

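This is where I fail: the values I get are nowhere near the fitted values (which I obtain as the difference between the actual values and the residuals in the model object). For one, I do not see the intercept alpha anywhere. I tried using the fixed effects expressed as differences from the first one, from the mean, etc., with no success.

What am I missing? It could well be a misunderstanding of the model, or an error in the code, I am afraid... Thanks in advance.

PS: One of the related questions hints that pmodel.response() should be related to my issue (and to the reason there is no plm.fit function), but its help page does not help me understand what the function actually does, and I cannot find any examples explaining the result it produces.

Thanks!

Sample code of what I have done: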

library(data.table); library(plm)

set.seed(100)
DT <- data.table(CJ(id=c("a","b","c","d"), time=c(1:10)))
DT[, x1:=rnorm(40)]
DT[, x2:=rnorm(40)]
DT[, y:=x1 + 2*x2 + rnorm(40)/10]
DT <- DT[!(id=="a" & time==4)] # just to make it an unbalanced panel
setkey(DT, id, time)

# plm's argument for the panel identifiers is index=, not id=
summary(plmFEit <- plm(data=DT, index=c("id","time"), formula=y ~ x1 + x2, model="within", effect="twoways"))

# Extract the fitted values from the plm object
FV <- data.table(plmFEit$model, residuals=as.numeric(plmFEit$residuals))
FV[, y := as.numeric(y)]
FV[, x1 := as.numeric(x1)]
FV[, x2 := as.numeric(x2)]

DT <- merge(x=DT, y=FV, by=c("y","x1","x2"), all=TRUE)
DT[, fitted.plm := as.numeric(y) - as.numeric(residuals)]

FEI <- data.table(as.matrix(fixef(object=plmFEit, effect="individual", type="level")), keep.rownames=TRUE) # as.matrix needed to preserve the names?
setnames(FEI, c("id","fei"))
setkey(FEI, id)
setkey(DT, id)
DT <- DT[FEI] # merge the fei into the data, each id gets a single number for every row

FET <- data.table(as.matrix(fixef(object=plmFEit, effect="time", type="level")), keep.rownames=TRUE) # as.matrix needed to preserve the names?
setnames(FET, c("time","fet"))
FET[, time := as.integer(time)] # fixef returns time as character
setkey(FET, time)
setkey(DT, time)
DT <- DT[FET] # merge the fet into the data, each time gets a single number for every row

# calculate the fitted values (called calc to distinguish from those from plm)
DT[, fitted.calc := as.numeric(coef(plmFEit)[1] * x1 + coef(plmFEit)[2]*x2 + fei + fet)]
DT[, diff := as.numeric(fitted.plm - fitted.calc)]

all.equal(DT$fitted.plm, DT$fitted.calc)

My session info is as follows:

R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] plm_1.4-0           Formula_1.2-1       RJSONIO_1.3-0       jsonlite_0.9.17     readxl_0.1.0.9000   data.table_1.9.7    bit64_0.9-5         bit_1.1-12          RevoUtilsMath_3.2.2

loaded via a namespace (and not attached):
 [1] bdsmatrix_1.3-2  Rcpp_0.12.1      lattice_0.20-33  zoo_1.7-12       MASS_7.3-44      grid_3.2.2       chron_2.3-47     nlme_3.1-122     curl_0.9.3       rstudioapi_0.3.1 sandwich_2.3-4  
[12] tools_3.2.2

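EDIT: (2015-02-22) Since this has attracted some interest, I will try to clarify further. I was trying to fit a "fixed effects" model (a.k.a. "within" or "least squares dummy variables", as the plm package vignette calls it in a paragraph on p. 3): same slope, different intercepts.

This is the same as running an ordinary OLS regression after adding dummy variables for time and id. Using the code below, I can reproduce the fitted values from the plm package with base lm(). With the dummies, it is explicit that the first elements of both id and time are the reference group. What I still cannot do is use the facilities of the plm package to do the same thing that I can easily accomplish with lm().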

# fit the same with lm() and match the fitted values to those from plm()
lmF <- lm(data = DT, formula = y ~ x1 + x2 + factor(time) + factor(id))
time.lm <- coef(lmF)[grep(x = names(coef(lmF)), pattern = "time", fixed = TRUE)]
time.lm <- c(0, unname(time.lm)) # no need for names, the position index corresponds to time

id.lm <- coef(lmF)[grep(x = names(coef(lmF)), pattern = "id", fixed = TRUE)]
id.lm <- c(0, unname(id.lm))
names(id.lm) <- c("a","b","c","d") # set names so that individual values can be looked up below when generating the fit

DT[, by=list(id, time), fitted.lm := coef(lmF)[["(Intercept)"]]  +  coef(lmF)[["x1"]] * x1  +  coef(lmF)[["x2"]] * x2  +  time.lm[[time]]  +  id.lm[[id]]]
all.equal(DT$fitted.plm, DT$fitted.lm)

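Hope this is useful to others who might be interested. The issue might have to do with how plm and fixef handle the missing observation I intentionally created. I tried playing with the type= argument of fixef, but to no effect.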

Answer 1

This works for unbalanced data with effect="individual" and time dummies, y ~ x + factor(year):

fitted <- pmodel.response(plm.model)-residuals(plm.model)
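A minimal sketch for checking which scale this expression works on, assuming the Grunfeld data shipped with plm (the answer itself does not name a data set). As far as the documentation goes, pmodel.response() on a plm object returns the response after the model's transformation has been applied, so for a within model the difference below would be fitted values on the demeaned scale, while subtracting the residuals from the untransformed response stored in $model gives fitted values in levels. The object names are made up for illustration.

```r
library(plm)
data("Grunfeld", package = "plm")

# Unbalanced one-way within model with time dummies, as described in the answer
fe <- plm(inv ~ value + capital + factor(year), data = Grunfeld[-200, ],
          model = "within", effect = "individual")

# Expression from the answer: (transformed) response minus residuals
fitted_transformed <- as.numeric(pmodel.response(fe) - residuals(fe))

# Untransformed response from the model frame, minus the same residuals
fitted_levels <- as.numeric(fe$model[[1]] - residuals(fe))

# Put the two side by side to see which scale each one is on
head(cbind(fitted_transformed, fitted_levels))
```

If the two columns differ, the second one is the one to compare against the observed y.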

Answer 2

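I found this, which may help you, since the lm() solution did not work in my case (I got different coefficients compared with the plm package).

So this is just applying the suggestion of the plm package authors: http://r.789695.n4.nabble.com/fitted-from-plm-td3003924.html

All I did was apply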

plm.object <- plm(y ~ lag(y, 1) + z +z2, data = mdt, model= "within", effect="twoways")
fitted <- as.numeric(plm.object$model[[1]] - plm.object$residuals)

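I needed as.numeric because I wanted to use the result as a plain vector for further manipulation. I also want to point out that if your model has a lagged dependent variable on the right-hand side, the as.numeric solution above gives a vector that is already net of the observations lost to the lag. For me, this is exactly what I needed.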
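Because the fitted vector is shorter than the original data once a lag is involved, it may help to map it back onto the original rows. Here is a rough sketch, assuming the Grunfeld data from plm as a stand-in for mdt and using index() on the fitted model to recover which firm/year pairs were actually used. The key-building helper names are made up for illustration.

```r
library(plm)
data("Grunfeld", package = "plm")

# Two-way within model with a lagged dependent variable on the right-hand side
fe_lag <- plm(inv ~ lag(inv, 1) + value, data = Grunfeld,
              index = c("firm", "year"), model = "within", effect = "twoways")

# Fitted values as in the answer: response used in the fit minus residuals
fitted_vals <- as.numeric(fe_lag$model[[1]] - fe_lag$residuals)

# index() returns the individual and time identifiers of the rows used in the fit
idx <- index(fe_lag)
key_used <- paste(idx[[1]], idx[[2]])
key_all  <- paste(Grunfeld$firm, Grunfeld$year)

# Align with the original data; rows dropped because of the lag become NA
fitted_full <- fitted_vals[match(key_all, key_used)]
sum(is.na(fitted_full))  # number of rows lost to the lag
```

Matching on a pasted key avoids relying on row order or on the identifier columns having the same type in both objects.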

Answer 3

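I am getting pretty close with Helix123's suggestion to subtract within_intercept (it is included in each of the two fixed effects, so you need to correct for that).

There is a very suggestive pattern in my reconstruction errors: individual a is always off by -0.004858712 (in every time period). Individuals b, c and d are always off by 0.002839703 in every time period except period 4 (where there is no observation for a), where they are off by -0.010981192.

Any ideas? It looks like the individual fixed effects are thrown off by the unbalancedness. Rerunning it on balanced data works correctly.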
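One way to read the correction, sketched for the balanced case only and under the assumption that the level-type individual and time effects correspond to a_i and b_t below (illustrative notation, not plm's), with bars denoting means over the dotted index:

```latex
\begin{align*}
a_i &= \bar y_{i\cdot} - \hat\beta^\top \bar x_{i\cdot}, \qquad
b_t = \bar y_{\cdot t} - \hat\beta^\top \bar x_{\cdot t}, \qquad
c = \bar y_{\cdot\cdot} - \hat\beta^\top \bar x_{\cdot\cdot} \\
\tilde y_{it} &= y_{it} - \bar y_{i\cdot} - \bar y_{\cdot t} + \bar y_{\cdot\cdot}, \qquad
\tilde x_{it} = x_{it} - \bar x_{i\cdot} - \bar x_{\cdot t} + \bar x_{\cdot\cdot} \\
\hat y_{it} &= y_{it} - \left(\tilde y_{it} - \hat\beta^\top \tilde x_{it}\right)
             = \hat\beta^\top x_{it} + a_i + b_t - c
\end{align*}
```

Both a_i and b_t carry the overall level c, which is why it has to be subtracted once. With a missing cell, the two-way within transformation is no longer this simple subtraction of group means, which would be consistent with the errors changing in period 4.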

Full code:

library(data.table); library(plm)

DT <- data.table(CJ(id=c("a","b","c","d"), time=c(1:10)))
DT[, x1:=rnorm(40)]
DT[, x2:=rnorm(40)]
DT[, y:= x1 + 2*x2 + rnorm(40)/10]
DT <- DT[!(id=="a" & time==4)] # just to make it an unbalanced panel
setkey(DT, id, time)

plmFEit <- plm(formula=y ~ x1 + x2,
               data=DT,
               index=c("id","time"),
               effect="twoways",
               model="within")

summary(plmFEit)

DT[, resids := residuals(plmFEit)]

FEI <- data.table(as.matrix(fixef(plmFEit, effect="individual", type="level")), keep.rownames=TRUE) # as.matrix needed to preserve the names?
setnames(FEI, c("id","fei"))
setkey(FEI, id)
setkey(DT, id)
DT <- DT[FEI] # merge the fei into the data, each id gets a single number for every row

FET <- data.table(as.matrix(fixef(plmFEit, effect="time", type="level")), keep.rownames=TRUE) # as.matrix needed to preserve the names?
setnames(FET, c("time","fet"))
FET[, time := as.integer(time)] # fixef returns time as character
setkey(FET, time)
setkey(DT, time)
DT <- DT[FET] # merge the fet into the data, each time gets a single number for every row

DT[, fitted.calc := plmFEit$coefficients[[1]] * x1 + plmFEit$coefficients[[2]] * x2 +
     fei + fet - within_intercept(plmFEit)]

DT[, myresids := y - fitted.calc]
DT[, myerr := resids - myresids]

Answer 4

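Edit: adapted to two-way unbalanced models; needs plm version >= 2.4-0.

Is this what you wanted? Extract the fixed effects with fixef. Here is an example using the Grunfeld data on an unbalanced two-way model (it works the same for a balanced two-way model):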

library(plm)
data("Grunfeld", package = "plm")
gtw_u <- plm(inv ~ value + capital, data = Grunfeld[-200, ], effect = "twoways")
yhat <- as.numeric(gtw_u$model[ , 1] - gtw_u$residuals) # reference
pred_beta <- as.numeric(tcrossprod(coef(gtw_u), as.matrix(gtw_u$model[ , -1])))
pred_effs <- as.numeric(fixef(gtw_u, "twoways")) # sum of ind and time effects

all.equal(pred_effs + pred_beta, yhat) # TRUE -> matches fitted values (yhat)

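If you want to split the sum of the individual and time effects (as given by effect = "twoways") into its components, you need to choose a reference, and two come to mind naturally, as follows: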

# Splits of summed up individual and time effects:
# use one "level" and one "dfirst"
ii <- index(gtw_u)[[1L]]; it <- index(gtw_u)[[2L]]
eff_id_dfirst <- c(0, as.numeric(fixef(gtw_u, "individual", "dfirst")))[ii]
eff_ti_dfirst <- c(0, as.numeric(fixef(gtw_u, "time",       "dfirst")))[it]
eff_id_level <- as.numeric(fixef(gtw_u, "individual"))[ii]
eff_ti_level <- as.numeric(fixef(gtw_u, "time"))[it]

all.equal(pred_effs, eff_id_level  + eff_ti_dfirst) # TRUE
all.equal(pred_effs, eff_id_dfirst + eff_ti_level)  # TRUE

(This is based on the man page of fixef, ?fixef. It also shows how to handle balanced and unbalanced one-way models.)
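As a cross-check that does not depend on fixef() at all, the reference fitted values can be compared with a least squares dummy variables fit from base lm(), which the question reports as matching plm. This is only a sketch on the same unbalanced Grunfeld subset. The names gu and lsdv are made up for illustration.

```r
library(plm)
data("Grunfeld", package = "plm")
gu <- Grunfeld[-200, ]  # same unbalanced subset as in the example above

# Two-way within model and its fitted values (response minus residuals)
gtw_u <- plm(inv ~ value + capital, data = gu,
             model = "within", effect = "twoways")
yhat <- as.numeric(gtw_u$model[ , 1] - gtw_u$residuals)

# Same model written out with firm and year dummies
lsdv <- lm(inv ~ value + capital + factor(firm) + factor(year), data = gu)

# Compare fitted values and slope coefficients between the two fits
all.equal(yhat, unname(fitted(lsdv)))
all.equal(unname(coef(gtw_u)), unname(coef(lsdv)[c("value", "capital")]))
```

The comparison relies on both fits keeping the rows in firm-then-year order, which holds for Grunfeld as shipped.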


4 answers

Answer 1
2 2017-10-18 00:03:21

Answer 2
1 2015-04-20 16:08:51

Answer 3
1 2017-12-09 19:16:38

Answer 4
0 accepted 2015-11-12 14:28:06
