
plm: using fixef() to manually calculate fitted values for a fixed effects twoways model

Please note: I am trying to get the code to work with both time and individual fixed effects on an unbalanced dataset. The sample code below works with a balanced dataset.

See edit below too, please.

I am trying to manually calculate the fitted values of a fixed effects model (with both individual and time effects) using the plm package. This is more of an exercise to confirm that I understand the mechanics of the model and the package; I know from the two related questions (here and here) that I can get the fitted values themselves from the plm object.

From the plm vignette (p.2), the underlying model is:

y_it = alpha + beta_transposed * x_it + (mu_i + lambda_t + epsilon_it)

where mu_i is the individual component of the error term (aka "individual effect"), and lambda_t is the "time effect".

The fixed effects can be extracted with fixef(), and I thought I could use them (together with the independent variables) to calculate the fitted values for the model. With two independent variables, that would be:

fit_it = alpha + beta_1 * x1 + beta_2 * x2 + mu_i + lambda_t

This is where I fail -- the values I get are nowhere near the fitted values (which I get as the difference between the actual values and the residuals in the model object). For one, I do not see alpha anywhere. I tried playing with the fixed effects being shown as differences from the first, from the mean, etc., with no success.
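One way to see why alpha never shows up on its own is a general identification argument (it is not specific to plm, and the constants a and b below are introduced only for illustration). The fitted values depend only on the sum of the three level terms:

```latex
\begin{aligned}
\hat{y}_{it} &= \hat{\beta}^{\top} x_{it} + \hat{c}_{it},
  \qquad c_{it} = \alpha + \mu_i + \lambda_t \\
(\alpha,\ \mu_i,\ \lambda_t) &\;\mapsto\; (\alpha + a + b,\ \mu_i - a,\ \lambda_t - b)
  \quad \text{leaves } c_{it} \text{ unchanged for any constants } a,\ b
\end{aligned}
```

Any split of the sum into alpha, mu_i and lambda_t therefore depends on a normalization, such as measuring the effects relative to the first level or relative to the mean.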

What am I missing? It could well be a misunderstanding of the model, or an error in the code, I am afraid... Thanks in advance.

PS: One of the related questions hints that pmodel.response() should be related to my issue (and the reason there is no plm.fit function), but its help page does not help me understand what this function actually does, and I cannot find any examples of how to interpret the result it produces.
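A minimal sketch of what pmodel.response() appears to return for a within model, using the Grunfeld data shipped with plm rather than the data from this question. The assumption being checked is that it returns the response after the within transformation, i.e. inv minus the firm-specific mean of inv. The names fe, y_within and y_demeaned are made up for this illustration.

```r
library(plm)
data("Grunfeld", package = "plm")

# One-way (individual) within model on the balanced Grunfeld panel
fe <- plm(inv ~ value + capital, data = Grunfeld,
          index = c("firm", "year"), model = "within")

# The response as the within estimator sees it
y_within <- as.numeric(pmodel.response(fe))

# Hand-made within transformation: subtract each firm's mean of inv
y_demeaned <- Grunfeld$inv - ave(Grunfeld$inv, Grunfeld$firm)

# Compare the two versions of the transformed response
all.equal(y_within, y_demeaned)
```

If the comparison holds, pmodel.response() gives the demeaned response rather than the original y, which is why it cannot be combined directly with level-type fixed effects.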

Thanks!

Sample code of what I did:

library(data.table); library(plm)

set.seed(100)
DT <- data.table(CJ(id=c("a","b","c","d"), time=c(1:10)))
DT[, x1:=rnorm(40)]
DT[, x2:=rnorm(40)]
DT[, y:=x1 + 2*x2 + rnorm(40)/10]
DT <- DT[!(id=="a" & time==4)] # just to make it an unbalanced panel
setkey(DT, id, time)

# plm() has no id= argument; the panel index is passed via index=
summary(plmFEit <- plm(data=DT, index=c("id","time"), formula=y ~ x1 + x2, model="within", effect="twoways"))

# Extract the fitted values from the plm object
FV <- data.table(plmFEit$model, residuals=as.numeric(plmFEit$residuals))
FV[, y := as.numeric(y)]
FV[, x1 := as.numeric(x1)]
FV[, x2 := as.numeric(x2)]

DT <- merge(x=DT, y=FV, by=c("y","x1","x2"), all=TRUE)
DT[, fitted.plm := as.numeric(y) - as.numeric(residuals)]

FEI <- data.table(as.matrix(fixef(object=plmFEit, effect="individual", type="level")), keep.rownames=TRUE) # as.matrix needed to preserve the names?
setnames(FEI, c("id","fei"))
setkey(FEI, id)
setkey(DT, id)
DT <- DT[FEI] # merge the fei into the data, each id gets a single number for every row

FET <- data.table(as.matrix(fixef(object=plmFEit, effect="time", type="level")), keep.rownames=TRUE) # as.matrix needed to preserve the names?
setnames(FET, c("time","fet"))
FET[, time := as.integer(time)] # fixef returns time as character
setkey(FET, time)
setkey(DT, time)
DT <- DT[FET] # merge the fet into the data, each time gets a single number for every row

# calculate the fitted values (called calc to distinguish from those from plm)
DT[, fitted.calc := as.numeric(coef(plmFEit)[1] * x1 + coef(plmFEit)[2]*x2 + fei + fet)]
DT[, diff := as.numeric(fitted.plm - fitted.calc)]

all.equal(DT$fitted.plm, DT$fitted.calc)

My session is as follows:

R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] plm_1.4-0           Formula_1.2-1       RJSONIO_1.3-0       jsonlite_0.9.17     readxl_0.1.0.9000   data.table_1.9.7    bit64_0.9-5         bit_1.1-12          RevoUtilsMath_3.2.2

loaded via a namespace (and not attached):
 [1] bdsmatrix_1.3-2  Rcpp_0.12.1      lattice_0.20-33  zoo_1.7-12       MASS_7.3-44      grid_3.2.2       chron_2.3-47     nlme_3.1-122     curl_0.9.3       rstudioapi_0.3.1 sandwich_2.3-4  
[12] tools_3.2.2  

Edit: (2015-02-22) Since this has attracted some interest, I will try to clarify further. I was trying to fit a "fixed effects" model (aka "within" or "least squares dummy variables", as the plm package vignette calls it on p.3, top paragraph) -- same slope(s), different intercepts.

This is the same as running an ordinary OLS regression after adding dummies for time and id. Using the code below I can duplicate the fitted values from the plm package using base lm(). With the dummies, it is explicit that the first elements of both id and time are the reference group. What I still cannot work out is how to use the facilities of the plm package to do what I can easily accomplish with lm().

# fit the same with lm() and match the fitted values to those from plm()
lmF <- lm(data = DT, formula = y ~ x1 + x2 + factor(time) + factor(id))
time.lm <- coef(lmF)[grep(x = names(coef(lmF)), pattern = "time", fixed = TRUE)]
time.lm <- c(0, unname(time.lm)) # no need for names, the position index corresponds to time

id.lm <- coef(lmF)[grep(x = names(coef(lmF)), pattern = "id", fixed = TRUE)]
id.lm <- c(0, unname(id.lm))
names(id.lm) <- c("a","b","c","d") # set names so that individual values can be looked up below when generating the fit

DT[, by=list(id, time), fitted.lm := coef(lmF)[["(Intercept)"]]  +  coef(lmF)[["x1"]] * x1  +  coef(lmF)[["x2"]] * x2  +  time.lm[[time]]  +  id.lm[[id]]]
all.equal(DT$fitted.plm, DT$fitted.lm)

Hope this is useful to others who might be interested. The issue might be something about how plm and fixef deal with the missing value I intentionally created. I tried playing with the type= parameter of fixef but to no effect.

This works for unbalanced data with effect="individual" and time dummies, y ~ x + factor(year):

fitted <- pmodel.response(plm.model)-residuals(plm.model)
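A hedged check of which scale this expression lives on, again on the Grunfeld data from plm with one row dropped to make the panel unbalanced. The sketch assumes that pmodel.response() returns the within-transformed response; under that assumption, the one-liner above gives fitted values on the demeaned scale, which differ from "original y minus residuals" by each firm's mean of the response. The names dat, m, fit_transformed, fit_original, gap and firm_mean are illustrative only.

```r
library(plm)
data("Grunfeld", package = "plm")
dat <- Grunfeld[-200, ]  # drop one row to unbalance the panel

# Individual effects plus time dummies, as described in the answer
m <- plm(inv ~ value + capital + factor(year), data = dat,
         index = c("firm", "year"), model = "within", effect = "individual")

res <- as.numeric(residuals(m))

# Fitted values built from the transformed response
fit_transformed <- as.numeric(pmodel.response(m)) - res

# Fitted values built from the untransformed response in the model frame
fit_original <- as.numeric(m$model[[1]]) - res

# The gap between the two should be the firm-specific mean of inv
gap <- fit_original - fit_transformed
firm_mean <- ave(dat$inv, dat$firm)
all.equal(gap, firm_mean)
```

Which of the two versions is wanted depends on whether the fitted values are needed on the original scale of y or on the within-transformed scale.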

I found this, which may help you, since the lm() solution was not working in my case (I got different coefficients compared to the plm package).

Therefore, it is just a matter of applying the suggestion by the authors of the plm package here: http://r.789695.n4.nabble.com/fitted-from-plm-td3003924.html

So what I did was simply apply:

plm.object <- plm(y ~ lag(y, 1) + z +z2, data = mdt, model= "within", effect="twoways")
fitted <- as.numeric(plm.object$model[[1]] - plm.object$residuals) 

I need the as.numeric function here because I want the result as a plain vector to plug into further manipulations. I also want to point out that if your model has a lagged dependent variable on the right-hand side, the solution above with as.numeric provides a vector that is already net of the missing values caused by the lag. For me this is exactly what I needed.
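A small sketch of the point about the lag, using the Grunfeld data from plm instead of the mdt data above (which is not shown in this thread). The fitted vector is shorter than the data because the first period of every firm is lost to the lag, and index() on the fitted model can be used to label the rows that remain. The names dyn, fitted_dyn, idx and out are made up for this example.

```r
library(plm)
data("Grunfeld", package = "plm")

# Two-way within model with a lagged dependent variable
dyn <- plm(inv ~ lag(inv, 1) + value + capital, data = Grunfeld,
           index = c("firm", "year"), model = "within", effect = "twoways")

# Fitted values as original response minus residuals
fitted_dyn <- as.numeric(dyn$model[[1]]) - as.numeric(residuals(dyn))

nrow(Grunfeld)      # rows in the data
length(fitted_dyn)  # rows actually used by the model

# Attach firm and year of the retained rows to the fitted values
idx <- index(dyn)
out <- data.frame(firm = idx[[1L]], year = idx[[2L]], fitted = fitted_dyn)
head(out)
```

Merging out back onto the original data by firm and year is then safer than relying on row positions.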

I'm getting pretty close with Helix123's suggestion to subtract the within_intercept (it gets included in each of the two fixed effects, so you need to correct for that).

There's a very suggestive pattern in my reconstruction errors: individual a is always off by -0.004858712 (for every time period). Individuals b, c, d are always off by 0.002839703 for every time period except in period 4 (where there is no observation for a), where they're off by -0.010981192.

Any ideas? It looks like the individual fixed effects are thrown off by the unbalancing. Rerunning it on a balanced panel works correctly.

Full code:

library(data.table); library(plm)

DT <- data.table(CJ(id=c("a","b","c","d"), time=c(1:10)))
DT[, x1:=rnorm(40)]
DT[, x2:=rnorm(40)]
DT[, y:= x1 + 2*x2 + rnorm(40)/10]
DT <- DT[!(id=="a" & time==4)] # just to make it an unbalanced panel
setkey(DT, id, time)

plmFEit <- plm(formula=y ~ x1 + x2,
               data=DT,
               index=c("id","time"),
               effect="twoways",
               model="within")

summary(plmFEit)

DT[, resids := residuals(plmFEit)]

FEI <- data.table(as.matrix(fixef(plmFEit, effect="individual", type="level")), keep.rownames=TRUE) # as.matrix needed to preserve the names?
setnames(FEI, c("id","fei"))
setkey(FEI, id)
setkey(DT, id)
DT <- DT[FEI] # merge the fei into the data, each id gets a single number for every row

FET <- data.table(as.matrix(fixef(plmFEit, effect="time", type="level")), keep.rownames=TRUE) # as.matrix needed to preserve the names?
setnames(FET, c("time","fet"))
FET[, time := as.integer(time)] # fixef returns time as character
setkey(FET, time)
setkey(DT, time)
DT <- DT[FET] # merge the fet into the data, each time gets a single number for every row

DT[, fitted.calc := plmFEit$coefficients[[1]] * x1 + plmFEit$coefficients[[2]] * x2 +
     fei + fet - within_intercept(plmFEit)]

DT[, myresids := y - fitted.calc]
DT[, myerr := resids - myresids]

Edit: adapted to two-ways unbalanced model, needs plm version >= 2.4-0

Is this what you wanted? Extract the fixed effects by fixef. Here is an example for the Grunfeld data on an unbalanced two-way model (works the same for the balanced two-way model):

library(plm)
data("Grunfeld", package = "plm")

gtw_u <- plm(inv ~ value + capital, data = Grunfeld[-200, ], effect = "twoways")
yhat <- as.numeric(gtw_u$model[ , 1] - gtw_u$residuals) # reference
pred_beta <- as.numeric(tcrossprod(coef(gtw_u), as.matrix(gtw_u$model[ , -1])))
pred_effs <- as.numeric(fixef(gtw_u, "twoways")) # sum of ind and time effects

all.equal(pred_effs + pred_beta, yhat) # TRUE -> matches fitted values (yhat)

If you want to split the sum of individual and time effects (given by effect = "twoways") into its components, you will need to choose a reference. Two come naturally to mind, and both are given below:

# Splits of summed up individual and time effects:
# use one "level" and one "dfirst"
ii <- index(gtw_u)[[1L]]; it <- index(gtw_u)[[2L]]
eff_id_dfirst <- c(0, as.numeric(fixef(gtw_u, "individual", "dfirst")))[ii]
eff_ti_dfirst <- c(0, as.numeric(fixef(gtw_u, "time",       "dfirst")))[it]
eff_id_level <- as.numeric(fixef(gtw_u, "individual"))[ii]
eff_ti_level <- as.numeric(fixef(gtw_u, "time"))[it]

all.equal(pred_effs, eff_id_level  + eff_ti_dfirst) # TRUE
all.equal(pred_effs, eff_id_dfirst + eff_ti_level)  # TRUE

(This is based on the man page of fixef, ?fixef. There it is also shown how the (balanced and unbalanced) one-way model is to be handled.)
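The same recipe can be tried on a simulated unbalanced panel with the design from the question (4 ids, 10 periods, one row dropped). This is only a sketch: it uses a plain data.frame instead of data.table, so the random draws are not claimed to match the question's, and names such as dat, beta_part and fitted_calc are made up here. It relies on the rows being sorted by id and then time, which is the order plm uses internally.

```r
library(plm)  # fixef(effect = "twoways") needs plm >= 2.4-0

set.seed(100)
# time varies fastest, so rows are sorted by id, then time
dat <- expand.grid(time = 1:10, id = c("a", "b", "c", "d"),
                   stringsAsFactors = FALSE)
dat$x1 <- rnorm(40)
dat$x2 <- rnorm(40)
dat$y  <- dat$x1 + 2 * dat$x2 + rnorm(40) / 10
dat <- dat[!(dat$id == "a" & dat$time == 4), ]  # make the panel unbalanced

m <- plm(y ~ x1 + x2, data = dat, index = c("id", "time"),
         model = "within", effect = "twoways")

# Reference: fitted values as response minus residuals
yhat <- dat$y - as.numeric(residuals(m))

# Slope part of the fit
beta_part <- coef(m)[["x1"]] * dat$x1 + coef(m)[["x2"]] * dat$x2

# Sum of individual and time effects, one value per observation
eff_sum <- as.numeric(fixef(m, effect = "twoways"))

# One possible split: individual effects in levels, time effects
# as differences from the first period (0 for the first period)
ii <- as.integer(index(m)[[1L]])
it <- as.integer(index(m)[[2L]])
eff_id <- as.numeric(fixef(m, effect = "individual", type = "level"))[ii]
eff_ti <- c(0, as.numeric(fixef(m, effect = "time", type = "dfirst")))[it]

fitted_calc <- beta_part + eff_id + eff_ti
all.equal(beta_part + eff_sum, yhat)
all.equal(fitted_calc, yhat)
```

Note that no alpha and no within_intercept() correction appears in this version: the overall level is carried by the level-type individual effects, and the time effects are expressed relative to the first period.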
