
Fixed effects in R: plm vs lm + factor()

I'm trying to run a fixed effects regression model in R. I want to control for heterogeneity in variables C and D (neither is a time variable).

I tried the following two approaches:

1) Use the plm package: this gives me the following error message

formula = Y ~ A + B + C + D

reg = plm(formula, data = data, index = c('C', 'D'), model = 'within')

Error in pdim.default(index[[1]], index[[2]]) : 
  duplicate couples (time-id)

I also tried first creating a panel data frame using

data_p = pdata.frame(data,index=c('C','D'))

But I have repeated observations in both columns.

2) Use factor() and lm: works well

formula = Y ~ A + B + factor(C) + factor(D)
reg = lm(formula, data= data)

What is the difference between the two methods? Why is plm not working for me? Is it because one of the indices should be time?

That error means that C and D, used as the index, form repeated id-time pairs: plm expects each (id, time) combination to appear at most once.
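To confirm that duplicated (C, D) pairs are what triggers the error, you can count rows per combination before calling plm. This is a minimal base R sketch on a hypothetical toy data frame; `toy`, `pair_counts`, `dup_pairs` and `has_dups` are illustrative names, not part of the question's data.

```r
# Hypothetical toy data: C and D both repeat, and the pair (2, "b") occurs twice
toy <- data.frame(
  C = c(1, 1, 2, 2, 2),
  D = c("a", "b", "a", "b", "b"),
  Y = c(1.0, 2.0, 3.0, 4.0, 5.0)
)

# Count how many rows fall into each (C, D) combination
pair_counts <- as.data.frame(table(C = toy$C, D = toy$D))

# Keep only the combinations that occur more than once
dup_pairs <- pair_counts[pair_counts$Freq > 1, ]
print(dup_pairs)

# Quick yes/no check: anyDuplicated() returns 0 when every row is unique
has_dups <- anyDuplicated(toy[, c("C", "D")]) > 0
```

If `has_dups` is TRUE on your own data, the two columns cannot serve as a plm index as they stand.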

Let's say you have a third variable F which, jointly with C, keeps each individual distinct from the others (or identifies your first dimension, whatever it is). Then with dplyr you can create a unique index, say id:

library(dplyr)
data$id <- data %>% group_by(C, F) %>% group_indices()

The index argument in plm then becomes index = c("id", "D").
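Put together, the steps might look like the sketch below. It is only an illustration on made-up data: the `toy` data frame and its values are hypothetical, and the id is built with base R `interaction()` rather than dplyr so that the block has no dependency beyond plm. The plm call is guarded with `requireNamespace()` so the rest still runs if the package is not installed.

```r
# Hypothetical toy data: C and F together identify an individual, D is the second index
toy <- data.frame(
  C = c(1, 1, 1, 1, 2, 2, 2, 2),
  F = c("x", "x", "y", "y", "x", "x", "y", "y"),
  D = c(1, 2, 1, 2, 1, 2, 1, 2),
  A = c(0.5, 1.1, 0.3, 0.9, 1.4, 2.2, 0.8, 1.9),
  Y = c(1.2, 2.1, 0.7, 2.0, 3.1, 4.4, 1.5, 3.6)
)

# One integer id per observed (C, F) combination
toy$id <- as.integer(interaction(toy$C, toy$F, drop = TRUE))

# Each (id, D) pair must now be unique, otherwise plm raises the same error
stopifnot(anyDuplicated(toy[, c("id", "D")]) == 0)

# Fit the within (fixed effects) model only if plm is available
if (requireNamespace("plm", quietly = TRUE)) {
  reg <- plm::plm(Y ~ A, data = toy, index = c("id", "D"), model = "within")
  print(summary(reg))
}
```

Note that the index columns are passed as character names, and that the regressors on the right-hand side no longer include the variables used to build the index.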

The lm + factor() approach is a solution only if you have distinct observations. If this is not the case, it will not properly weight the result within each id; that is, the fixed effect is not properly identified.
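One way to see how the two approaches relate is to compare, on a small example, the dummy-variable regression with a regression on group-demeaned data (the "within" transformation). The sketch below uses simulated data with a single grouping variable C; all names and values are hypothetical, and it only covers the one-factor case, not the two-index setup from the question.

```r
set.seed(1)

# Hypothetical data: 5 groups in C with 6 rows each, and a group-specific intercept
toy <- data.frame(C = rep(1:5, each = 6))
toy$A <- rnorm(30) + toy$C
toy$Y <- 2 * toy$A + toy$C + rnorm(30)

# (a) Dummy-variable regression: one intercept shift per level of C
b_lsdv <- coef(lm(Y ~ A + factor(C), data = toy))["A"]

# (b) Within transformation: subtract the group mean of C from Y and A,
#     then regress without an intercept
toy$Y_dm <- toy$Y - ave(toy$Y, toy$C)
toy$A_dm <- toy$A - ave(toy$A, toy$C)
b_within <- coef(lm(Y_dm ~ A_dm - 1, data = toy))["A_dm"]

# Compare the two slope estimates up to floating-point tolerance
same_slope <- isTRUE(all.equal(unname(b_lsdv), unname(b_within)))
print(c(lsdv = unname(b_lsdv), within = unname(b_within)))
```

Running the same comparison on your own data, with your own grouping variables, is a quick way to check whether the lm + factor() coefficients match what you expect from a within estimator.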
