Clustered standard errors in R using plm (with fixed effects)

I'm trying to run a regression with R's plm package using fixed effects (model = 'within') and clustered standard errors. Using the Cigar dataset from plm, I'm running:

require(plm)
require(lmtest)
data(Cigar)
model <- plm(price ~ sales + factor(state), model = 'within', data = Cigar)
coeftest(model, vcovHC(model, type = 'HC0', cluster = 'group'))

      Estimate Std. Error t value Pr(>|t|)
sales -1.21956    0.21136 -5.7701 9.84e-09

This differs slightly from what I get in Stata (after writing the Cigar data to a .dta file):

use cigar
xtset state year
xtreg price sales, fe vce(cluster state)

       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
       sales |  -1.219563   .2137726    -5.70   0.000    -1.650124   -.7890033

Namely, the standard error and t statistic are different. I've tried rerunning the R code with different values of "type", but none gives the same result as Stata. Am I missing something?

Stata applies a finite-sample correction to reduce the downward bias in the standard errors that comes from having a finite number of clusters. It is a multiplicative factor on the variance-covariance matrix, $c=\frac{G}{G-1} \cdot \frac{N-1}{N-K}$, where G is the number of groups, N is the number of observations, and K is the number of parameters. I think coeftest only uses $c'=\frac{N-1}{N-K}$, since if I scale R's standard error by the square root of the first term in c, I get something pretty close to Stata's standard error:

display 0.21136*(46/(46-1))^(.5)
.21369554
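
Written out, the arithmetic in the `display` line is the following. This is only a restatement of that calculation, with G = 46 groups and R's reported standard error of 0.21136 taken from the output in the question; the factor acts on the variance, so the standard error is scaled by its square root.

```latex
\[
\mathrm{SE}_{\text{scaled}}
  = \mathrm{SE}_{R}\,\sqrt{\frac{G}{G-1}}
  = 0.21136 \times \sqrt{\frac{46}{45}}
  \approx 0.21136 \times 1.01105
  \approx 0.21370
\]
```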

Here's how I would replicate in R what Stata is doing:

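require(plm)
require(lmtest)
data(Cigar)
model <- plm(price ~ sales, model = 'within', data = Cigar)
# Number of groups (clusters), counted from the raw data
G <- length(unique(Cigar$state))
# First term of the correction factor, G / (G - 1)
c <- G/(G - 1)
# Scale the HC1 clustered covariance matrix by that factor
coeftest(model, c * vcovHC(model, type = "HC1", cluster = "group"))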

This yields:

t test of coefficients:

       Estimate Std. Error  t value   Pr(>|t|)    
sales -1.219563   0.213773 -5.70496 1.4319e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

which agrees with Stata's standard error of 0.2137726 and t statistic of -5.70.

This code is probably not ideal, since the number of states in the data may differ from the number of states actually used in the regression, but I am too lazy to figure out how to get the right number of panels.
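
One possible way to address that caveat is to read the number of groups from the fitted model rather than from the raw data. The sketch below assumes that plm's `pdim()` accepts a fitted `plm` object and reports the panel dimensions in its `nT` element; `G_used` and `adj` are illustrative names, not part of the original answer.

```r
library(plm)
library(lmtest)

data("Cigar", package = "plm")
model <- plm(price ~ sales, model = "within", data = Cigar)

# Number of groups (panels) in the estimation sample,
# taken from the fitted model instead of the raw data frame.
# Assumption: pdim() on a plm object returns a list whose nT$n
# element is the number of individuals.
G_used <- pdim(model)$nT$n

# First term of the correction factor: G / (G - 1).
adj <- G_used / (G_used - 1)

# Same call as in the answer, with the group count from the model.
coeftest(model, adj * vcovHC(model, type = "HC1", cluster = "group"))
```

If rows were dropped during estimation (for example because of missing values), `G_used` would reflect the groups that remain, which is the situation the caveat above is about.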

Stata uses a specific small-sample correction that has been implemented in plm 1.5.

Try this:

require(plm)
require(lmtest)
data(Cigar)
model <- plm(price ~ sales + factor(state), model = 'within', data = Cigar)
coeftest(model, function(x) vcovHC(x, type = 'sss'))

This will yield:

t test of coefficients:

      Estimate Std. Error t value  Pr(>|t|)    
sales  -1.2196     0.2137  -5.707 1.415e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

This gives the same SE estimate as Stata up to 3 digits:

x <- coeftest(model, function(x) vcovHC(x, type = 'sss'))
x[ , "Std. Error"]
## [1] 0.2136951
