
plm: Fixed Effects Regression - Index / ID order

I am running fixed effects regressions using the plm package. Why and how does the order of the ID variables in index= affect the regression?

I used the following code to run the regressions. The two calls differ only in the order of the ID variables Company and Year.

The code:

MV_Year <- plm (MVlog ~ LEV + Size + DY + RDlog
                , data=Values, model="within", index= c("Year","Company"))


MV_Company <- plm (MVlog ~ LEV + Size + DY + RDlog,
                   data=Values, model="within", index= c("Company", "Year"))

The corresponding outputs:

MV_Year:

Oneway (individual) effect Within Model

Call:
plm(formula = MVlog ~ LEV + Size + DY + RDlog, data = Values, 
    model = "within", index = c("Year", "Company"))

Unbalanced Panel: n = 17, T = 557-4280, N = 29890

Residuals:
     Min.   1st Qu.    Median   3rd Qu.      Max. 
-5.250901 -0.457100  0.015763  0.476140  6.006483 

Coefficients:
         Estimate  Std. Error t-value Pr(>|t|)    
LEV   -1.95485031  0.04060539 -48.143  < 2e-16 ***
Size   0.75233709  0.00314849 238.952  < 2e-16 ***
DY    -0.00033192  0.00013482  -2.462  0.01382 *  
RDlog  0.13148626  0.00300509  43.755  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    102610
Residual Sum of Squares: 17568
R-Squared:      0.82879
Adj. R-Squared: 0.82868
F-statistic: 36148 on 4 and 29869 DF, p-value: < 2.22e-16

MV_Company:

Oneway (individual) effect Within Model

Call:
plm(formula = MVlog ~ LEV + Size + DY + RDlog, data = Values, 
    model = "within", index = c("Company", "Year"))

Unbalanced Panel: n = 5911, T = 1-17, N = 29890

Residuals:
    Min.  1st Qu.   Median  3rd Qu.     Max. 
-4.35967 -0.38711  0.00000  0.40528  5.48624 

Coefficients:
         Estimate  Std. Error  t-value Pr(>|t|)    
LEV   -1.88958140  0.04392991 -43.0135  < 2e-16 ***
Size   0.74650676  0.00375926 198.5782  < 2e-16 ***
DY    -0.00034308  0.00014585  -2.3524  0.01866 *  
RDlog  0.13904360  0.00331886  41.8950  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    58168
Residual Sum of Squares: 12747
R-Squared:      0.78085
Adj. R-Squared: 0.72679
F-statistic: 21356.2 on 4 and 23975 DF, p-value: < 2.22e-16

Why do the two outputs show these small differences in the estimates and in R^2?

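The reason for the index= option is that plm() internally uses pdata.frame(), which expects the first column to be the "id" and the second column to be the "time" unless the corresponding names are specified by index=c(<id>, <time>). The order within index= matters as well: the first name is taken as the individual dimension and the second as the time dimension. That is why your first model reports n = 17 (the years are treated as individuals, so year effects are removed) while your second reports n = 5911 (company effects are removed).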

From ?pdata.frame we can read:

The index argument indicates the dimensions of the panel. It can be:

  • a vector of two character strings which contains the names of the individual and of the time indexes,
  • a character string which is the name of the individual index variable. In this case, the time index is created automatically and
    a new variable called "time" is added, assuming consecutive and
    ascending time periods in the order of the original data, ...
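
The second case from the help page can be tried out directly. The following is a minimal sketch using the Grunfeld data that ships with plm: only the individual index is named, so pdata.frame() has to build the time index itself from the row order. The object name pg is purely illustrative.

```r
library(plm)
data("Grunfeld", package = "plm")

# Name only the individual index; the time index is then generated
# from the order of the rows within each firm.
pg <- pdata.frame(Grunfeld, index = "firm")

# Inspect the index columns that plm will use
head(index(pg), 3)

# Panel dimensions: individuals (n), time periods (T), observations (N)
pdim(pg)
```

This only gives a sensible time index if the rows are already sorted by period within each individual, which is the assumption the help page states.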

The following example will help us to understand this. First we load the Grunfeld data, which looks like this.

library(plm)
data(Grunfeld)
head(Grunfeld, 3)
#   firm year   inv  value capital
# 1    1 1935 317.6 3078.5     2.8
# 2    1 1936 391.8 4661.7    52.6
# 3    1 1937 410.6 5387.1   156.9

The first column is the ID, the second column is the time. Let's estimate a model.

summary(plm(inv ~ value + capital, data=Grunfeld,
            model="within"))$coe
#          Estimate Std. Error   t-value     Pr(>|t|)
# value   0.1101238 0.01185669  9.287901 3.921108e-17
# capital 0.3100653 0.01735450 17.866564 2.220007e-42

Now, when we swap the first and the second column,

summary(plm(inv ~ value + capital, data=Grunfeld[c(2, 1, 3:5)],
            model="within"))$coe
#          Estimate  Std. Error   t-value     Pr(>|t|)
# value   0.1167978 0.006331302 18.447672 3.586220e-43
# capital 0.2197066 0.032296107  6.802881 1.503653e-10

the result is different. But when we tell plm via index=c(<id>, <time>) which columns to use,

summary(plm(inv ~ value + capital, data=Grunfeld[c(2, 1, 3:5)], 
            index=c("firm", "year"),
            model="within"))$coe
#          Estimate Std. Error   t-value     Pr(>|t|)
# value   0.1101238 0.01185669  9.287901 3.921108e-17
# capital 0.3100653 0.01735450 17.866564 2.220007e-42

we get the old result. If we shuffle the columns completely,

summary(plm(inv ~ value + capital, data=Grunfeld[c(3:5, 1, 2)],
            model="within"))$coe
# Error 

plm() is indeed confused :) But just as before, when we help plm() by supplying index=, it behaves as expected and again yields the right result.

summary(plm(inv ~ value + capital, data=Grunfeld[c(3:5, 1, 2)], 
            index=c("firm", "year"),
            model="within"))$coe
#          Estimate Std. Error   t-value     Pr(>|t|)
# value   0.1101238 0.01185669  9.287901 3.921108e-17
# capital 0.3100653 0.01735450 17.866564 2.220007e-42
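
A quick way to see which column plm treats as the individual dimension, before estimating anything, is pdim(). The sketch below is an illustration on the Grunfeld data, not part of the original question's data: it compares the two index orders, and the swapped order should report the years as "individuals", which mirrors the n = 17 versus n = 5911 difference in the outputs of the question.

```r
library(plm)
data("Grunfeld", package = "plm")

# Intended order: firm is the individual dimension, year is the time dimension
pdim(Grunfeld, index = c("firm", "year"))

# Swapped order: each year is now treated as an "individual"
pdim(Grunfeld, index = c("year", "firm"))
```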

Note that you are actually only estimating company fixed effects. If you intend to estimate a model with both firm and year fixed effects, we can first do this as an LSDV model,

summary(lm(inv ~ value + capital + factor(firm) + factor(year) - 1, Grunfeld))$coe[1:2, ]
#          Estimate Std. Error   t value     Pr(>|t|)
# value   0.1177159 0.01375128  8.560354 6.652575e-15
# capital 0.3579163 0.02271901 15.754043 5.453066e-35

we see that the values differ from those above, because the plm() calls so far included only the firm fixed effect; see:

summary(lm(inv ~ value + capital + factor(firm) - 1, Grunfeld))$coe[1:2, ]
#          Estimate Std. Error   t value     Pr(>|t|)
# value   0.1101238 0.01185669  9.287901 3.921108e-17
# capital 0.3100653 0.01735450 17.866564 2.220007e-42

To get it right, we also need to specify effect="twoways" in plm() to obtain both firm and year fixed effects.

summary(plm(inv ~ value + capital, data=Grunfeld,
            index=c("firm", "year"),
            model="within", effect="twoways"))$coe
#          Estimate Std. Error   t-value     Pr(>|t|)
# value   0.1177159 0.01375128  8.560354 6.652575e-15
# capital 0.3579163 0.02271901 15.754043 5.453066e-35
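
To tie this back to the index order, here is a small sketch, again on Grunfeld, with illustrative object names (f, fe_swapped, fe_time, tw_ok, tw_swapped). It checks two things: a one-way model with a swapped index should match a model with the correct index and effect="time", and with effect="twoways" the index order should no longer change the slope estimates, since both sets of effects are removed either way.

```r
library(plm)
data("Grunfeld", package = "plm")

f <- inv ~ value + capital

# Swapped index, default effect: year is treated as the individual dimension
fe_swapped <- plm(f, data = Grunfeld, index = c("year", "firm"),
                  model = "within")

# Correct index, with time effects requested explicitly
fe_time <- plm(f, data = Grunfeld, index = c("firm", "year"),
               model = "within", effect = "time")

# Both remove year effects, so the slopes should agree
all.equal(coef(fe_swapped), coef(fe_time))

# Two-way effects remove firm and year effects regardless of index order
tw_ok <- plm(f, data = Grunfeld, index = c("firm", "year"),
             model = "within", effect = "twoways")
tw_swapped <- plm(f, data = Grunfeld, index = c("year", "firm"),
                  model = "within", effect = "twoways")
all.equal(coef(tw_ok), coef(tw_swapped))
```

Even where the coefficients coincide, it is still worth passing the index in the order c(<id>, <time>), because anything that depends on the time dimension relies on it.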
