plm: Fixed Effects Regression - Index / ID order

I am running fixed effects regressions using the plm package. Why and how does the order of the ID code have an impact on the regression?

I ran the following two regressions, which differ only in the order of the index variables Company and Year.

The code:

MV_Year <- plm(MVlog ~ LEV + Size + DY + RDlog,
               data = Values, model = "within", index = c("Year", "Company"))

MV_Company <- plm(MVlog ~ LEV + Size + DY + RDlog,
                  data = Values, model = "within", index = c("Company", "Year"))

The corresponding outputs:

MV_Year:

Oneway (individual) effect Within Model

Call:
plm(formula = MVlog ~ LEV + Size + DY + RDlog, data = Values, 
    model = "within", index = c("Year", "Company"))

Unbalanced Panel: n = 17, T = 557-4280, N = 29890

Residuals:
     Min.   1st Qu.    Median   3rd Qu.      Max. 
-5.250901 -0.457100  0.015763  0.476140  6.006483 

Coefficients:
         Estimate  Std. Error t-value Pr(>|t|)    
LEV   -1.95485031  0.04060539 -48.143  < 2e-16 ***
Size   0.75233709  0.00314849 238.952  < 2e-16 ***
DY    -0.00033192  0.00013482  -2.462  0.01382 *  
RDlog  0.13148626  0.00300509  43.755  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    102610
Residual Sum of Squares: 17568
R-Squared:      0.82879
Adj. R-Squared: 0.82868
F-statistic: 36148 on 4 and 29869 DF, p-value: < 2.22e-16

MV_Company:

Oneway (individual) effect Within Model

Call:
plm(formula = MVlog ~ LEV + Size + DY + RDlog, data = Values, 
    model = "within", index = c("Company", "Year"))

Unbalanced Panel: n = 5911, T = 1-17, N = 29890

Residuals:
    Min.  1st Qu.   Median  3rd Qu.     Max. 
-4.35967 -0.38711  0.00000  0.40528  5.48624 

Coefficients:
         Estimate  Std. Error  t-value Pr(>|t|)    
LEV   -1.88958140  0.04392991 -43.0135  < 2e-16 ***
Size   0.74650676  0.00375926 198.5782  < 2e-16 ***
DY    -0.00034308  0.00014585  -2.3524  0.01866 *  
RDlog  0.13904360  0.00331886  41.8950  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    58168
Residual Sum of Squares: 12747
R-Squared:      0.78085
Adj. R-Squared: 0.72679
F-statistic: 21356.2 on 4 and 23975 DF, p-value: < 2.22e-16

Why do the estimates and R^2 differ slightly between the two outputs?

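plm() internally uses pdata.frame(), which expects the first column to be the "id" and the second column to be the "time" unless the names are given explicitly via index=c("<id>", "<time>"). The order inside index= matters in the same way: the first name is taken as the individual dimension and the second as the time dimension. With index=c("Year", "Company"), the default one-way (individual) within model therefore removes Year means (n = 17 in your output), whereas index=c("Company", "Year") removes Company means (n = 5911), so the two calls estimate different models.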

From ?pdata.frame we can read:

The index argument indicates the dimensions of the panel. It can be:

  • a vector of two character strings which contains the names of the individual and of the time indexes,
  • a character string which is the name of the individual index variable. In this case, the time index is created automatically and
    a new variable called "time" is added, assuming consecutive and
    ascending time periods in the order of the original data, ...
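To make the role of the first index name concrete, here is a minimal base-R sketch, not plm's actual implementation. It assumes a small synthetic panel (the data frame `panel`, its columns, and the helper `within_slope()` are all made up for illustration) and computes a one-way within slope by subtracting group means, once grouping by firm and once by year.

```r
# Hypothetical synthetic panel: 6 firms observed over 5 years (not real data)
set.seed(1)
panel <- expand.grid(year = 2001:2005, firm = 1:6)
panel$x <- rnorm(nrow(panel)) + panel$firm   # regressor correlated with the firm effect
panel$y <- 2 * panel$x + 3 * panel$firm +
  0.5 * (panel$year - 2000) + rnorm(nrow(panel))

# One-way within estimator: subtract group means, then OLS without intercept.
# `group` is the name of the column treated as the "individual" dimension.
within_slope <- function(data, group) {
  y_dm <- data$y - ave(data$y, data[[group]])
  x_dm <- data$x - ave(data$x, data[[group]])
  unname(coef(lm(y_dm ~ x_dm - 1)))
}

slope_firm <- within_slope(panel, "firm")  # firm treated as the individual index
slope_year <- within_slope(panel, "year")  # year treated as the individual index

print(c(firm_first = slope_firm, year_first = slope_year))
```

The two slopes differ because different group means are removed, which mirrors what happens when the order of the names in index= is swapped.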

The following example will help us to understand this. First we load the Grunfeld data, which looks like this.

library(plm)
data(Grunfeld)
head(Grunfeld, 3)
#   firm year   inv  value capital
# 1    1 1935 317.6 3078.5     2.8
# 2    1 1936 391.8 4661.7    52.6
# 3    1 1937 410.6 5387.1   156.9

The first column is the ID, the second column is the time. Let's estimate a model.

summary(plm(inv ~ value + capital, data=Grunfeld,
            model="within"))$coe
#          Estimate Std. Error   t-value     Pr(>|t|)
# value   0.1101238 0.01185669  9.287901 3.921108e-17
# capital 0.3100653 0.01735450 17.866564 2.220007e-42

Now, when we swap the first and the second column,

summary(plm(inv ~ value + capital, data=Grunfeld[c(2, 1, 3:5)],
            model="within"))$coe
#          Estimate  Std. Error   t-value     Pr(>|t|)
# value   0.1167978 0.006331302 18.447672 3.586220e-43
# capital 0.2197066 0.032296107  6.802881 1.503653e-10

the result is different. But when we tell plm() via index=c("<id>", "<time>") which columns to use,

summary(plm(inv ~ value + capital, data=Grunfeld[c(2, 1, 3:5)], 
            index=c("firm", "year"),
            model="within"))$coe
#          Estimate Std. Error   t-value     Pr(>|t|)
# value   0.1101238 0.01185669  9.287901 3.921108e-17
# capital 0.3100653 0.01735450 17.866564 2.220007e-42

we get the original result again. If we move the index columns away from the first two positions entirely,

summary(plm(inv ~ value + capital, data=Grunfeld[c(3:5, 1, 2)],
            model="within"))$coe
# Error 

plm() is indeed confused :) But just as before, when we help plm() with index=, it behaves as expected and again yields the right result.

summary(plm(inv ~ value + capital, data=Grunfeld[c(3:5, 1, 2)], 
            index=c("firm", "year"),
            model="within"))$coe
#          Estimate Std. Error   t-value     Pr(>|t|)
# value   0.1101238 0.01185669  9.287901 3.921108e-17
# capital 0.3100653 0.01735450 17.866564 2.220007e-42
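A quick way to check which variable is being treated as the individual dimension is to look at the panel dimensions before estimating. The sketch below assumes the plm package is installed and uses its pdim() function on the Grunfeld data; the object names `dims_firm_first` and `dims_year_first` are made up for illustration.

```r
library(plm)
data("Grunfeld", package = "plm")

# Panel dimensions with firm as the individual index and year as the time index
dims_firm_first <- pdim(Grunfeld, index = c("firm", "year"))

# Same data, but with the index names swapped: year is now the "individual"
dims_year_first <- pdim(Grunfeld, index = c("year", "firm"))

# n is the number of individuals, T the number of time periods
print(dims_firm_first)
print(dims_year_first)
```

If n in this output is the number of years rather than the number of firms (as with n = 17 in the MV_Year output), the index names are in the wrong order for firm fixed effects.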

Note that with index=c("Company", "Year") you are estimating only company fixed effects. If you intend to estimate a model with both firm and year fixed effects, we can first do this as an LSDV (least squares dummy variable) model,

summary(lm(inv ~ value + capital + factor(firm) + factor(year) - 1, Grunfeld))$coe[1:2, ]
#          Estimate Std. Error   t value     Pr(>|t|)
# value   0.1177159 0.01375128  8.560354 6.652575e-15
# capital 0.3579163 0.02271901 15.754043 5.453066e-35

we see that the values differ from those above, because the plm() calls have so far included only the firm fixed effect; see:

summary(lm(inv ~ value + capital + factor(firm) - 1, Grunfeld))$coe[1:2, ]
#          Estimate Std. Error   t value     Pr(>|t|)
# value   0.1101238 0.01185669  9.287901 3.921108e-17
# capital 0.3100653 0.01735450 17.866564 2.220007e-42

To get both firm and year fixed effects from plm(), we also need to specify effect="twoways".

summary(plm(inv ~ value + capital, data=Grunfeld,
            index=c("firm", "year"),
            model="within", effect="twoways"))$coe
#          Estimate Std. Error   t-value     Pr(>|t|)
# value   0.1177159 0.01375128  8.560354 6.652575e-15
# capital 0.3579163 0.02271901 15.754043 5.453066e-35
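For intuition about the two-way model, here is a minimal base-R sketch under the assumption of a balanced panel. The synthetic data frame `panel` and the helper `two_way_demean()` are made up for illustration. In a balanced panel, subtracting the firm means and the year means and adding back the grand mean reproduces the LSDV slope with both sets of dummies. For an unbalanced panel such as the one in the question, this simple formula is not exact, and effect="twoways" in plm() is the safer route.

```r
# Hypothetical balanced synthetic panel: 8 firms, 5 years (not real data)
set.seed(42)
panel <- expand.grid(year = 1:5, firm = 1:8)
panel$x <- rnorm(nrow(panel)) + 0.5 * panel$firm + 0.3 * panel$year
panel$y <- 1.5 * panel$x + panel$firm - panel$year + rnorm(nrow(panel))

# Two-way within transformation (valid as written for balanced panels only):
# remove individual means and time means, add back the grand mean
two_way_demean <- function(v, id, time) {
  v - ave(v, id) - ave(v, time) + mean(v)
}

y_tw <- two_way_demean(panel$y, panel$firm, panel$year)
x_tw <- two_way_demean(panel$x, panel$firm, panel$year)

# Slope from the demeaned data, and from LSDV with firm and year dummies
slope_twoway <- unname(coef(lm(y_tw ~ x_tw - 1)))
slope_lsdv <- unname(coef(lm(y ~ x + factor(firm) + factor(year), data = panel))["x"])

print(c(twoway_within = slope_twoway, lsdv = slope_lsdv))
```

Only the point estimates are compared here; the standard errors from the demeaned regression would still need a degrees-of-freedom correction.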
