
Panel regression - Estimators

I am trying to do a panel regression in R.

pdata <- pdata.frame(NEW, index = c("Year"))

And:

R1 <- plm(Market_Cap ~ GDP_growthR + Volatility_IR + FDI 
          + Savings_rate, data=pdata, model="between")

However, when I try to use the within (or random) estimator, I get the following error:

Error in plm.fit(data, model, effect, random.method, random.models, random.dfcor, : empty model

But, when I use the between estimator, everything is fine. Do you have any explanation and suggestion?

Thank you!

You should heed the advice in the comments.

I addressed a version of the OP's question on CV. If the structure of the data is the same, then you're only observing one cross-sectional unit over time. In your setting, you're observing a single country over many years. If your data were a true panel dataset, you would be observing more than one country over at least two years. For example, I will simulate a small panel data frame.

library(dplyr)
library(plm)
set.seed(12345)

panel <- tibble(
  country = c(rep("Spain", 5), rep("France", 5), rep("Croatia", 5)),
  year = rep(2016:2020, 3),                   # each country is observed over 5 years           
  x = rnorm(15),                              # sample 15 random deviates (5 per country)
  y = sample(c(10000:100000), size = 15)      # sample incomes (range: 10,000 - 100,000)
  ) %>%
  mutate(
    # Spain and 2016 are the omitted reference categories
    France = ifelse(country == "France", 1, 0),
    Croatia = ifelse(country == "Croatia", 1, 0),
    y_2017 = ifelse(year == 2017, 1, 0),
    y_2018 = ifelse(year == 2018, 1, 0),
    y_2019 = ifelse(year == 2019, 1, 0),
    y_2020 = ifelse(year == 2020, 1, 0)
    )

Inside the mutate() call I appended dummies for all countries and all years, excluding one country and one year. In your other question, you estimate time fixed effects. Software invariably drops one year to avoid collinearity. You don't need to append the dummies, but they are helpful for exposition. Here is a classic panel data frame:

# Panel - varies across two dimensions (country + time)
# 3 countries observed over 5 years for a total of 15 country-year observations

# A tibble: 15 x 10
   country  year      x     y France Croatia y_2017 y_2018 y_2019 y_2020
   <chr>   <int>  <dbl> <int>  <dbl>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
 1 Spain    2016  0.586 81371      0       0      0      0      0      0
 2 Spain    2017  0.709 10538      0       0      1      0      0      0
 3 Spain    2018 -0.109 26893      0       0      0      1      0      0
 4 Spain    2019 -0.453 71363      0       0      0      0      1      0
 5 Spain    2020  0.606 43308      0       0      0      0      0      1
 6 France   2016 -1.82  42544      1       0      0      0      0      0
 7 France   2017  0.630 88187      1       0      1      0      0      0
 8 France   2018 -0.276 91368      1       0      0      1      0      0
 9 France   2019 -0.284 65563      1       0      0      0      1      0
10 France   2020 -0.919 22061      1       0      0      0      0      1
11 Croatia  2016 -0.116 80390      0       1      0      0      0      0
12 Croatia  2017  1.82  48623      0       1      1      0      0      0
13 Croatia  2018  0.371 93444      0       1      0      1      0      0
14 Croatia  2019  0.520 79582      0       1      0      0      1      0
15 Croatia  2020 -0.751 33367      0       1      0      0      0      1
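Before fitting anything, it can help to confirm that a data frame really varies across both dimensions. The following is a minimal base-R sketch; `panel_shape()` is a hypothetical helper written for this illustration (it is not part of plm), and the toy data frame only mimics the country/year layout shown above.

```r
# Hypothetical helper (not from plm): summarise the shape of a long-format panel
panel_shape <- function(df, unit, time) {
  obs_per_unit <- table(as.character(df[[unit]]))  # rows observed per unit
  n_periods <- length(unique(df[[time]]))
  list(
    n_units   = length(obs_per_unit),
    n_periods = n_periods,
    n_obs     = nrow(df),
    # balanced: every unit appears exactly once in every period
    balanced  = all(obs_per_unit == n_periods) &&
      !any(duplicated(df[c(unit, time)]))
  )
}

# Same layout as the simulated panel: 3 countries observed over 5 years
toy <- data.frame(
  country = rep(c("Spain", "France", "Croatia"), each = 5),
  year    = rep(2016:2020, 3),
  stringsAsFactors = FALSE
)

str(panel_shape(toy, "country", "year"))

# A single-country subset has only one cross-sectional unit
single <- toy[toy$country == "Croatia", ]
str(panel_shape(single, "country", "year"))
```

If `n_units` comes back as 1, the data are a time series rather than a panel, whatever the estimator is called.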

As @DaveArmstrong correctly noted, you should specify the panel indexes. First, we specify a panel data frame, then we estimate the model.

pdata <- pdata.frame(panel, index = c("year", "country"))
random <- plm(y ~ x, model = "random", data = pdata)

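A one-way random effects model is fit. Note that pdata.frame() expects the individual index first and the time index second, so with index = c("year", "country") plm treats year as the individual dimension; that is why the output below reports n = 5 and T = 3 rather than n = 3 and T = 5. The call to summary() will produce the following (abridged output):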

Call:
plm(formula = y ~ x, data = pdata, model = "random")

Balanced Panel: n = 5, T = 3, N = 15

Effects:
                    var   std.dev share
idiosyncratic 685439601     26181 0.819
individual    151803385     12321 0.181
theta: 0.2249

Residuals:
   Min. 1st Qu.  Median 3rd Qu.    Max. 
 -49380  -17266    6221   17759   32442 

Coefficients:
            Estimate Std. Error z-value  Pr(>|z|)    
(Intercept)  58308.0     8653.7  6.7380 1.606e-11 ***
x             7777.0     8808.9  0.8829    0.3773    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
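A quick way to see which variable plm is using as the cross-sectional unit is to inspect the panel dimensions with pdim(). The sketch below is only an illustration: it assumes the plm package is installed, and the toy data frame and the object names `by_country` and `by_year` are made up for the example.

```r
library(plm)

# Toy data with the same layout as the simulated panel above
set.seed(1)
toy <- data.frame(
  country = rep(c("Spain", "France", "Croatia"), each = 5),
  year    = rep(2016:2020, 3),
  x       = rnorm(15),
  stringsAsFactors = FALSE
)

# Individual index first, time index second
by_country <- pdata.frame(toy, index = c("country", "year"))
pdim(by_country)   # countries are the units

# Reversed order: each year is treated as a unit observed over 3 "periods"
by_year <- pdata.frame(toy, index = c("year", "country"))
pdim(by_year)
```

Comparing the two printed summaries makes it easy to spot an index declared in the wrong order before any model is estimated.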

But your data does not have this structure, hence the error message. In fact, your data is similar to carving out one country from this panel. For example, suppose we winnowed down the data frame to Croatian observations only. The following code takes a subset of the previous data frame:

croatia_only <- panel %>%
  filter(country == "Croatia")  # grab only the observations from Croatia

Here, longitudinal variation only exists for one country. In other words, by restricting attention to Croatia, we cannot exploit the variation across countries; we only have variation in one dimension: time. The resulting data frame looks like the following:

# Time Series - varies across one dimension (time)

# A tibble: 5 x 10
  country  year      x     y France Croatia y_2017 y_2018 y_2019 y_2020
  <chr>   <int>  <dbl> <int>  <dbl>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 Croatia  2016 -0.116 80390      0       1      0      0      0      0
2 Croatia  2017  1.82  48623      0       1      1      0      0      0
3 Croatia  2018  0.371 93444      0       1      0      1      0      0
4 Croatia  2019  0.520 79582      0       1      0      0      1      0
5 Croatia  2020 -0.751 33367      0       1      0      0      0      1

Now I will re-estimate a random effects model with one country:

pdata <- pdata.frame(croatia_only, index = c("year", "country"))
random_croatia <- plm(y ~ x , model = "random", data = pdata)

This should reproduce your error message (i.e., empty model). Note, you only have variation within one country. As you correctly noted, a "between-effects" model is estimable, but not for reasons you might presume. A "between effects" model averages over all years within a country, then it runs ordinary least squares on the 'averaged' data. In your setting, taking the average over your time series results in a country mean. And since you only observe one country, you only have one observation. Such a model is inestimable. However, you can 'pool' together all of your yearly observations for one country and run a linear model instead. That is what you're doing: with the index declared as above, plm treats each year as its own unit with a single observation, so the 'between' averages are simply the five yearly rows. To test this out using one country, try comparing the "between" model with the "pooling" model. They should produce identical estimates of x.

# Run this using the croatia_only data frame

summary(plm(y ~ x , model = "between", data = pdata)) 
summary(plm(y ~ x , model = "pooling", data = pdata))
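The mechanics behind both results can be reproduced by hand in base R. This is a rough sketch rather than what plm does internally: the values are copied (rounded) from the Croatia table above, and it assumes, as my reading of the index order, that each year ends up as its own unit.

```r
# Croatia rows, rounded values copied from the printed table
croatia <- data.frame(
  country = "Croatia",
  year    = 2016:2020,
  x       = c(-0.116, 1.82, 0.371, 0.520, -0.751),
  y       = c(80390, 48623, 93444, 79582, 33367),
  stringsAsFactors = FALSE
)

# "Between" transformation: average within each unit, then run OLS on the means
means_by_country <- aggregate(cbind(x, y) ~ country, data = croatia, FUN = mean)
means_by_year    <- aggregate(cbind(x, y) ~ year,    data = croatia, FUN = mean)

fit_country <- lm(y ~ x, data = means_by_country)  # one averaged row only
fit_year    <- lm(y ~ x, data = means_by_year)     # five rows, identical to the raw data
fit_pooled  <- lm(y ~ x, data = croatia)           # plain pooled OLS

coef(fit_country)  # a single point cannot identify a slope
coef(fit_year)
coef(fit_pooled)

# "Within" transformation: subtract each unit's mean.
# With one observation per unit, nothing is left to estimate.
x_within <- croatia$x - ave(croatia$x, croatia$year)
y_within <- croatia$y - ave(croatia$y, croatia$year)
x_within
```

Averaging by country leaves a single row, so the slope is not identified; averaging by year leaves the data unchanged, so the fit matches pooled OLS. Demeaning by year wipes out all variation, which is consistent with the "empty model" complaint.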

It should be clear by now, but model = "pooling" is equivalent to running lm().

If you want me to tie this into your previous post, try estimating a linear model with separate dummies for all years as covariates. You will quickly discover that you have no residual degrees of freedom, which is exactly the problem outlined in your other post.
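A minimal base-R sketch of that experiment, using the rounded Croatia values from the table above (factor(year) stands in for the hand-made year dummies, and the object names are only illustrative):

```r
# Croatia rows, rounded values copied from the printed table
croatia <- data.frame(
  year = 2016:2020,
  x    = c(-0.116, 1.82, 0.371, 0.520, -0.751),
  y    = c(80390, 48623, 93444, 79582, 33367)
)

# Intercept + x + four year dummies = six parameters for five observations
fit_dummies <- lm(y ~ x + factor(year), data = croatia)

coef(fit_dummies)         # one coefficient cannot be estimated and is reported as NA
df.residual(fit_dummies)  # no residual degrees of freedom remain
```

With a single country, a full set of year dummies fits the data perfectly, so there is nothing left over to estimate standard errors from.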

In sum, I would look for data from other countries. Once you do that, you can use plm() for all it's worth.


 