
R, add back fitted values from plm(): the fitted values are fewer than the observations in the regression

We're doing a panel regression using the plm() function of the R package plm and want to add the fitted values as a new column to the dataset on which the regression was run:

library(plm)

MP_regression <- plm(operating_exp ~ HHI + rate + rate_lag1 + rate_lag2 +
                   HHI*rate + HHI*rate_lag1 + HHI*rate_lag2,
                 data = market_power_merged, effect = "individual",
                 model = "within", index = c("firm", "date"))

When we use fitted(MP_regression) like this:

fitted_values <- fitted(MP_regression)

then it produces fewer fitted values than there are observations in the input data for the regression. So we want to add them back to the market_power_merged data frame by date and firm. Because fitted() returns fewer values (for some reason), it is important to match on both date and firm, so that we can see which observations were left out of the fitted values, or alternatively remove the observations for which fitted() does not produce a value.

In essence, we want to do:

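# intent only: this errors, because fitted() returns fewer values than there are rows
market_power_merged <- dplyr::mutate(market_power_merged,
                                     fitted_values = fitted(MP_regression))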

and match the values by firm (individual) and date (time).
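A likely cause of the shortfall, assuming the model data contain missing values: plm(), like lm(), fits only the rows that are complete in every model variable, so fitted() returns one value per complete row. The base-R sketch below uses a small hypothetical panel and lm() as a stand-in for plm() (so it runs without plm installed) to show that the number of fitted values equals the number of complete cases.

```r
# Hypothetical toy panel: 2 firms x 3 dates, with two missing regressor values
df <- data.frame(
  firm = rep(c("A", "B"), each = 3),
  date = rep(1:3, times = 2),
  y    = c(1.0, 2.1, 2.9, 4.2, 5.1, 6.0),
  x    = c(0.5, NA, 1.5, 2.0, 2.4, NA)
)

# Rows a regression can use: no NA in any model variable
usable <- complete.cases(df[, c("y", "x")])
n_usable <- sum(usable)            # 4 of the 6 rows

# lm() (a stand-in for plm() here) drops incomplete rows under the
# default na.action = na.omit, so fitted() is shorter than nrow(df)
fit <- lm(y ~ x, data = df)
n_fitted <- length(fitted(fit))    # also 4

# These are the rows that will need an NA after matching back
df[!usable, c("firm", "date")]
```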

Apparently, the object returned by fitted() carries an "index" attribute: a data frame of the panel groups (individual and time) for the fitted values. Therefore, consider cbind-ing this index attribute to the fitted values and then running left_join or merge (with all.x = TRUE) against the original data frame:

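fitted_values_vec <- fitted(MP_regression)

# the "index" attribute holds the panel keys (firm, date) of each fitted value
fitted_values_df <- cbind(attr(fitted_values_vec, "index"),
                          fitted_values = fitted_values_vec)

# left join: rows of market_power_merged without a fitted value get NA
market_power_merged <- base::merge(market_power_merged, fitted_values_df,
                                   by = c("firm", "date"), all.x = TRUE)
# market_power_merged <- dplyr::left_join(market_power_merged, fitted_values_df, by = c("firm", "date"))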
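The merge step itself can be checked without plm. The base-R sketch below fakes the output of fitted() with a plain numeric vector carrying a hypothetical "index" attribute shaped like plm's (a data frame of factor panel keys), then applies the same cbind() plus merge(all.x = TRUE) pattern. It converts the keys to character on the assumption that the keys in the original data are character; all data here are made up for illustration.

```r
# Original data (hypothetical): 2 firms x 3 dates
df <- data.frame(
  firm = rep(c("A", "B"), each = 3),
  date = rep(c("2020-01", "2020-02", "2020-03"), times = 2),
  y    = c(1, 2, 3, 4, 5, 6)
)

# Stand-in for fitted(): only 4 values, panel keys stored in an "index" attribute
fv <- c(1.1, 2.9, 4.2, 5.8)
attr(fv, "index") <- data.frame(
  firm = factor(c("A", "A", "B", "B")),
  date = factor(c("2020-01", "2020-03", "2020-01", "2020-02"))
)

# Pull out the keys and give them the same type as the keys in df
idx <- attr(fv, "index")
idx[] <- lapply(idx, as.character)

# Bind keys to values (as.numeric drops the attributes), then left-join
fitted_df <- cbind(idx, fitted_values = as.numeric(fv))
out <- merge(df, fitted_df, by = c("firm", "date"), all.x = TRUE)
out   # A/2020-02 and B/2020-03 have no fitted value, so they get NA
```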

To demonstrate with Produc, a data frame that ships with plm:

library(plm)
data("Produc", package = "plm")

# ASSIGN RANDOM NAs ACROSS NON-PANEL COLUMNS
set.seed(41120)
for(col in names(Produc)[!names(Produc) %in% c("state", "year")]) {
  Produc[sample(nrow(Produc), 50), col] <- NA
}

results <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
               data = Produc, index = c("state","year"))

fitted_values_vec <- fitted(results)
str(fitted_values_vec)
# 'pseries' Named num [1:588] -0.2459 -0.2274 -0.0927 -0.0981 -0.0184 ...
# - attr(*, "names")= chr [1:588] "ALABAMA" "ALABAMA" "ALABAMA" "ALABAMA" ...
# - attr(*, "index")=Classes ‘pindex’ and 'data.frame': 588 obs. of  2 variables:
#   ..$ state: Factor w/ 48 levels "ALABAMA","ARIZONA",..: 1 1 1 1 1 1 1 1 1 1 ...
#   ..$ year : Factor w/ 17 levels "1970","1971",..: 1 2 5 6 7 8 9 10 12 13 ...


fitted_values_df <- cbind(attr(fitted_values_vec, "index"), 
                          fitted_values = fitted_values_vec)

Produc <- merge(Produc, fitted_values_df, by= c("state","year"), all.x=TRUE)

Output

head(Produc,10)

#      state year region     pcap     hwy   water    util       pc   gsp    emp unemp fitted_values
# 1  ALABAMA 1970      6 15032.67 7325.80 1655.68 6051.20 35793.80 28418 1010.5   4.7   -0.24591969
# 2  ALABAMA 1971      6 15501.94 7525.94 1721.02 6254.98 37299.91 29375 1021.9   5.2   -0.22735513
# 3  ALABAMA 1972      6 15972.41 7765.42 1764.75 6442.23       NA 31303 1072.3    NA            NA
# 4  ALABAMA 1973   <NA>       NA 7907.66 1742.41 6756.19 40084.01 33430 1135.5   3.9            NA
# 5  ALABAMA 1974      6 16762.67 8025.52      NA 7002.29 42057.31 33749 1169.8   5.5   -0.09272471
# 6  ALABAMA 1975      6 17316.26 8158.23      NA 7405.76 43971.71 33604 1155.4   7.7   -0.09806212
# 7  ALABAMA 1976      6 17732.86      NA 1799.74 7704.93 50221.57 35764 1207.0   6.8   -0.01841929
# 8  ALABAMA 1977      6 18111.93 8365.67 1845.11 7901.15 51084.99 37463 1269.2   7.4    0.02047675
# 9  ALABAMA 1978      6 18479.74 8510.64 1960.51 8008.59 52604.05 39964 1336.5   6.3    0.07225304
# 10 ALABAMA 1979      6 18881.49 8640.61 2081.91 8158.97 54525.86 40979 1362.0   7.1    0.09364171

tail(Produc,10)

#       state year region    pcap     hwy  water    util       pc   gsp   emp unemp fitted_values
# 807 WYOMING 1977      8 4037.03 2898.34 291.64  847.04 19977.67  9779 170.5   3.6     0.0871588
# 808 WYOMING 1978      8 4115.61 2920.85 294.73  900.04 20760.24 11038 187.4    NA            NA
# 809 WYOMING 1979      8 4268.71 2950.53 313.47 1004.71 21643.50 11988 200.7   2.8     0.2346269
# 810 WYOMING 1980      8      NA 2979.23 338.06 1082.40 22628.22 13027 210.2   4.0            NA
# 811 WYOMING 1981      8 4572.67 3005.62 379.19 1187.86 26330.20 13717 223.5   4.1     0.3704301
# 812 WYOMING 1982      8 4731.98 3060.64 408.43 1262.90 27724.96 13056 217.7   5.8     0.3595080
# 813 WYOMING 1983      8 4950.82 3119.98 445.59      NA 28586.46 11922    NA   8.4            NA
# 814 WYOMING 1984      8 5184.73 3195.68 476.57      NA 28794.80 12073 204.3   6.3     0.3199823
# 815 WYOMING 1985      8 5448.38 3295.92 523.01 1629.45 29326.94 12022    NA   7.1            NA
# 816 WYOMING 1986      8 5700.41 3400.96 565.58 1733.88 27110.51    NA 196.3   9.0            NA
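One side effect worth knowing: merge() sorts its result on the by columns by default (sort = TRUE), so if the original row order of market_power_merged matters, a keyed lookup with match() is an alternative. This is a swapped-in base-R technique, not the approach above; the data are hypothetical, and the sketch assumes the key values never contain the "|" separator used to build composite keys.

```r
# Original data (hypothetical), deliberately not sorted by firm/date
df <- data.frame(
  firm = c("B", "A", "B", "A"),
  date = c("2020-02", "2020-01", "2020-01", "2020-02"),
  y    = c(4, 1, 3, 2)
)

# Panel keys and fitted values as they might come out of fitted(): 3 of 4 rows
idx <- data.frame(firm = c("A", "B", "B"),
                  date = c("2020-01", "2020-01", "2020-02"))
fv  <- c(1.1, 3.2, 3.9)

# Composite keys (assumes "|" never occurs inside firm or date values);
# paste() on factor key columns would use their labels
key_df  <- paste(df$firm,  df$date,  sep = "|")
key_fit <- paste(idx$firm, idx$date, sep = "|")

# match() gives NA for rows without a fitted value; row order is unchanged
df$fitted_values <- fv[match(key_df, key_fit)]
df
```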
