R: adding fitted values from plm() back to the data when there are fewer fitted values than observations in the regression
We are running a panel regression with the plm() function from the R package plm and want to add the fitted values as a new column to the dataset on which the regression was run.
MP_regression <- plm(operating_exp ~ HHI + rate + rate_lag1 + rate_lag2 +
                       HHI*rate + HHI*rate_lag1 + HHI*rate_lag2,
                     data = market_power_merged, effect = "individual",
                     model = "within", index = c("firm", "date"))
When we call fitted(MP_regression) like this:
fitted_values <- fitted(MP_regression)
it returns fewer fitted values than there are observations in the input data for the regression. We therefore want to add them back to the market_power_merged data frame by date and firm. Because fitted() returns fewer values (for a reason we have not identified), matching on both date and firm is important: it lets us see which observations were excluded from the fitted values, or alternatively remove the observations for which no fitted value is produced.
In essence, we want to do something like:
market_power_merged <- mutate(market_power_merged, fitted_values = fitted(MP_regression))
with the fitted values matched by firm (individual) and date (time).
The value returned by fitted() carries an index attribute: a data frame holding the panel identifiers (individual and time) of each fitted value. Rows with a missing value in any model variable are dropped during estimation, which is typically why there are fewer fitted values than rows in the input data. You can therefore cbind this index attribute to the fitted values and then run left_join or merge (with all.x=TRUE) against the original data frame:
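fitted_values_vec <- fitted(MP_regression)
# Attach the firm/date identifiers to each fitted value
fitted_values_df <- cbind(attr(fitted_values_vec, "index"),
                          fitted_values = fitted_values_vec)
# Left merge: rows without a fitted value get NA
market_power_merged <- base::merge(market_power_merged, fitted_values_df, by=c("firm", "date"), all.x=TRUE)
# market_power_merged <- dplyr::left_join(market_power_merged, fitted_values_df, by=c("firm", "date"))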
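Once the fitted values are merged in, the rows the model excluded are simply those whose fitted value is NA. The following is a minimal base R sketch of that step. The toy data frames `panel` and `fitted_values_df` are hypothetical stand-ins for market_power_merged and for the index-plus-fitted-values data frame, and the numbers are made up for illustration only.

```r
# Hypothetical toy panel standing in for market_power_merged
panel <- data.frame(
  firm = rep(c("A", "B"), each = 3),
  date = rep(2001:2003, times = 2),
  operating_exp = c(10, NA, 12, 20, 21, NA)
)

# Hypothetical stand-in for cbind(attr(fitted_values_vec, "index"), ...):
# only the four rows the model could use have a fitted value
fitted_values_df <- data.frame(
  firm = c("A", "A", "B", "B"),
  date = c(2001L, 2003L, 2001L, 2002L),
  fitted_values = c(10.2, 11.8, 19.9, 21.1)
)

# Left merge keeps every row of the panel; unmatched rows get NA
merged <- merge(panel, fitted_values_df, by = c("firm", "date"), all.x = TRUE)

# Observations excluded from the fitted values
excluded <- merged[is.na(merged$fitted_values), c("firm", "date")]

# Alternatively, keep only the rows that received a fitted value
kept <- merged[!is.na(merged$fitted_values), ]
```

Here `excluded` lists the firm/date pairs to inspect, while `kept` is the reduced data set if the unmatched rows should be dropped instead.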
To demonstrate with Produc, a data frame that ships with plm:
data("Produc", package = "plm")
# ASSIGN RANDOM NAs ACROSS NON-PANEL COLUMNS
set.seed(41120)
for(col in names(Produc)[!names(Produc) %in% c("state", "year")]) {
Produc[sample(nrow(Produc), 50), col] <- NA
}
results <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
data = Produc, index = c("state","year"))
fitted_values_vec <- fitted(results)
str(fitted_values_vec)
# 'pseries' Named num [1:588] -0.2459 -0.2274 -0.0927 -0.0981 -0.0184 ...
# - attr(*, "names")= chr [1:588] "ALABAMA" "ALABAMA" "ALABAMA" "ALABAMA" ...
# - attr(*, "index")=Classes ‘pindex’ and 'data.frame': 588 obs. of 2 variables:
# ..$ state: Factor w/ 48 levels "ALABAMA","ARIZONA",..: 1 1 1 1 1 1 1 1 1 1 ...
# ..$ year : Factor w/ 17 levels "1970","1971",..: 1 2 5 6 7 8 9 10 12 13 ...
fitted_values_df <- cbind(attr(fitted_values_vec, "index"),
fitted_values = fitted_values_vec)
Produc <- merge(Produc, fitted_values_df, by= c("state","year"), all.x=TRUE)
Output
head(Produc,10)
# state year region pcap hwy water util pc gsp emp unemp fitted_values
# 1 ALABAMA 1970 6 15032.67 7325.80 1655.68 6051.20 35793.80 28418 1010.5 4.7 -0.24591969
# 2 ALABAMA 1971 6 15501.94 7525.94 1721.02 6254.98 37299.91 29375 1021.9 5.2 -0.22735513
# 3 ALABAMA 1972 6 15972.41 7765.42 1764.75 6442.23 NA 31303 1072.3 NA NA
# 4 ALABAMA 1973 <NA> NA 7907.66 1742.41 6756.19 40084.01 33430 1135.5 3.9 NA
# 5 ALABAMA 1974 6 16762.67 8025.52 NA 7002.29 42057.31 33749 1169.8 5.5 -0.09272471
# 6 ALABAMA 1975 6 17316.26 8158.23 NA 7405.76 43971.71 33604 1155.4 7.7 -0.09806212
# 7 ALABAMA 1976 6 17732.86 NA 1799.74 7704.93 50221.57 35764 1207.0 6.8 -0.01841929
# 8 ALABAMA 1977 6 18111.93 8365.67 1845.11 7901.15 51084.99 37463 1269.2 7.4 0.02047675
# 9 ALABAMA 1978 6 18479.74 8510.64 1960.51 8008.59 52604.05 39964 1336.5 6.3 0.07225304
# 10 ALABAMA 1979 6 18881.49 8640.61 2081.91 8158.97 54525.86 40979 1362.0 7.1 0.09364171
tail(Produc,10)
# state year region pcap hwy water util pc gsp emp unemp fitted_values
# 807 WYOMING 1977 8 4037.03 2898.34 291.64 847.04 19977.67 9779 170.5 3.6 0.0871588
# 808 WYOMING 1978 8 4115.61 2920.85 294.73 900.04 20760.24 11038 187.4 NA NA
# 809 WYOMING 1979 8 4268.71 2950.53 313.47 1004.71 21643.50 11988 200.7 2.8 0.2346269
# 810 WYOMING 1980 8 NA 2979.23 338.06 1082.40 22628.22 13027 210.2 4.0 NA
# 811 WYOMING 1981 8 4572.67 3005.62 379.19 1187.86 26330.20 13717 223.5 4.1 0.3704301
# 812 WYOMING 1982 8 4731.98 3060.64 408.43 1262.90 27724.96 13056 217.7 5.8 0.3595080
# 813 WYOMING 1983 8 4950.82 3119.98 445.59 NA 28586.46 11922 NA 8.4 NA
# 814 WYOMING 1984 8 5184.73 3195.68 476.57 NA 28794.80 12073 204.3 6.3 0.3199823
# 815 WYOMING 1985 8 5448.38 3295.92 523.01 1629.45 29326.94 12022 NA 7.1 NA
# 816 WYOMING 1986 8 5700.41 3400.96 565.58 1733.88 27110.51 NA 196.3 9.0 NA
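To check why some rows have no fitted value, one can compare them with the rows that have an NA in any variable used by the model. The sketch below repeats the setup of the demonstration so that it runs on its own. It assumes that missing values are the only reason rows were dropped and that fitted() returns a pseries carrying the index attribute, as in the str() output above. The names `Produc_na`, `merged`, `model_vars` and `incomplete` are illustrative.

```r
library(plm)

data("Produc", package = "plm")

# Same setup as the demonstration: scatter NAs over the non-index columns
set.seed(41120)
Produc_na <- Produc
for (col in setdiff(names(Produc_na), c("state", "year"))) {
  Produc_na[sample(nrow(Produc_na), 50), col] <- NA
}

results <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
               data = Produc_na, index = c("state", "year"))

# Attach the panel identifiers to the fitted values, then left merge
fv <- fitted(results)
fv_df <- cbind(attr(fv, "index"), fitted_values = fv)
merged <- merge(Produc_na, fv_df, by = c("state", "year"), all.x = TRUE)

# Rows with an NA in any variable that enters the regression
model_vars <- c("gsp", "pcap", "pc", "emp", "unemp")
incomplete <- !complete.cases(merged[, model_vars])

# Cross-tabulate incomplete rows against rows without a fitted value
table(incomplete, no_fitted_value = is.na(merged$fitted_values))
```

If the off-diagonal cells of the table are zero, every missing fitted value corresponds to a row with an NA in one of the model variables.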