
Is there a predict function for plm in R?

I have a small-N, large-T panel that I am estimating with plm (panel linear regression model), with fixed effects.

Is there any way to get predicted values for a new dataset? (I want to estimate the parameters on a subset of my sample and then use them to calculate model-implied values for the whole sample.)

Thanks!

I wrote a function called predict.out.plm that can create predictions for the original data and for a manipulated data set (with the same column names).

predict.out.plm calculates a) the predicted (fitted) outcome of the transformed data and b) the corresponding predicted outcome in levels. The function works for first-difference (FD) and fixed-effects (FE) estimations with plm. For FD the transformed outcome is the difference over time; for FE it is the time-demeaned outcome.
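To make the two transformations and the way back to levels concrete, here is a minimal base-R sketch on a hypothetical toy panel (the data frame `d` and the helper `rebuild_fd` are made up for illustration and are not part of plm or predict.out.plm). It applies the transformations to observed values; predict.out.plm applies the same reconstruction to predicted values: FE adds the individual mean back, FD starts from the first observed level and cumulates the differences.

```r
# Hypothetical toy panel: 2 individuals observed in 3 consecutive years (strongly balanced)
d <- data.frame(id   = rep(1:2, each = 3),
                year = rep(2001:2003, times = 2),
                y    = c(1, 3, 6, 2, 2, 5))

# FE ("within") transformation: subtract each individual's mean over time
d$y_mean   <- ave(d$y, d$id)
d$y_within <- d$y - d$y_mean

# FD transformation: change from the previous year, undefined (NA) in the first year
d$y_fd <- ave(d$y, d$id, FUN = function(v) c(NA, diff(v)))

# Back to levels
# FE: add the individual mean back
d$y_level_fe <- d$y_within + d$y_mean
# FD: start from the first observed level and cumulate the differences
# (hypothetical helper; assumes rows are sorted by id, then year)
rebuild_fd <- function(y, dy) c(y[1], y[1] + cumsum(dy[-1]))
d$y_level_fd <- unlist(lapply(split(d, d$id), function(g) rebuild_fd(g$y, g$y_fd)),
                       use.names = FALSE)

print(d)
```

The FD reconstruction is why the first period of each individual has to come from the real data: a differenced model carries no information about the starting level.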

The function is largely untested and probably only works with strongly balanced data frames.

Any suggestions and corrections are very welcome. Help developing a small R package would be much appreciated.

The function predict.out.plm:

predict.out.plm <- function(
  estimate,
  formula,
  data,
  model = "fd",
  pname = "y",
  pindex = NULL,
  levelconstr = TRUE
){
  # get the panel index (individual, time); respect a user-supplied pindex
  if (is.null(pindex)) {
    if (inherits(data, "pdata.frame")) {
      pindex <- names(attributes(data)$index)
    } else {
      pindex <- names(data)[1:2]
    }
  }
  if (!inherits(data, "pdata.frame")) {
    data <- pdata.frame(data, index = pindex)
  }
  # model frame
  mf <- model.frame(formula, data = data)
  # model matrix of the transformed (first-differenced / within) data
  mn <- model.matrix(formula, mf, model)

  # names of the output columns
  y.t.hat <- paste0(pname, ".t.hat")  # predicted transformed outcome
  y.l.hat <- paste0(pname, ".l.hat")  # predicted level outcome
  y.l <- names(mf)[1]                 # observed level outcome

  # transformed explanatory variables,
  # excluding variables that were dropped (aliased) in estimation
  n <- names(estimate$aliased[!estimate$aliased])
  i <- match(n, colnames(mn))
  X <- mn[, i, drop = FALSE]

  # predicted transformed outcome: X %*% beta
  p <- X %*% coef(estimate)
  colnames(p) <- y.t.hat

  if (levelconstr) {
    # observed data: panel index plus the original (level) outcome
    od <- data.frame(
      attributes(mf)$index,
      data.frame(mf)[, 1]
    )
    rownames(od) <- rownames(mf)  # preserve row names from model.frame
    names(od)[3] <- y.l

    # merge observed data with the prediction
    # (the prediction is NA where FD drops the first period)
    nd <- merge(od, p, by = "row.names", all.x = TRUE, sort = FALSE)
    # restore the row order of the model frame (works for any row-name format)
    nd <- nd[match(rownames(od), nd$Row.names), ]

    if (model == "fd") {
      # construct the predicted level outcome for FD estimations:
      # the first period has no differenced prediction, so use the observed level
      i <- which(is.na(nd[, y.t.hat]))
      nd[, y.l.hat] <- NA
      nd[i, y.l.hat] <- nd[i, y.l]
      # cumulate the predicted differences year by year
      # (assumes consecutive integer years and a balanced, id-sorted panel)
      ylist <- sort(unique(as.integer(as.character(nd[, pindex[2]]))))[-1]
      for (y in ylist) {
        nd[nd[, pindex[2]] == y, y.l.hat] <-
          nd[nd[, pindex[2]] == (y - 1), y.l.hat] +
          nd[nd[, pindex[2]] == y, y.t.hat]
      }
    } else if (model == "within") {
      # individual means of the observed outcome, aligned row by row
      nd$groupmeans <- ave(nd[, y.l], nd[, pindex[1]])
      # predicted level = predicted demeaned outcome + individual mean
      nd[, y.l.hat] <- nd[, y.t.hat] + nd$groupmeans
    } else {
      stop("function works only for FD and FE estimations")
    }
  }
  # results: transformed prediction, plus the level data frame if requested
  results <- p
  if (levelconstr) {
    results <- list(p = p, df = nd)
  }
  return(results)
}

Testing the function:

##packages
library(plm)

##test dataframe
#data structure
N<-4
G<-2
M<-5
d<-data.frame(
  id=rep(1:N,each=M),
  year=rep(1:M,N)+2000,
  gid=rep(1:G,each=M*2)
)
#explanatory variable
d[,"x"]=runif(N*M,0,1)
#outcome
d[,"y"] = 2 * d[,"x"] + runif(N*M,0,1)
#panel data frame
d<-pdata.frame(d,index=c("id","year"))

##new data frame for out of sample prediction
dn<-d
dn$x<-rnorm(nrow(dn),0,2)

##estimate
#formula
f<- pFormula(y ~ x + factor(year))
#fixed effects or first difference estimation (choose one; the second assignment would overwrite the first)
# e<-plm(f,data=d,model="within",index=c("id","year"))
e<-plm(f,data=d,model="fd",index=c("id","year"))
summary(e)

##fitted values of estimation
#transformed outcome prediction 
predict(e)
c(pmodel.response(e)-residuals(e))
predict.out.plm(e,f,d,"fd")$p
# "level" outcome prediction
predict.out.plm(e,f,d,"fd")$df$y.l.hat
#both
predict.out.plm(e,f,d,"fd")

##out of sample prediction
predict(e,newdata=d) 
predict(e,newdata=dn) 
# Error in crossprod(beta, t(X)) : non-conformable arguments
# if plm omits variables specified in the formula (e.g. one year in factor(year)),
# it tries to multiply matrices whose dimensions do not match the regressors;
# the new function avoids this and is therefore able to do out-of-sample predictions
predict.out.plm(e,f,dn,"fd")

There are (at least) two methods in the package to produce estimates from plm objects:

-- fixef.plm: extract the fixed effects

-- pmodel.response: a function to extract the model response

It appears to me that the author(s) are not interested in providing estimates for the "random effects". It may be a matter of "if you don't know how to do it on your own, then we don't want to give you a sharp knife to cut yourself too deeply."
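As an illustration of how fixef.plm can be combined with the coefficients, here is a minimal sketch for a within (individual fixed effects) model, reusing the Produc example from the plm package. It assumes fixef() with its default type = "level" returns one intercept per state, so the level prediction for a row is that state's intercept plus the untransformed regressors times the slopes. The column order of `X` is written by hand to match coef(); that is an assumption you must keep in sync with your own formula.

```r
library(plm)

data("Produc", package = "plm")
# sort by (state, year) so the row order matches plm's internal panel order
Produc <- Produc[order(Produc$state, Produc$year), ]

fe <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
          data = Produc, index = c("state", "year"), model = "within")

# one intercept per state, converted to a plain named numeric vector
alpha <- fixef(fe)
alpha <- setNames(as.numeric(alpha), names(alpha))

# untransformed regressors, in the same order as coef(fe)
X <- with(Produc, cbind(log(pcap), log(pc), log(emp), unemp))

# level prediction: own state's intercept + X %*% beta
y_hat <- unname(alpha[as.character(Produc$state)] + drop(X %*% coef(fe)))
```

On the estimation sample, these level predictions plus the within residuals should reproduce the observed outcome, which is a quick sanity check of the reconstruction.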

plm now has a predict.plm() function, although it is not documented/exported.

Note also that predict works on the transformed model (i.e. after the within/between/fd transformation), not on the original one. I speculate that this is because prediction is harder in a panel data framework. Indeed, you need to consider whether you are predicting:

  • new time periods for existing individuals, having used individual fixed effects? Then you can add the prediction to the existing individual mean.
  • new time periods for new individuals? Then you need to decide which individual mean to use.
  • the same is even more complicated if you use a random-effects model, as the effects are not easily derived.
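The first case (new time periods for existing individuals with individual fixed effects) is also what the question asks for. Below is a minimal sketch under stated assumptions: the split at 1980 is a hypothetical choice, every state must appear in the estimation subset (otherwise it has no estimated effect), and the level effects from fixef() (default type = "level") are added to the untransformed regressors times the slopes for all years.

```r
library(plm)

data("Produc", package = "plm")
Produc <- Produc[order(Produc$state, Produc$year), ]

# hypothetical split: estimate on the early years only
train <- subset(Produc, year <= 1980)
fe <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
          data = train, index = c("state", "year"), model = "within")

# state intercepts estimated on the subset, as a plain named vector
alpha <- fixef(fe)
alpha <- setNames(as.numeric(alpha), names(alpha))

# untransformed regressors for the whole sample, same order as coef(fe)
X_all <- with(Produc, cbind(log(pcap), log(pc), log(emp), unemp))

# model-implied level values for every state-year, including 1981-1986
Produc$y_hat <- unname(alpha[as.character(Produc$state)] + drop(X_all %*% coef(fe)))
head(Produc[, c("state", "year", "gsp", "y_hat")])
```

A state that appears only after the split would get NA here, which is exactly the "new individual" problem from the second bullet.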

In the code below, I illustrate how to use the fitted values on the existing sample:

library(plm)
#> Loading required package: Formula
library(tidyverse)

data("Produc", package = "plm")
zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
          data = Produc, index = c("state","year"))


## produce a dataset of prediction, added to the group means
Produc_means <- Produc %>% 
  mutate(y = log(gsp)) %>% 
  group_by(state) %>% 
  transmute(y_mean = mean(y),
            y = y, 
            year = year) %>% 
  ungroup() %>% 
  mutate(y_pred = predict(zz) + y_mean) %>% 
  select(-y_mean)

## plot it
Produc_means %>% 
  gather(type, value, y, y_pred) %>% 
  filter(state %in% toupper(state.name[1:5])) %>% 
  ggplot(aes(x = year, y = value, linetype = type))+
  geom_line() +
  facet_wrap(~state) +
  ggtitle("Visualising in-sample prediction, for 4 states")
#> Warning: attributes are not identical across measure variables;
#> they will be dropped

Created on 2018-11-20 by the reprex package (v0.2.1)

It looks like there is a new package, prediction, that does in-sample predictions for a variety of models, including plm:

https://cran.r-project.org/web/packages/prediction/prediction.pdf

You can calculate the residuals via residuals(reg_name). From there, subtract them from the response variable to get the predicted values. Note that for within and FD models the residuals are on the transformed scale, so subtract them from the transformed response, pmodel.response(reg_name).
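A minimal sketch of the residual approach for a within model, again using the Produc example. It assumes that pmodel.response() and model.matrix() on a plm object both return the within-transformed data, so "transformed response minus residuals" should match the transformed regressors times the coefficients. The comparison variable `y_within` (outcome demeaned by state) is built by hand for illustration.

```r
library(plm)

data("Produc", package = "plm")
# sort by (state, year) so the row order matches plm's internal panel order
Produc <- Produc[order(Produc$state, Produc$year), ]

fe <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
          data = Produc, index = c("state", "year"), model = "within")

# fitted values on the transformed (within-demeaned) scale:
# transformed response minus residuals
fitted_within <- as.numeric(pmodel.response(fe)) - as.numeric(residuals(fe))

# for comparison: the observed outcome demeaned by state
y <- log(Produc$gsp)
y_within <- y - ave(y, Produc$state)
```

To get back to levels, add each state's mean of the outcome to `fitted_within`, as in the in-sample plotting example above.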

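Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions contact: yoyou2525@163.com.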