Bootstrapping function with data.table

I have been trying to write a function that takes the results from a simple regression model and calculates the Glass's delta effect size. That was easy. The problem now is that I would like to calculate confidence intervals for this value, and I keep getting an error when I use the function with the boot library.

I have tried to follow this answer but with no success.

As an example, I am going to use a Stata dataset:

library(data.table)
webclass <- readstata13::read.dta13("http://www.stata.com/videos13/data/webclass.dta")
#estimate impact
M0<-lm(formula = math ~ treated ,data = webclass)

######################################
#####        Effect Size       ######
##  Glass's delta=(M1-M2)/SD2     ##
####################################

ESdelta<-function(regmodel,yvar,tvar,msg=TRUE){
  Data<-regmodel$model
  setDT(Data)
  meanT<-mean(Data[get(tvar)=="Treated",get(yvar)])
  meanC<-mean(Data[get(tvar)=="Control",get(yvar)])
  sdC<-sd(Data[get(tvar)=="Control",get(yvar)])
  ESDelta<-(meanT-meanC)/sdC
  
 if (msg==TRUE) {
   cat(paste("the average scores of the variable-",yvar,"-differ by approximately",round(ESDelta,2),"standard deviations"))
   
 }
    return(ESDelta)
  
}

ESdelta(M0,"math","treated",msg = F)
#0.7635896

Now, when I try to use the boot function, I get the following error:

boot::boot(M0, statistic=ESdelta, R=50,"math","treated")

#Error in match.arg(stype) : 'arg' should be one of “i”, “f”, “w”

Thanks

In the boot manual (type ?boot):

statistic: [...] The first argument passed will always be the original data. The second will be a vector of indices, frequencies or weights which define the bootstrap sample.
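
As a minimal sketch of that contract (the toy vector x and the helper trimmed_mean are made up for illustration, they are not part of the question), boot() supplies the index vector itself and forwards any extra named arguments to the statistic:

```r
library(boot)

# Hypothetical toy data: 30 draws from a normal distribution.
set.seed(1)
x <- rnorm(30, mean = 5, sd = 2)

# statistic(data, indices, ...): boot() fills in `indices` on every call;
# `trim` is an extra argument forwarded through boot()'s `...`.
trimmed_mean <- function(data, indices, trim = 0) {
  mean(data[indices], trim = trim)
}

b <- boot(x, statistic = trimmed_mean, R = 200, trim = 0.1)
b$t0         # statistic evaluated on the original data
length(b$t)  # one value per bootstrap resample
```

The same pattern applies to a data frame or data.table: subset the rows with the indices first, then compute the statistic on the resampled rows.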

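You cannot bootstrap a fitted model object directly: boot() resamples the rows of the data. In your call, the unnamed "math" and "treated" were matched positionally to boot()'s sim and stype arguments, which is what triggered the match.arg(stype) error. So rewrite the function to take the data.table and a vector of indices as its first two arguments, and pass any other arguments to boot() by name: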

ESdelta<-function(Data,inds,yvar,tvar,msg=TRUE){

  Data = Data[inds,]
  meanT<-mean(Data[get(tvar)=="Treated",get(yvar)])
  meanC<-mean(Data[get(tvar)=="Control",get(yvar)])
  sdC<-sd(Data[get(tvar)=="Control",get(yvar)])
  ESDelta<-(meanT-meanC)/sdC

 if (msg==TRUE) {
   cat(paste("the average scores of the variable-",yvar,"-differ by approximately",round(ESDelta,2),"standard deviations"))

 }
    return(ESDelta)

}

library(boot)  # boot() and boot.ci() are called unqualified below
Dat <- setDT(M0$model)
bo = boot(Dat, statistic=ESdelta, R=50,yvar="math",tvar="treated",msg=FALSE)


> bo

ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = Dat, statistic = ESdelta, R = 50, yvar = "math", 
    tvar = "treated", msg = FALSE)


Bootstrap Statistics :
     original     bias    std. error
t1* 0.7635896 0.05685514   0.4058304

You can get the confidence intervals with:

boot.ci(bo)

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 50 bootstrap replicates

CALL : 
boot.ci(boot.out = bo)

Intervals : 
Level      Normal              Basic         
95%   (-0.0887,  1.5021 )   (-0.8864,  1.5398 )  

Level     Percentile            BCa          
95%   (-0.0126,  2.4136 )   (-0.1924,  1.7579 )  
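
Fifty replicates is on the low side for percentile-type intervals, so in practice you would probably want a larger R. Here is a rough sketch of the same workflow on simulated data; the data frame sim, the helper glass_delta, and R = 2000 are assumptions for illustration, not values from the question, and a plain data.frame is used only to keep the example self-contained:

```r
library(boot)

# Hypothetical simulated data standing in for webclass.
set.seed(42)
sim <- data.frame(
  treated = rep(c("Control", "Treated"), each = 100),
  math    = c(rnorm(100, mean = 50, sd = 10), rnorm(100, mean = 57, sd = 10))
)

# Glass's delta: (mean of treated - mean of control) / SD of control.
glass_delta <- function(data, indices) {
  d <- data[indices, ]
  treated <- d$math[d$treated == "Treated"]
  control <- d$math[d$treated == "Control"]
  (mean(treated) - mean(control)) / sd(control)
}

bo_sim <- boot(sim, statistic = glass_delta, R = 2000)

# Request only the interval types of interest.
ci <- boot.ci(bo_sim, conf = 0.95, type = c("perc", "basic"))
ci$percent[4:5]  # lower and upper limits of the percentile interval
```

Asking boot.ci() for specific types also avoids the warning about studentized intervals that the default type = "all" produces when no variance estimates are available.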
