
Regression models as column in data table, R

I am struggling to find a way to use the power of data tables while running regression models.

Here is a simplified working case:

library(data.table)

# given a data table containing desired variables
MyVarb <- data.table(Y=rnorm(100),
                 V1=rnorm(100),
                 V2=rnorm(100))

# given a new data table containing a series of formulas/equations in a column
DT <- data.table(eq=c("Y ~ V1", "Y ~ V2", "Y ~ V1 + V2"))

# I store the linear regression models in a second column
DT[, "models" := lapply(eq, function(i) lm(i, data=MyVarb))]

# Now, I can access the coefficients of a model (e.g. the 3rd one) like:
DT[3, models][[1]]$coefficients
(Intercept)          V1          V2 
-0.01583034  0.08284029  0.01630247 

However, I am curious if there are alternative ways. This doesn't work as desired:

DT[, "trial" := lm(eq, data=MyVarb)]
# ***sorry for my bad understanding of data tables and objects***

I want to run thousands of models with many more variables, so using lapply inside the data table DT is time-consuming: it takes a couple of hours on my PC and then runs out of the 8 GB of RAM. Is there a faster way to code this?

I would appreciate your kind help.

If you just need the coefficients, p-values, and AIC, then this will work without using up a lot of memory storing unnecessary parts of the lm objects:

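library(data.table)

MyVarb <- data.table(Y=rnorm(100),
                     V1=rnorm(100),
                     V2=rnorm(100))
eq <- c("Y ~ V1", "Y ~ V2", "Y ~ V1 + V2")
DT <- rbindlist(lapply(eq, function(mod) {
  reg <- lm(as.formula(mod), data=MyVarb)
  cf <- summary(reg)$coefficients   # estimate, std. error, t value, p-value
  dt <- data.table(cf)              # row names (the term names) are dropped here
  dt[, coef := row.names(cf)]       # so add them back as a column
  dt[, aic := AIC(reg)]
  dt[, model := mod]
  dt                                # return the small per-model table, not the lm object
}))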
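
For the "models inside the data table" style the question was aiming at, a minimal sketch is to group by the formula column. Without `by`, `eq` inside `j` is the whole column (three strings) rather than a single formula, which is why the `lm(eq, data=MyVarb)` attempt does not do what was intended. The column names `term`, `estimate` and `p_value`, and the fixed seed, are illustrative choices and not part of the original post. This changes the coding style only and is not claimed to be faster than the `lapply` version.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed only so the sketch is reproducible
MyVarb <- data.table(Y = rnorm(100), V1 = rnorm(100), V2 = rnorm(100))
DT <- data.table(eq = c("Y ~ V1", "Y ~ V2", "Y ~ V1 + V2"))

# One group per formula: inside j, `eq` is a single string for that group
res <- DT[, {
  reg <- lm(as.formula(eq), data = MyVarb)
  cf  <- summary(reg)$coefficients
  # Keep only the small pieces; the lm object is discarded after each group
  list(term     = rownames(cf),
       estimate = cf[, "Estimate"],
       p_value  = cf[, "Pr(>|t|)"],
       aic      = AIC(reg))
}, by = eq]
```

The result is a long table with one row per model term, so the coefficients of the third model can be read with `res[eq == "Y ~ V1 + V2"]`.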
