
Using stargazer with memory greedy glm objects

I'm trying to run the following regressions:

m1=glm(y~x1+x2+x3+x4,data=df,family=binomial())
m2=glm(y~x1+x2+x3+x4+x5,data=df,family=binomial())
m3=glm(y~x1+x2+x3+x4+x5+x6,data=df,family=binomial())
m4=glm(y~x1+x2+x3+x4+x5+x6+x7,data=df,family=binomial())

and then to print them using the stargazer package:

library(stargazer)
stargazer(m1, m2, m3, m4, type="html", out="models.html")

The thing is, the data frame df is rather big (~600MB), so each glm object I create is at least ~1.5GB. This causes a memory issue that prevents me from keeping all the regressions I need to print with stargazer in memory at once.

I've tried 2 approaches to decrease the size of the glm objects:

  1. Trim the glm object using this tutorial. This does trim the glm object to <1MB, but I then get the following error from the stargazer function:
Error in Qr$qr[p1, p1, drop = FALSE] : incorrect number of dimensions
  2. Use the speedglm package. However, it's not supported by stargazer.

Any suggestions?

stargazer calls summary() on each model, and summary() requires the qr component of the fitted object (see the source code). So, as far as I know, it is not possible to pass a trimmed glm object to stargazer.

That said, I think it should be easy to rewrite stargazer to accept a list of summaries as input. It would be extremely handy.
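The dependence of summary() on the qr component can be checked on a small model. The following is only a sketch: the data frame toy is simulated for illustration, and it assumes the trimming step removed the stored QR matrix, which may not be exactly what the tutorial does.

```r
# Simulated stand-in for the real data; `toy` is hypothetical.
set.seed(1)
toy <- data.frame(x1 = rnorm(200))
toy$y <- rbinom(200, 1, plogis(toy$x1))

m <- glm(y ~ x1, data = toy, family = binomial())

# Assumption: an aggressive trim drops the stored QR matrix.
trimmed <- m
trimmed$qr$qr <- NULL

# summary() works on the full object ...
full_ok <- inherits(summary(m), "summary.glm")

# ... but raises an error once the QR matrix is gone.
trimmed_result <- tryCatch(summary(trimmed), error = function(e) e)
trimmed_fails <- inherits(trimmed_result, "error")

print(c(full_ok = full_ok, trimmed_fails = trimmed_fails))
```

The coefficients themselves are still available from the trimmed object via coef(), which is why trimming is fine for prediction but not for anything that needs standard errors.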

An option that has worked well for me is to first convert the large *lm objects to the "coeftest" class using the lmtest package. A "coeftest" object is really just a matrix of your summarised regression results, so it takes up hardly any space. Moreover, stargazer readily accepts the "coeftest" class as an input, so your code doesn't need to change much at all.

Using your example:

library(lmtest)     # coeftest()
library(stargazer)

# Fit each model, then immediately overwrite it with its compact coeftest summary
m1 <- glm(y~x1+x2+x3+x4,data=df,family=binomial())
m1 <- coeftest(m1)
m2 <- glm(y~x1+x2+x3+x4+x5,data=df,family=binomial())
m2 <- coeftest(m2)
m3 <- glm(y~x1+x2+x3+x4+x5+x6,data=df,family=binomial())
m3 <- coeftest(m3)
m4 <- glm(y~x1+x2+x3+x4+x5+x6+x7,data=df,family=binomial())
m4 <- coeftest(m4)

# Only the small coeftest objects are passed to stargazer
stargazer(m1, m2, m3, m4, type="html", out="models.html")

Apart from taking care of the memory problem, this approach has the added benefit of the coeftest() transformation itself being extremely quick. (Well, with the notable exception of times when you ask it to produce robust/clustered standard errors on a particularly large *lm object by invoking the "vcov = vcovHC" option. However, even then, the coeftest() transformation is a necessary step to exporting the robust regression results in the first place.)
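As a rough illustration of the robust standard error case, here is a sketch on simulated data. The data frame toy and the choice of the "HC0" estimator are assumptions made for the example; vcovHC() comes from the sandwich package, which has to be installed alongside lmtest.

```r
library(lmtest)    # coeftest()
library(sandwich)  # vcovHC()

# Simulated stand-in for the real data; `toy` is hypothetical.
set.seed(1)
toy <- data.frame(x1 = rnorm(500), x2 = rnorm(500))
toy$y <- rbinom(500, 1, plogis(0.5 * toy$x1))

m <- glm(y ~ x1 + x2, data = toy, family = binomial())

# Model-based standard errors
ct_default <- coeftest(m)

# Heteroskedasticity-robust (sandwich) standard errors
ct_robust <- coeftest(m, vcov. = vcovHC(m, type = "HC0"))

# The full glm object is no longer needed once the summaries exist
rm(m)

print(ct_robust)
```

Both objects hold the same point estimates and differ only in the standard errors and the statistics derived from them.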

A minor downside to this approach is that it doesn't save some regression statistics that may be of interest for your stargazer table (e.g. R-squared or N). However, you could easily obtain these from the *lm object before converting it.
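One way to carry those statistics along is to record them before the conversion and hand them to stargazer through its add.lines argument. This is a minimal sketch: fit_compact is a hypothetical helper, toy is simulated data, and type="text" is used only so that the result prints to the console.

```r
library(lmtest)
library(stargazer)

# Simulated stand-in for the real data; `toy` is hypothetical.
set.seed(1)
toy <- data.frame(x1 = rnorm(500), x2 = rnorm(500))
toy$y <- rbinom(500, 1, plogis(0.5 * toy$x1))

# Hypothetical helper: fit the model, record the statistics of interest,
# and return only the compact pieces. The full glm object is not returned.
fit_compact <- function(formula, data) {
  fit <- glm(formula, data = data, family = binomial())
  list(coefs = coeftest(fit), n = nobs(fit), aic = AIC(fit))
}

r1 <- fit_compact(y ~ x1, toy)
r2 <- fit_compact(y ~ x1 + x2, toy)

# add.lines takes one vector per extra table row: a label, then one value per model
stargazer(r1$coefs, r2$coefs, type = "text",
          add.lines = list(
            c("Observations", r1$n, r2$n),
            c("AIC", round(r1$aic, 1), round(r2$aic, 1))
          ))
```

Because the glm object only exists inside the helper, it can be garbage-collected as soon as each call returns, so only one large fit is in memory at a time.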
