简体   繁体   中英

regression by group

Just a very quick question, I want to run the regression using MASS. The dependent variable are val1, val2, val3 respectively and independent variables are a, b, c, d.

Just look at the fake data.

library(data.table)
library(MASS)
test <- data.table(val1 = 1:10, val2 = 11:20, val3 = 21:30, a = rnorm(10), b = rnorm(10), c = rnorm(10), d = rnorm(10))
summary1 <- glm.nb(val1 ~ a + b + c + d, data = test)
summary2 <- glm.nb(val2 ~ a + b + c + d, data = test)
summary3 <- glm.nb(val3 ~ a + b + c + d, data = test)

I think the code is ugly. I tried this

for (i in c("val1", "val2", "val3")){
paste("sum_", c("val1", "val2", "val3"), sep = "") <- glm.nb(i ~ a + b + c + d, data = simple)
}

But it didn't work. Any suggestions about the improvements? In the original data, there're about 26 independent variables, and I think it will be more ugly if the code is like this sum1 <- glm.nb(val3 ~ a + b + c + d + e + f+ g + h + i + j + k + l, data = test)

I know the following code might be helpful, but I don't know how to use them...:(

diff <- setdiff(colnames(test),c('val1','val2','val3'))

Also, I wonder whether lapply function can achieve this within data.table?

Thanks a lot!

Better to put your data in the long format :

library(plyr)
library(reshape2)
xx <- melt(test,measure.vars=paste0('val',1:3))
ddply(xx,.(variable),function(x){
  coef(glm.nb(value~.,data=subset(x,select=-variable)))
})

 variable (Intercept)            a            b           c          d
1     val1    1.583602 -0.045909060 -0.018189342 0.026293033 0.29708648
2     val2    2.704601 -0.014641683 -0.003836401 0.006711503 0.10445377
3     val3    3.217729 -0.008925782 -0.001863267 0.003475509 0.06292286

If you want all the model not just the coefficients:

dlply(xx,.(variable),function(x){
  glm.nb(value~.,data=subset(x,select=-variable))
})

Using your loop approach I would simply store all my models in a list like so

results <- list()

for (i in c("val1", "val2", "val3")){
  frml <- paste(i, "~ a + b + c + d")
  frml <- as.formula(frml)

  results[[i]] <- glm.nb(frml, data = simple)
}

And then access the models in the list by looking at results$val1 etc.

And here is a solution with lapply :

summary.list<-lapply(test[, .SD, .SDcols=patterns('val')],
                     function(i) glm.nb(i ~ a + b + c + d, data = test))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM