regression by group

Question

Just a quick question: I want to run negative binomial regressions using glm.nb from MASS. The dependent variables are val1, val2 and val3 (one model for each), and the independent variables are a, b, c and d.

Here is some fake data:

library(data.table)
library(MASS)
test <- data.table(val1 = 1:10, val2 = 11:20, val3 = 21:30, a = rnorm(10), b = rnorm(10), c = rnorm(10), d = rnorm(10))
summary1 <- glm.nb(val1 ~ a + b + c + d, data = test)
summary2 <- glm.nb(val2 ~ a + b + c + d, data = test)
summary3 <- glm.nb(val3 ~ a + b + c + d, data = test)

I think this code is ugly, so I tried the following:

for (i in c("val1", "val2", "val3")){
paste("sum_", c("val1", "val2", "val3"), sep = "") <- glm.nb(i ~ a + b + c + d, data = simple)
}

But it didn't work. Any suggestions for improving it? In the original data there are about 26 independent variables, and the code gets even uglier when every one of them has to be typed out, like this: sum1 <- glm.nb(val3 ~ a + b + c + d + e + f + g + h + i + j + k + l, data = test)

I know the following code might be helpful, but I don't know how to use it:

diff <- setdiff(colnames(test),c('val1','val2','val3'))

Also, I wonder whether lapply function can achieve this within data.table?

Thanks a lot!

Answer 1

It is better to put your data in long format:

library(plyr)
library(reshape2)
xx <- melt(test,measure.vars=paste0('val',1:3))
ddply(xx,.(variable),function(x){
  coef(glm.nb(value~.,data=subset(x,select=-variable)))
})

  variable (Intercept)            a            b           c          d
1     val1    1.583602 -0.045909060 -0.018189342 0.026293033 0.29708648
2     val2    2.704601 -0.014641683 -0.003836401 0.006711503 0.10445377
3     val3    3.217729 -0.008925782 -0.001863267 0.003475509 0.06292286

If you want the full models, not just the coefficients:

dlply(xx,.(variable),function(x){
  glm.nb(value~.,data=subset(x,select=-variable))
})
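
For readers who do not have plyr and reshape2 installed, the same long-format idea can be sketched in base R. This is only an illustration, not the answerer's code: the simulated data (200 rows of rnbinom draws, so that the negative binomial fit is well behaved) and the names long and fits are assumptions.

```r
library(MASS)

set.seed(42)
n <- 200
# Assumed toy data: four predictors and three overdispersed count responses
test <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n), d = rnorm(n))
responses <- c("val1", "val2", "val3")
for (r in responses) {
  test[[r]] <- rnbinom(n, size = 2, mu = exp(1 + 0.3 * test$a))
}

# Long format: stack the three responses, repeating the predictors for each
long <- do.call(rbind, lapply(responses, function(r) {
  data.frame(variable = r, value = test[[r]], test[c("a", "b", "c", "d")])
}))

# Split by response and fit one model per piece; the result is a named list
fits <- lapply(split(long, long$variable), function(x) {
  glm.nb(value ~ a + b + c + d, data = x)
})
```

Each element of fits (for example fits$val2) is then an ordinary glm.nb fit that can be passed to summary() or coef().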

Answer 2

Using your loop approach, I would simply store all the models in a list, like so:

results <- list()

for (i in c("val1", "val2", "val3")){
  frml <- paste(i, "~ a + b + c + d")
  frml <- as.formula(frml)

  results[[i]] <- glm.nb(frml, data = test)
}

You can then access the models in the list with results$val1, and so on.
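
With around 26 predictors, typing the right-hand side by hand becomes tedious. One possible variation, combining this loop with the setdiff() line from the question, is to build each formula with reformulate() from base R. The sketch below is an assumption-laden illustration rather than part of the original answer: the simulated data and the names responses and predictors are made up for the example.

```r
library(MASS)

set.seed(42)
n <- 200
# Assumed toy data: four predictors and three overdispersed count responses
test <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n), d = rnorm(n))
responses <- c("val1", "val2", "val3")
for (r in responses) {
  test[[r]] <- rnbinom(n, size = 2, mu = exp(1 + 0.3 * test$a))
}

# Every column that is not a response is treated as a predictor
predictors <- setdiff(colnames(test), responses)

results <- list()
for (resp in responses) {
  # reformulate() builds e.g. val1 ~ a + b + c + d from character vectors
  frml <- reformulate(predictors, response = resp)
  results[[resp]] <- glm.nb(frml, data = test)
}
```

Because predictors is computed from the column names, the loop body stays the same whether there are 4 independent variables or 26.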

Answer 3

And here is a solution with lapply:

summary.list<-lapply(test[, .SD, .SDcols=patterns('val')],
                     function(i) glm.nb(i ~ a + b + c + d, data = test))
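
Once the models are in a list, the usual follow-up is to collect something from each of them. The sketch below shows one way to do that; it uses a plain data.frame instead of data.table to stay self-contained, and the simulated data and the names models, coef_table and thetas are assumptions for illustration only.

```r
library(MASS)

set.seed(42)
n <- 200
# Assumed toy data: four predictors and three overdispersed count responses
test <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n), d = rnorm(n))
responses <- c("val1", "val2", "val3")
for (r in responses) {
  test[[r]] <- rnbinom(n, size = 2, mu = exp(1 + 0.3 * test$a))
}

# lapply over the response columns; y is found in the function's environment,
# while a, b, c and d are looked up in `data`
models <- lapply(test[responses], function(y) {
  glm.nb(y ~ a + b + c + d, data = test)
})

# One row of coefficients per response
coef_table <- t(sapply(models, coef))

# Estimated dispersion parameter (theta) of each fit
thetas <- sapply(models, function(m) m$theta)
```

The list keeps the column names, so models$val1, coef_table["val1", ] and thetas["val1"] all refer to the same fit.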

3 answers

solution1
5 ACCEPTED 2014-01-16 16:50:33

solution2
2 2014-01-16 16:41:50

solution3
1 2014-01-16 16:47:15
