
Regression by group

Just a very quick question: I want to run regressions using MASS. The dependent variables are val1, val2, and val3, respectively, and the independent variables are a, b, c, and d.

Just look at the fake data.

library(data.table)
library(MASS)
test <- data.table(val1 = 1:10, val2 = 11:20, val3 = 21:30, a = rnorm(10), b = rnorm(10), c = rnorm(10), d = rnorm(10))
summary1 <- glm.nb(val1 ~ a + b + c + d, data = test)
summary2 <- glm.nb(val2 ~ a + b + c + d, data = test)
summary3 <- glm.nb(val3 ~ a + b + c + d, data = test)

I think the code is ugly. I tried this:

for (i in c("val1", "val2", "val3")){
  paste("sum_", c("val1", "val2", "val3"), sep = "") <- glm.nb(i ~ a + b + c + d, data = simple)
}

But it didn't work. Any suggestions for improvement? In the original data there are about 26 independent variables, and I think the code will be even uglier if it looks like this: sum1 <- glm.nb(val3 ~ a + b + c + d + e + f + g + h + i + j + k + l, data = test)

I know the following code might be helpful, but I don't know how to use it... :(

diff <- setdiff(colnames(test),c('val1','val2','val3'))

Also, I wonder whether the lapply function can achieve this within data.table?

Thanks a lot!

Better to put your data in long format:

library(plyr)
library(reshape2)
xx <- melt(test,measure.vars=paste0('val',1:3))
ddply(xx,.(variable),function(x){
  coef(glm.nb(value~.,data=subset(x,select=-variable)))
})

  variable (Intercept)            a            b           c          d
1     val1    1.583602 -0.045909060 -0.018189342 0.026293033 0.29708648
2     val2    2.704601 -0.014641683 -0.003836401 0.006711503 0.10445377
3     val3    3.217729 -0.008925782 -0.001863267 0.003475509 0.06292286

If you want the full models, not just the coefficients:

dlply(xx,.(variable),function(x){
  glm.nb(value~.,data=subset(x,select=-variable))
})

Using your loop approach, I would simply store all the models in a list, like so:

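results <- list()

for (i in c("val1", "val2", "val3")){
  # Build the formula as text, e.g. "val1 ~ a + b + c + d", then convert it
  frml <- paste(i, "~ a + b + c + d")
  frml <- as.formula(frml)

  # Fit on the example data (`test`) and store the model under the response name
  results[[i]] <- glm.nb(frml, data = test)
}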

You can then access the models in the list via results$val1, etc.
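With about 26 predictors, typing the right-hand side by hand gets tedious. The setdiff() line from the question can feed the loop directly. The following is only a sketch under a few assumptions: the toy data is generated here so the block is self-contained (a plain data.frame with made-up counts, not the asker's data), every non-response column is treated as a predictor, and the names `responses`, `predictors`, and `resp` are chosen for illustration.

```r
library(MASS)

# Toy data for illustration only (hypothetical; not the asker's real data).
set.seed(1)
n <- 200
test <- data.frame(
  val1 = rnbinom(n, size = 2, mu = 5),
  val2 = rnbinom(n, size = 2, mu = 10),
  val3 = rnbinom(n, size = 2, mu = 20),
  a = rnorm(n), b = rnorm(n), c = rnorm(n), d = rnorm(n)
)

responses <- c("val1", "val2", "val3")
# Assumption: every column that is not a response is a predictor.
predictors <- setdiff(colnames(test), responses)

results <- list()
for (resp in responses) {
  # reformulate() builds e.g. val1 ~ a + b + c + d from character vectors.
  frml <- reformulate(predictors, response = resp)
  results[[resp]] <- glm.nb(frml, data = test)
}

# One row of coefficients per response variable.
t(sapply(results, coef))
```

Because the formula is built from column names, adding more predictor columns to the data should not require touching the loop.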

And here is a solution with lapply:

summary.list<-lapply(test[, .SD, .SDcols=patterns('val')],
                     function(i) glm.nb(i ~ a + b + c + d, data = test))
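Once the models sit in a named list, a few per-model quantities can be gathered into one table for comparison. This is a rough sketch, not part of the original answer: it rebuilds a similar list from a plain data.frame with made-up counts so that it runs without data.table, and `model_table` is a name chosen here for illustration. The same extraction step should apply to a list produced by the data.table call above.

```r
library(MASS)

# Toy data for illustration only (hypothetical; not the asker's real data).
set.seed(42)
n <- 200
test <- data.frame(
  val1 = rnbinom(n, size = 2, mu = 5),
  val2 = rnbinom(n, size = 2, mu = 10),
  val3 = rnbinom(n, size = 2, mu = 20),
  a = rnorm(n), b = rnorm(n), c = rnorm(n), d = rnorm(n)
)

# Fit one model per response column; lapply() keeps the column names.
summary.list <- lapply(test[c("val1", "val2", "val3")],
                       function(i) glm.nb(i ~ a + b + c + d, data = test))

# Collect the dispersion estimate, AIC, and coefficients of each model.
model_table <- data.frame(
  response = names(summary.list),
  theta    = sapply(summary.list, function(m) m$theta),
  aic      = sapply(summary.list, AIC),
  t(sapply(summary.list, coef)),
  check.names = FALSE,
  row.names = NULL
)
model_table
```

Note that each fitted model records its call as `i ~ a + b + c + d`, so the list names are the reliable way to tell the models apart.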

