Using lapply to loop over glm

This is an example of what I'm trying to do.

Step 1 :

Create a list of formulas, each combining a dependent variable with an independent variable:

a <- list(paste("Sepal.Length ~  Sepal.Width" ) , 
paste("Sepal.Width ~ Sepal.Length" )
)

Step 2 :

Use lapply to run glm for each element of the list from step 1, and wrap it in a for loop to test two different family parameters:

param <- c("gaussian" , "Gamma" )
for (i in 1:2) {
  print(lapply(a, FUN = function(X) glm(X, data = iris, family = param[i])))
}

Is there a better way to achieve this without using a for loop in the second step? This is what I have tried, but it's not working.

a <- 
list(
paste("Sepal.Length ~  Sepal.Width , data = iris , family = "Gaussian" " ) , 
paste("Sepal.Length ~  Sepal.Width , data = iris , family = "Gamma" " ) ,                  
paste("Sepal.Width ~  Sepal.Length , data = iris , family = "Gaussian" " ) ,
paste("Sepal.Width ~  Sepal.Length , data = iris , family = "Gamma" " ) 
)

lapply(a , FUN = function(X) glm(X))

Your paste does nothing here. Leave it out. Furthermore, the use of strings is also unnecessary here. Leave them out. The same goes for your parameter families: these are functions, so there is no need to quote them.

This already vastly simplifies the code, both in length and conceptually. Now we have this:

models = list(Sepal.Length ~ Sepal.Width, Sepal.Width ~ Sepal.Length)
families = c(gaussian, Gamma)

And we can apply it:

lapply(models,
       function (model) lapply(families,
                               function (family) glm(model, family, iris)))
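The nested call returns a list of lists: the outer level has one entry per model, the inner level one entry per family. As a minimal sketch of how the result can be navigated, assuming we give the families names (the names and the variable fits are illustrative, not part of the original code):

```r
# Same nested lapply as above, but with named families so the
# inner list elements can be accessed by name.
models <- list(Sepal.Length ~ Sepal.Width, Sepal.Width ~ Sepal.Length)
families <- list(gaussian = gaussian, Gamma = Gamma)

fits <- lapply(models, function(model) {
  lapply(families, function(family) glm(model, family, iris))
})

# fits[[1]] holds both fits of the first formula;
# fits[[1]]$Gamma is that formula fitted with the Gamma family.
coef(fits[[1]]$Gamma)
```

lapply keeps the names of the list it iterates over, which is why the inner results can be addressed as $gaussian and $Gamma.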

… which is a nested application. The indentation hints at what belongs together. Since this is a bit odd, we can also use the cartesian product of the different parameters:

params = as.data.frame(t(expand.grid(models, families)))

lapply(params, function (p) glm(formula = p[[1]], data = iris, family = p[[2]]))

The first line is a bit obscure here. expand.grid allows us to create a data frame of all parameter combinations. Here's an example:

> expand.grid(1 : 3, c('a', 'b'))

  Var1 Var2
1    1    a
2    2    a
3    3    a
4    1    b
5    2    b
6    3    b

Unfortunately, this data frame is in the wrong orientation to be used by lapply, because that applies over columns. So we transpose it with t (and convert it to a data.frame again, since t always returns a matrix).
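To see the orientation issue concretely, here is a small sketch using the same toy grid as above (the variable names grid, column_lengths and combos are made up for illustration):

```r
# Toy grid: 3 x 2 = 6 parameter combinations, one per row.
grid <- expand.grid(1:3, c("a", "b"), stringsAsFactors = FALSE)

# lapply over a data frame visits columns, so we get 2 results
# (one per variable), not 6 (one per combination).
column_lengths <- lapply(grid, length)
length(column_lengths)

# After transposing, each column is one combination, so lapply
# now visits the 6 combinations. Note that t() goes through a
# matrix, so with these mixed types every value becomes a string.
combos <- as.data.frame(t(grid), stringsAsFactors = FALSE)
ncol(combos)
```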

This piece of code is incredibly useful because it makes writing nested loops via lapply much more readable; unfortunately, it is itself quite unreadable, so we stick it into a function:

combine_parameters = function (...)
    as.data.frame(t(expand.grid(...)))

This allows us to write elegant, readable code:

models = list(Sepal.Length ~ Sepal.Width, Sepal.Width ~ Sepal.Length)
families = c(gaussian, Gamma)
params = combine_parameters(models, families)
lapply(params, function (p) glm(formula = p[[1]], family = p[[2]], data = iris))
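If the transposed data frame with list columns feels too opaque, one alternative (a different technique from the one above, sketched here under the assumption that positional indices are acceptable) is to build the grid over indices and use Map. The names idx and fits are illustrative:

```r
models <- list(Sepal.Length ~ Sepal.Width, Sepal.Width ~ Sepal.Length)
families <- list(gaussian, Gamma)

# Grid of index pairs instead of the objects themselves; the
# first variable (model) varies fastest.
idx <- expand.grid(model = seq_along(models),
                   family = seq_along(families))

# Map walks both index vectors in parallel and returns a plain list
# with one fitted model per row of idx.
fits <- Map(function(m, f) glm(models[[m]], families[[f]], iris),
            idx$model, idx$family)

length(fits)
```

Keeping idx around has the side benefit that row i of idx documents which model and family produced fits[[i]].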

Using lapply:

lapply(c("gaussian", "Gamma"), function(myFamily){
  lapply(c("Sepal.Length ~  Sepal.Width" , 
           "Sepal.Width ~ Sepal.Length"), function(myFormula){
             glm(formula = myFormula, family = myFamily, data = iris)
           })
})

EDIT: As mentioned in @KonradRudolph's answer, we can pass the families as a list of family objects rather than strings, which also lets us specify a link = argument, e.g.:

lapply(list(gaussian(link = "identity"), Gamma), function(myFamily){
  lapply(c("Sepal.Length ~  Sepal.Width" , 
           "Sepal.Width ~ Sepal.Length"), function(myFormula){
             glm(formula = myFormula, family = myFamily, data = iris)
           })
})
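The nested result can be awkward to work with afterwards. A possible follow-up, sketched here with made-up names (length, width, nested, flat) and using AIC purely as an example summary, is to name both levels and flatten one level with unlist(recursive = FALSE):

```r
# Name both levels so the flattened list gets readable labels.
my_families <- list(gaussian = gaussian(link = "identity"), Gamma = Gamma())
my_formulas <- c(length = "Sepal.Length ~ Sepal.Width",
                 width  = "Sepal.Width ~ Sepal.Length")

nested <- lapply(my_families, function(myFamily) {
  lapply(my_formulas, function(myFormula) {
    glm(formula = as.formula(myFormula), family = myFamily, data = iris)
  })
})

# Flatten one level only: a list of four glm fits whose names
# join the family name and the formula name with a dot.
flat <- unlist(nested, recursive = FALSE)
names(flat)

# One number per fit, e.g. for a quick comparison.
sapply(flat, AIC)
```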

