简体   繁体   中英

How to Loop/Repeat a Linear Regression in R

I have figured out how to make a table in R with 4 variables, which I am using for multiple linear regressions. The dependent variable (Lung) for each regression is taken from one column of a csv table of 22,000 columns. One of the independent variables (Blood) is taken from a corresponding column of a similar table. Each column represents the levels of a particular gene, which is why there are so many of them. There are also two additional variables (Age and Gender of each patient). When I enter in the linear regression equation, I use lm(Lung[,1] ~ Blood[,1] + Age + Gender), which works for one gene. I am looking for a way to input this equation and have R calculate all of the remaining columns for Lung and Blood, and hopefully output the coefficients into a table. Any help would be appreciated!

You want to run 22,000 linear regressions and extract the coefficients? That's simple to do from a coding standpoint.

set.seed(1)

# number of columns in the Lung and Blood data.frames. 22,000 for you?
n <- 5 

# dummy data
obs <- 50 # observations
Lung <- data.frame(matrix(rnorm(obs*n), ncol=n))
Blood <- data.frame(matrix(rnorm(obs*n), ncol=n))
Age <- sample(20:80, obs)
Gender  <- factor(rbinom(obs, 1, .5))

# run n regressions
my_lms <- lapply(1:n, function(x) lm(Lung[,x] ~ Blood[,x] + Age + Gender))

# extract just coefficients
sapply(my_lms, coef)

# if you need more info, get full summary call. now you can get whatever, like:
summaries <- lapply(my_lms, summary)
# ...coefficents with p values:
lapply(summaries, function(x) x$coefficients[, c(1,4)])
# ...or r-squared values
sapply(summaries, function(x) c(r_sq = x$r.squared, 
                                adj_r_sq = x$adj.r.squared))

The models are stored in a list, where model 3 (with DV Lung[, 3] and IVs Blood[,3] + Age + Gender) is in my_lms[[3]] and so on. You can use apply functions on the list to perform summaries, from which you can extract the numbers you want.

The question seems to be about how to call regression functions with formulas which are modified inside a loop.

Here is how you can do it in (using diamonds dataset):

attach(ggplot2::diamonds)
strCols = names(ggplot2::diamonds)

formula <- list(); model <- list()
for (i in 1:1) {
  formula[[i]] = paste0(strCols[7], " ~ ", strCols[7+i])
  model[[i]] = glm(formula[[i]]) 

  #then you can plot or do anything else with the result ...
  png(filename = sprintf("diamonds_price=glm(%s).png", strCols[7+i]))
  par(mfrow = c(2, 2))      
  plot(model[[i]])
  dev.off()
  }

Sensible or not, to make the loop at least somehow work you need:

y<- c(1,5,6,2,5,10) # response 
x1<- c(2,12,8,1,16,17) # predictor 
x2<- c(2,14,5,1,17,17) 
predictorlist<- list("x1","x2") 
for (i in predictorlist){ 
  model <- lm(paste("y ~", i[[1]]), data=df) 
  print(summary(model)) 
} 

The paste function will solve the problem.

A tidyverse addition - with map()

Another way - using map2() from the purrr package:

library(purrr)

xs <- anscombe[,1:3] # Select variables of interest
ys <- anscombe[,5:7]

map2_df(ys, xs,
        function(i,j){
          m <- lm(i ~j + x4 , data = anscombe)
          coef(m)
        })

The output is a dataframe (tibble) of all coefficients:

  `(Intercept)`     j      x4
1          4.33 0.451 -0.0987
2          6.42 0.373 -0.253 
3          2.30 0.526  0.0518

If more variables are changing this can be done using the pmap() functions

Can you help how do I get VIF and confint(confidence interval of co-efficient) in loop. A small correction to above code which is really helpful.

y<- c(1,5,6,2,5,10) # response 
x1<- c(2,12,8,1,16,17) # predictor 
x2<- c(2,14,5,1,17,17) 
df<-cbind(y,x1,x2)
df<-as.data.frame(df)
predictorlist<- list("x1","x2") 
for (i in predictorlist){ 
    model <- lm(paste("y ~", i[[1]]), data=df) 
    print(summary(model)) 
}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM