简体   繁体   中英

R- iterating over variables names using a loop or function

I want to loop over variables within a data frame either using a for loop or function in R. I have coded the following (which doesn't work):

y <- c(0,0,1,1,0,1,0,1,1,1)
var1 <- c("a","a","a","b","b","b","c","c","c","c")
var2 <- c("m","m","n","n","n","n","o","o","o","m")

mydata <- data.frame(y,var1,var2)

myfunction <- function(v){
  regressionresult <- lm(y ~ v, data = mydata)
  summary(regressionresult) 
}
myfunction("var1")

When I try running this, I get the error message:

Error in model.frame.default(formula = y ~ v, data = mydata, drop.unused.levels = TRUE) : variable lengths differ (found for 'v')

I don't think this is a problem with the data, but with how I refer to the variable name because the following code produces the desired regression results (for one variable that I wanted to loop over):

regressionresult <- lm(y ~ var1, data = mydata) summary(regressionresult)

How can I fix the function, or put the variables names in the loop?

[I also tried to loop over the variables names, but had a similar problem as with the function:

for(v in c("var1","var2")){
  regressionresult <- lm(y ~ v, data = mydata)
  summary(regressionresult)  
}

When running this loop, it produces the error:

Error in model.frame.default(formula = y ~ v, data = mydata, drop.unused.levels = TRUE) : 
  variable lengths differ (found for 'v')

Thanks for your help!

We can use paste to create the formula to pass it on the lm

myfunction <- function(v){
  regressionresult <- lm(paste0('y ~', v), data = mydata)
  summary(regressionresult) 
}
out1 <- myfunction("var1")

Or use glue::glue

myfunction <- function(v){
  regressionresult <- lm(glue::glue('y ~ {v}'), data = mydata)
  summary(regressionresult) 
 }
myfunction("var1")

You can use functions in the tidyverse to work with tidy data and applying model to different formulas.

y <- c(0,0,1,1,0,1,0,1,1,1)
var1 <- c("a","a","a","b","b","b","c","c","c","c")
var2 <- c("m","m","n","n","n","n","o","o","o","m")

library(tidyverse)
mydata <- data_frame(y,var1,var2)

res <- mydata %>%
  # get data in long format - tidy format
  gather("var_type", "value", -y) %>%
  # we want one model per var_type
  nest(-var_type) %>%
  # apply lm on each data
  mutate(
    regressionresult = map(data, ~lm(y ~ value, data = .x))
  )
res
#> # A tibble: 2 x 3
#>   var_type data              regressionresult
#>   <chr>    <list>            <list>          
#> 1 var1     <tibble [10 x 2]> <S3: lm>        
#> 2 var2     <tibble [10 x 2]> <S3: lm>
summary(res$regressionresult[[1]])
#> 
#> Call:
#> lm(formula = y ~ value, data = .x)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -0.7500 -0.3333  0.2500  0.3125  0.6667 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)   0.3333     0.3150   1.058    0.325
#> valueb        0.3333     0.4454   0.748    0.479
#> valuec        0.4167     0.4167   1.000    0.351
#> 
#> Residual standard error: 0.5455 on 7 degrees of freedom
#> Multiple R-squared:  0.1319, Adjusted R-squared:  -0.1161 
#> F-statistic: 0.532 on 2 and 7 DF,  p-value: 0.6094

Broom package can help you work with the result then

library(broom)
#> Warning: le package 'broom' a été compilé avec la version R 3.4.4
res <- res %>%
  mutate(tidy_summary = map(regressionresult, broom::tidy))
res         
#> # A tibble: 2 x 4
#>   var_type data              regressionresult tidy_summary        
#>   <chr>    <list>            <list>           <list>              
#> 1 var1     <tibble [10 x 2]> <S3: lm>         <data.frame [3 x 5]>
#> 2 var2     <tibble [10 x 2]> <S3: lm>         <data.frame [3 x 5]>

You can get one of the summary

res$tidy_summary[[1]]
#>          term  estimate std.error statistic   p.value
#> 1 (Intercept) 0.3333333 0.3149704 1.0583005 0.3250657
#> 2      valueb 0.3333333 0.4454354 0.7483315 0.4786436
#> 3      valuec 0.4166667 0.4166667 1.0000000 0.3506167

or unnest to get a data.frame to work with.

res %>% 
  unnest(tidy_summary)
#> # A tibble: 6 x 6
#>   var_type term        estimate std.error statistic p.value
#>   <chr>    <chr>          <dbl>     <dbl>     <dbl>   <dbl>
#> 1 var1     (Intercept)    0.333     0.315     1.06    0.325
#> 2 var1     valueb         0.333     0.445     0.748   0.479
#> 3 var1     valuec         0.417     0.417     1.000   0.351
#> 4 var2     (Intercept)    0.333     0.315     1.06    0.325
#> 5 var2     valuen         0.417     0.417     1       0.351
#> 6 var2     valueo         0.333     0.445     0.748   0.479

Functions of interest are nest and unnest from [ tidyr ][ http://tidyr.tidyverse.org/ ) that allow to create list columns easily, map from purrr that allows to iterate over a list and apply a function (here lm ) and tidy from broom package that offers functions to tidy results from models (summary results, predict results, ...)

Not used here but know that modelr package helps for doing pipelines when modeling.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM