I want to loop over variables within a data frame either using a for loop or function in R. I have coded the following (which doesn't work):
y <- c(0,0,1,1,0,1,0,1,1,1)
var1 <- c("a","a","a","b","b","b","c","c","c","c")
var2 <- c("m","m","n","n","n","n","o","o","o","m")
mydata <- data.frame(y,var1,var2)
myfunction <- function(v){
regressionresult <- lm(y ~ v, data = mydata)
summary(regressionresult)
}
myfunction("var1")
When I try running this, I get the error message:
Error in model.frame.default(formula = y ~ v, data = mydata, drop.unused.levels = TRUE) : variable lengths differ (found for 'v')
I don't think this is a problem with the data, but with how I refer to the variable name because the following code produces the desired regression results (for one variable that I wanted to loop over):
regressionresult <- lm(y ~ var1, data = mydata)
summary(regressionresult)
How can I fix the function, or put the variable names in the loop?
I also tried to loop over the variable names, but had the same problem as with the function:
for(v in c("var1","var2")){
regressionresult <- lm(y ~ v, data = mydata)
summary(regressionresult)
}
When running this loop, it produces the error:
Error in model.frame.default(formula = y ~ v, data = mydata, drop.unused.levels = TRUE) :
variable lengths differ (found for 'v')
Thanks for your help!
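In y ~ v, lm looks for a variable literally named v. It is not a column of mydata, so R falls back to the function argument v, which is the length-1 string "var1", hence the "variable lengths differ" error. We can use paste0 to build the formula as a string and pass it on to lm: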
myfunction <- function(v){
regressionresult <- lm(paste0('y ~', v), data = mydata)
summary(regressionresult)
}
out1 <- myfunction("var1")
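To cover the "loop" half of the question, the same idea extends to several variables. This is only a minimal sketch, assuming the mydata frame from the question. Here reformulate() from base R's stats package is swapped in for paste0 as one alternative way to build the formula, and fit_one, vars and results are illustrative names that do not come from the original answer.

```r
# Data from the question
y <- c(0,0,1,1,0,1,0,1,1,1)
var1 <- c("a","a","a","b","b","b","c","c","c","c")
var2 <- c("m","m","n","n","n","n","o","o","o","m")
mydata <- data.frame(y, var1, var2)

# fit_one is a hypothetical helper: it builds y ~ <v> and fits the model
fit_one <- function(v) {
  f <- reformulate(v, response = "y")  # e.g. y ~ var1
  summary(lm(f, data = mydata))
}

# Apply the helper to each variable name and keep the results in a named list
vars <- c("var1", "var2")
results <- lapply(vars, fit_one)
names(results) <- vars

results$var1  # prints the summary for y ~ var1
```

Collecting the summaries in a named list keeps each result accessible by variable name after the loop has finished.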
Or use glue::glue
myfunction <- function(v){
regressionresult <- lm(glue::glue('y ~ {v}'), data = mydata)
summary(regressionresult)
}
myfunction("var1")
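The same string-building idea carries over to the for loop from the question. A sketch, assuming the question's mydata; as.formula(paste(...)) is used here as a base-R stand-in for glue. Note that inside a for loop results are not auto-printed, so summary() has to be wrapped in print() to show anything.

```r
# Data from the question
y <- c(0,0,1,1,0,1,0,1,1,1)
var1 <- c("a","a","a","b","b","b","c","c","c","c")
var2 <- c("m","m","n","n","n","n","o","o","o","m")
mydata <- data.frame(y, var1, var2)

for (v in c("var1", "var2")) {
  # Turn the string "y ~ var1" (or "y ~ var2") into a formula object
  f <- as.formula(paste("y ~", v))
  regressionresult <- lm(f, data = mydata)
  # Explicit print: there is no auto-printing inside a loop
  print(summary(regressionresult))
}
```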
You can also use functions from the tidyverse to work with tidy data and fit one model per variable:
y <- c(0,0,1,1,0,1,0,1,1,1)
var1 <- c("a","a","a","b","b","b","c","c","c","c")
var2 <- c("m","m","n","n","n","n","o","o","o","m")
library(tidyverse)
mydata <- tibble(y,var1,var2)
res <- mydata %>%
# get data in long format - tidy format
gather("var_type", "value", -y) %>%
# we want one model per var_type
nest(-var_type) %>%
# apply lm on each data
mutate(
regressionresult = map(data, ~lm(y ~ value, data = .x))
)
res
#> # A tibble: 2 x 3
#> var_type data regressionresult
#> <chr> <list> <list>
#> 1 var1 <tibble [10 x 2]> <S3: lm>
#> 2 var2 <tibble [10 x 2]> <S3: lm>
summary(res$regressionresult[[1]])
#>
#> Call:
#> lm(formula = y ~ value, data = .x)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -0.7500 -0.3333 0.2500 0.3125 0.6667
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.3333 0.3150 1.058 0.325
#> valueb 0.3333 0.4454 0.748 0.479
#> valuec 0.4167 0.4167 1.000 0.351
#>
#> Residual standard error: 0.5455 on 7 degrees of freedom
#> Multiple R-squared: 0.1319, Adjusted R-squared: -0.1161
#> F-statistic: 0.532 on 2 and 7 DF, p-value: 0.6094
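The gather() call and the nest(-var_type) form used above have since been superseded or deprecated in tidyr. A sketch of what the same pipeline might look like with pivot_longer() and the named nest() syntax, assuming tidyr 1.0 or later; printed output may differ cosmetically from the listing above.

```r
library(tidyverse)

mydata <- tibble(
  y    = c(0,0,1,1,0,1,0,1,1,1),
  var1 = c("a","a","a","b","b","b","c","c","c","c"),
  var2 = c("m","m","n","n","n","n","o","o","o","m")
)

res <- mydata %>%
  # long format: one row per (observation, variable) pair
  pivot_longer(cols = -y, names_to = "var_type", values_to = "value") %>%
  # one nested data frame per var_type
  nest(data = -var_type) %>%
  # fit lm on each nested data frame
  mutate(regressionresult = map(data, ~ lm(y ~ value, data = .x)))

res
```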
The broom package can then help you work with the results:
library(broom)
#> Warning: package 'broom' was built under R version 3.4.4
res <- res %>%
mutate(tidy_summary = map(regressionresult, broom::tidy))
res
#> # A tibble: 2 x 4
#> var_type data regressionresult tidy_summary
#> <chr> <list> <list> <list>
#> 1 var1 <tibble [10 x 2]> <S3: lm> <data.frame [3 x 5]>
#> 2 var2 <tibble [10 x 2]> <S3: lm> <data.frame [3 x 5]>
You can extract one of the summaries:
res$tidy_summary[[1]]
#> term estimate std.error statistic p.value
#> 1 (Intercept) 0.3333333 0.3149704 1.0583005 0.3250657
#> 2 valueb 0.3333333 0.4454354 0.7483315 0.4786436
#> 3 valuec 0.4166667 0.4166667 1.0000000 0.3506167
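or unnest the summaries to get a single data frame to work with (selecting the two columns first, because recent versions of tidyr keep the other list columns):
res %>%
select(var_type, tidy_summary) %>%
unnest(tidy_summary)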
#> # A tibble: 6 x 6
#> var_type term estimate std.error statistic p.value
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 var1 (Intercept) 0.333 0.315 1.06 0.325
#> 2 var1 valueb 0.333 0.445 0.748 0.479
#> 3 var1 valuec 0.417 0.417 1.000 0.351
#> 4 var2 (Intercept) 0.333 0.315 1.06 0.325
#> 5 var2 valuen 0.417 0.417 1 0.351
#> 6 var2 valueo 0.333 0.445 0.748 0.479
Functions of interest are nest and unnest from tidyr (http://tidyr.tidyverse.org/), which make it easy to create list columns; map from purrr, which iterates over a list and applies a function (here lm); and tidy from the broom package, which offers functions to tidy results from models (summary results, prediction results, ...).
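As a brief illustration of those broom helpers, here is a sketch on a single model, assuming the question's data and an installed broom package; fit is an illustrative name. Besides tidy(), broom also provides glance() for model-level statistics and augment() for per-observation results.

```r
library(broom)

# Data from the question
y <- c(0,0,1,1,0,1,0,1,1,1)
var1 <- c("a","a","a","b","b","b","c","c","c","c")
var2 <- c("m","m","n","n","n","n","o","o","o","m")
mydata <- data.frame(y, var1, var2)

fit <- lm(y ~ var1, data = mydata)

tidy(fit)     # one row per coefficient
glance(fit)   # a single row of model-level statistics (r.squared, sigma, ...)
augment(fit)  # one row per observation, with .fitted and .resid columns
```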
Not used here, but the modelr package also helps with building pipelines when modeling.