R: iterating over variable names using a loop or function
I want to loop over the variables in a data frame using a for loop or a function in R. I wrote the following code, which does not work:
y <- c(0,0,1,1,0,1,0,1,1,1)
var1 <- c("a","a","a","b","b","b","c","c","c","c")
var2 <- c("m","m","n","n","n","n","o","o","o","m")
mydata <- data.frame(y,var1,var2)
myfunction <- function(v){
  regressionresult <- lm(y ~ v, data = mydata)
  summary(regressionresult)
}
myfunction("var1")
When I try to run this, I get the error message:
Error in model.frame.default(formula = y ~ v, data = mydata, drop.unused.levels = TRUE) : variable lengths differ (found for 'v')
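I don't think this is a problem with the data but with how the variable name is referenced, because the following code produces the desired regression result (for one of the variables I want to loop over):
regressionresult <- lm(y ~ var1, data = mydata)
summary(regressionresult)
How can I fix the function, or put the variable names into a loop?
I also tried looping over the variable names, but ran into a similar problem as with the function: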
for(v in c("var1","var2")){
  regressionresult <- lm(y ~ v, data = mydata)
  summary(regressionresult)
}
Running this loop produces the error:
Error in model.frame.default(formula = y ~ v, data = mydata, drop.unused.levels = TRUE) :
variable lengths differ (found for 'v')
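Thank you for your help!
Inside lm(y ~ v, ...), v is not replaced by the string it holds; the formula refers to the object v itself, a character vector of length 1, which is why the variable lengths differ. We can use paste0 to build the formula as a string and pass it to lm: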
myfunction <- function(v){
  regressionresult <- lm(paste0('y ~', v), data = mydata)
  summary(regressionresult)
}
out1 <- myfunction("var1")
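As a hedged alternative in base R, reformulate() builds a formula object from character strings, and lapply() then fits one model per variable name. This is only a sketch using the data from the question; the names fit_one, vars, fits and summaries are made up for illustration.

```r
# Example data from the question
y <- c(0, 0, 1, 1, 0, 1, 0, 1, 1, 1)
var1 <- c("a", "a", "a", "b", "b", "b", "c", "c", "c", "c")
var2 <- c("m", "m", "n", "n", "n", "n", "o", "o", "o", "m")
mydata <- data.frame(y, var1, var2)

# fit_one is a hypothetical helper: it fits y ~ <v> for one column name
fit_one <- function(v, data) {
  f <- reformulate(v, response = "y")  # builds the formula y ~ <v>
  lm(f, data = data)
}

# Name the input vector so the resulting list is named by variable
vars <- c("var1", "var2")
fits <- lapply(setNames(vars, vars), fit_one, data = mydata)

# One summary per variable, accessible by name
summaries <- lapply(fits, summary)
summaries$var1
```

Because the list is named, individual models can be retrieved with fits$var1 or fits[["var2"]] rather than by position.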
Or use glue::glue:
myfunction <- function(v){
  regressionresult <- lm(glue::glue('y ~ {v}'), data = mydata)
  summary(regressionresult)
}
myfunction("var1")
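The same idea carries over to the for loop from the question. One thing to watch: results are not auto-printed inside a loop, so summary() needs an explicit print(), or the result has to be stored. A minimal sketch, assuming the question's data; the list name results is illustrative.

```r
# Example data from the question
y <- c(0, 0, 1, 1, 0, 1, 0, 1, 1, 1)
var1 <- c("a", "a", "a", "b", "b", "b", "c", "c", "c", "c")
var2 <- c("m", "m", "n", "n", "n", "n", "o", "o", "o", "m")
mydata <- data.frame(y, var1, var2)

results <- list()
for (v in c("var1", "var2")) {
  # Build the formula from the string held in v, then fit the model
  fit <- lm(as.formula(paste("y ~", v)), data = mydata)
  results[[v]] <- summary(fit)  # store the summary under the variable's name
  print(results[[v]])           # explicit print is required inside a loop
}
```

After the loop, results$var1 and results$var2 hold the two summaries for later use.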
You can also use tidyverse functions to work with tidy data and fit one model per variable.
y <- c(0,0,1,1,0,1,0,1,1,1)
var1 <- c("a","a","a","b","b","b","c","c","c","c")
var2 <- c("m","m","n","n","n","n","o","o","o","m")
library(tidyverse)
mydata <- tibble(y, var1, var2)  # tibble() replaces the deprecated data_frame()
res <- mydata %>%
  # get data in long format - tidy format
  gather("var_type", "value", -y) %>%
  # we want one model per var_type
  nest(data = -var_type) %>%  # tidyr >= 1.0 syntax; older versions used nest(-var_type)
  # apply lm on each data
  mutate(
    regressionresult = map(data, ~lm(y ~ value, data = .x))
  )
res
#> # A tibble: 2 x 3
#>   var_type data              regressionresult
#>   <chr>    <list>            <list>
#> 1 var1     <tibble [10 x 2]> <S3: lm>
#> 2 var2     <tibble [10 x 2]> <S3: lm>
summary(res$regressionresult[[1]])
#>
#> Call:
#> lm(formula = y ~ value, data = .x)
#>
#> Residuals:
#>     Min      1Q  Median      3Q     Max
#> -0.7500 -0.3333  0.2500  0.3125  0.6667
#>
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)   0.3333     0.3150   1.058    0.325
#> valueb        0.3333     0.4454   0.748    0.479
#> valuec        0.4167     0.4167   1.000    0.351
#>
#> Residual standard error: 0.5455 on 7 degrees of freedom
#> Multiple R-squared: 0.1319, Adjusted R-squared: -0.1161
#> F-statistic: 0.532 on 2 and 7 DF, p-value: 0.6094
The broom package can help you work with the results:
library(broom)
#> Warning: package 'broom' was built under R version 3.4.4
res <- res %>%
  mutate(tidy_summary = map(regressionresult, broom::tidy))
res
#> # A tibble: 2 x 4
#>   var_type data              regressionresult tidy_summary
#>   <chr>    <list>            <list>           <list>
#> 1 var1     <tibble [10 x 2]> <S3: lm>         <data.frame [3 x 5]>
#> 2 var2     <tibble [10 x 2]> <S3: lm>         <data.frame [3 x 5]>
You can extract one of the summaries:
res$tidy_summary[[1]]
#>          term  estimate std.error statistic   p.value
#> 1 (Intercept) 0.3333333 0.3149704 1.0583005 0.3250657
#> 2      valueb 0.3333333 0.4454354 0.7483315 0.4786436
#> 3      valuec 0.4166667 0.4166667 1.0000000 0.3506167
Or unnest to get a data.frame you can work with:
res %>%
  unnest(tidy_summary)
#> # A tibble: 6 x 6
#>   var_type term        estimate std.error statistic p.value
#>   <chr>    <chr>          <dbl>     <dbl>     <dbl>   <dbl>
#> 1 var1     (Intercept)    0.333     0.315     1.06    0.325
#> 2 var1     valueb         0.333     0.445     0.748   0.479
#> 3 var1     valuec         0.417     0.417     1.000   0.351
#> 4 var2     (Intercept)    0.333     0.315     1.06    0.325
#> 5 var2     valuen         0.417     0.417     1       0.351
#> 6 var2     valueo         0.333     0.445     0.748   0.479
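The functions of interest are nest and unnest from tidyr (http://tidyr.tidyverse.org/), which make it easy to create list columns; map from purrr, which iterates over a list and applies a function (here lm); and tidy from the broom package, which turns model results (summaries, predictions, and so on) into tidy data frames.
It is not used here, but it is worth knowing that the modelr package helps with pipelines when modeling.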
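For readers on a recent tidyverse, the functions named above can be combined in one pipeline. This is only a sketch, assuming tidyr 1.0 or later, where pivot_longer() supersedes gather() and unnest() keeps the other list columns unless they are dropped first. The column names fit and tidied are illustrative choices, not taken from the answer.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Example data from the question
mydata <- tibble(
  y    = c(0, 0, 1, 1, 0, 1, 0, 1, 1, 1),
  var1 = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "c"),
  var2 = c("m", "m", "n", "n", "n", "n", "o", "o", "o", "m")
)

res <- mydata %>%
  # long format: one row per (observation, variable) pair
  pivot_longer(-y, names_to = "var_type", values_to = "value") %>%
  # one nested data set per variable
  nest(data = -var_type) %>%
  mutate(
    fit    = map(data, ~ lm(y ~ value, data = .x)),  # one model per variable
    tidied = map(fit, tidy)                          # coefficient table per model
  ) %>%
  # drop the remaining list columns before unnesting
  select(var_type, tidied) %>%
  unnest(tidied)

res
```

The result has one row per coefficient and variable, with the columns var_type, term, estimate, std.error, statistic and p.value.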