How to use a loop in dplyr's summarise()

Question

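I'm trying to create a large number of aggregate variables with dplyr's summarise() function, so I thought about using a for loop, but it doesn't work. Does anyone have an idea?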

library(dplyr)
library(rlang)
iris %>% 
  group_by(Species) %>% 
  summarise(
    total_Petal=sum(Petal.Length),
    total_Sepal=sum(Sepal.Length)
  )
# Trying the equivalent with a for loop
iris %>% 
  group_by(Species) %>% 
  summarise(
    for (part in c("Petal","Sepal")) {
      !!sym(paste0("total_",part)) := sum(!!sym(paste0(part,".Length")))
    }
  )

Thanks in advance!

Answer 1

You should not use a for loop inside summarise(). If you need to apply the same function to several columns, the way to go is across(). Have a look at the following example:

library(dplyr)

iris %>% 
 group_by(Species) %>% 
 summarise(across(c("Petal.Length", "Sepal.Length"), sum, .names = "total_{.col}"))

#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 3 x 3
#>   Species    total_Petal.Length total_Sepal.Length
#>   <fct>                   <dbl>              <dbl>
#> 1 setosa                   73.1               250.
#> 2 versicolor              213                 297.
#> 3 virginica               278.                329.
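If the list of aggregates really has to be generated in a loop, one option is to run the loop outside summarise(): build a named list of quoted expressions with rlang::expr() and splice it into summarise() with the !!! operator. This is a sketch assuming dplyr 1.0 or later and rlang; the variable name agg_exprs is just an illustrative choice.

```r
library(dplyr)
library(rlang)

# Build one quoted expression per part in an ordinary for loop,
# outside of summarise()
agg_exprs <- list()
for (part in c("Petal", "Sepal")) {
  # e.g. total_Petal = sum(Petal.Length)
  agg_exprs[[paste0("total_", part)]] <- expr(sum(!!sym(paste0(part, ".Length"))))
}

# Splice the named list into summarise(): each element becomes one column,
# named after its list name
result <- iris %>%
  group_by(Species) %>%
  summarise(!!!agg_exprs, .groups = "drop")

print(result)
```

For simple "same function, many columns" cases, across() remains the shorter option; splicing mainly helps when each output needs a different expression.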

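Note that with .names you are renaming the output columns: it takes a glue specification in which {.col} stands for the original column name. See the glue package for more information.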
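The same specification can also use {.fn}, which is filled with the list names when .fns is a named list of functions. A sketch computing two aggregates per column; the labels total and avg are arbitrary names chosen for this example:

```r
library(dplyr)

result <- iris %>%
  group_by(Species) %>%
  summarise(
    across(
      c(Petal.Length, Sepal.Length),
      list(total = sum, avg = mean),  # list names fill {.fn}
      .names = "{.fn}_{.col}"
    ),
    .groups = "drop"
  )

print(names(result))
```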

Also, as @Konrad pointed out in the comments, strings are allowed inside across(), but it is better to write the variables as names (without quotes):

c(Petal.Length, Sepal.Length)

Or like this:

all_of(c("Petal.Length", "Sepal.Length"))

(These are still strings, but this way you tell dplyr, or rather tidyselect, that they are strings that need to be converted to column names.)
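all_of() is mostly useful when the column names live in a character vector, for example one assembled with paste0() as in the question's loop. A sketch, where cols is a hypothetical helper variable:

```r
library(dplyr)

# Column names assembled as strings, as in the question
cols <- paste0(c("Petal", "Sepal"), ".Length")

result <- iris %>%
  group_by(Species) %>%
  summarise(across(all_of(cols), sum, .names = "total_{.col}"), .groups = "drop")

print(result)
```

all_of() raises an error if any of the names is missing from the data, which makes typos in generated names easy to spot.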

Since you seem to be interested in the columns ending in ".Length", you can also write:

iris %>% 
 group_by(Species) %>% 
 summarise(across(ends_with(".Length"), sum, .names = "total_{.col}"))
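Here the output names still end in ".Length". Because .names is a glue specification, and glue evaluates R code inside the braces, the suffix could also be stripped while naming. A sketch of that variant, assuming dplyr 1.0 or later (where .col is available to the glue expression):

```r
library(dplyr)

result <- iris %>%
  group_by(Species) %>%
  summarise(
    across(
      ends_with(".Length"),
      sum,
      # glue evaluates sub() on the vector of input column names held in .col
      .names = "total_{sub('.Length', '', .col, fixed = TRUE)}"
    ),
    .groups = "drop"
  )

print(result)
```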

If you want to drop ".Length" from the end of the names, my suggestion is to do it in a second step with rename_with():

iris %>% 
 group_by(Species) %>% 
 summarise(across(ends_with(".Length"), sum, .names = "total_{.col}")) %>% 
 rename_with(stringr::str_remove, ends_with(".Length"), pattern = "\\.Length$")
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 3 x 3
#>   Species    total_Sepal total_Petal
#>   <fct>            <dbl>       <dbl>
#> 1 setosa            250.        73.1
#> 2 versicolor        297.       213  
#> 3 virginica         329.       278.

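I wrote ".Length" as "\\.Length$" so that the dot is interpreted as a literal dot ("\\.") and the pattern only matches at the end of the name ("$"). Note that ends_with() takes a plain suffix, not a regular expression, which is why the regex goes into the pattern argument of str_remove().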
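The difference between the escaped, anchored pattern and the bare one can be checked on a plain character vector. The name total_xLength below is made up purely to show that an unescaped "." matches any character:

```r
library(stringr)

x <- c("total_Petal.Length", "total_Sepal.Length", "total_xLength")

# Escaped dot plus end anchor: only a literal ".Length" suffix is removed
anchored <- str_remove(x, "\\.Length$")

# Bare pattern: "." matches any character, so "xLength" is removed as well
bare <- str_remove(x, ".Length")

print(anchored)  # "total_Petal" "total_Sepal" "total_xLength"
print(bare)      # "total_Petal" "total_Sepal" "total_"
```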

Solution 1: 2 votes, accepted 2021-02-11 14:03:20