How to use a loop in a summarise in dplyr

I am trying to create a large number of aggregate variables with dplyr's summarise() function, so I thought about using a for loop, but it doesn't work. Does anyone have an idea?

library(dplyr)
library(rlang)
iris %>% 
  group_by(Species) %>% 
  summarise(
    total_Petal=sum(Petal.Length),
    total_Sepal=sum(Sepal.Length)
  )

# Trying the equivalent with a for loop
iris %>% 
  group_by(Species) %>% 
  summarise(
    for (part in c("Petal","Sepal")) {
      !!sym(paste0("total_",part)) := sum(!!sym(paste0(part,".Length")))
    }
  )

Many thanks in advance!

You shouldn't use a for loop inside summarise. To apply the same function to several columns, use across. See the example below:

library(dplyr)

iris %>% 
 group_by(Species) %>% 
 summarise(across(c("Petal.Length", "Sepal.Length"), sum, .names = "total_{.col}"))

#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 3 x 3
#>   Species    total_Petal.Length total_Sepal.Length
#>   <fct>                   <dbl>              <dbl>
#> 1 setosa                   73.1               250.
#> 2 versicolor              213                 297.
#> 3 virginica               278.                329.

Note that the .names argument renames the output columns: it takes a glue specification in which {.col} stands for the original column name. See the glue package for more about that syntax.
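
To illustrate the .names specification a little further, here is a minimal sketch, assuming you want more than one summary per column. Passing a named list of functions lets {.fn} stand for the function's name alongside {.col}. The list names total and mean and the object by_species are chosen here for illustration only.

```r
library(dplyr)

# A named list of functions: each name becomes available as {.fn}
by_species <- iris %>%
  group_by(Species) %>%
  summarise(
    across(
      c(Petal.Length, Sepal.Length),
      list(total = sum, mean = mean),
      .names = "{.fn}_{.col}"
    )
  )

# One output column per (column, function) pair, plus the grouping column
print(names(by_species))
```

Each selected column yields one output column per function, so two columns and two functions give four summary columns.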

Also, as @Konrad pointed out in the comments, strings are allowed in across, but it is better to write the variables as bare names (without quotes):

c(Petal.Length, Sepal.Length)

or this way:

all_of(c("Petal.Length", "Sepal.Length"))

(these are still strings, but all_of() tells dplyr, or rather tidyselect, that they are column names to be looked up in the data)
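
Because all_of() accepts a character vector, it also covers the case from the question where the column names are built programmatically. The following is a sketch under that assumption; the variable names parts, cols and result are made up for this example.

```r
library(dplyr)

# Build the column names from their prefixes, as the loop in the question tried to do
parts <- c("Petal", "Sepal")
cols <- paste0(parts, ".Length")

# all_of() selects exactly the columns named in `cols`
result <- iris %>%
  group_by(Species) %>%
  summarise(across(all_of(cols), sum, .names = "total_{.col}"))

print(names(result))
```

all_of() raises an error if any name in cols is missing from the data, which makes typos in generated names easy to spot.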


Since you seem interested in the columns that end with ".Length", you can also write it this way:

iris %>% 
 group_by(Species) %>% 
 summarise(across(ends_with(".Length"), sum, .names = "total_{.col}"))

If you want to remove ".Length" from the end of the names, my suggestion would be to do it in a second step with rename_with:

iris %>% 
 group_by(Species) %>% 
 summarise(across(ends_with(".Length"), sum, .names = "total_{.col}")) %>% 
 # ends_with() matches a literal suffix; the regular expression goes in `pattern`
 rename_with(stringr::str_remove, ends_with(".Length"), pattern = "\\.Length$")
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 3 x 3
#>   Species    total_Sepal total_Petal
#>   <fct>            <dbl>       <dbl>
#> 1 setosa            250.        73.1
#> 2 versicolor        297.       213  
#> 3 virginica         329.       278. 

I wrote the pattern as "\\.Length$" so that the dot is interpreted as a literal dot ("\\.", with the backslash doubled inside an R string) and so that the pattern only matches at the very end ("$").
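
The effect of the escaped dot and the "$" anchor can be checked on a few strings. This is a small sketch that uses base R's sub() rather than stringr, a substitution made here only so the example has no dependencies. The names "total_PetalXLength" and "Petal.Length.cm" are invented test inputs, not columns of iris.

```r
# Unescaped dot: "." matches any character, so "XLength" is removed too
loose <- sub(".Length$", "", c("total_Petal.Length", "total_PetalXLength"))

# Escaped dot: only a literal ".Length" at the very end is removed
strict <- sub("\\.Length$", "", c("total_Petal.Length", "total_PetalXLength"))

# Anchoring: "$" leaves a ".Length" in the middle of a name untouched
anchored <- sub("\\.Length$", "", "Petal.Length.cm")

print(loose)
print(strict)
print(anchored)
```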
