How to use a loop in a summarise in dplyr
I am trying to create a large number of aggregate variables with the summarise() function of dplyr. So I thought about using a for loop, but it doesn't work. Does anyone have an idea?
library(dplyr)
library(rlang)
iris %>%
group_by(Species) %>%
summarise(
total_Petal=sum(Petal.Length),
total_Sepal=sum(Sepal.Length)
)
# Trying the equivalent with a for loop
iris %>%
group_by(Species) %>%
summarise(
for (part in c("Petal","Sepal")) {
!!sym(paste0("total_",part)) := sum(!!sym(paste0(part,".Length")))
}
)
Many thanks in advance!
You shouldn't use a for loop in a summarise. If you need to repeat the same function for multiple columns, the way to go is across. Look at the example below:
library(dplyr)
iris %>%
group_by(Species) %>%
summarise(across(c("Petal.Length", "Sepal.Length"), sum, .names = "total_{.col}"))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 3 x 3
#> Species total_Petal.Length total_Sepal.Length
#> <fct> <dbl> <dbl>
#> 1 setosa 73.1 250.
#> 2 versicolor 213 297.
#> 3 virginica 278. 329.
Note that .names is what renames the output variables: "{.col}" is replaced by the name of each input column. Check out the glue package for more info about that syntax.
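To make the .names template more concrete, here is a minimal sketch that applies two functions per column. The function labels total and avg are chosen here for illustration and are not part of the original answer; "{.fn}" is filled in from the names of the function list, and "{.col}" from the column names.

```r
library(dplyr)

# Two summary functions per column; the list names (total, avg) are
# illustrative labels that feed the {.fn} placeholder in .names.
res_names <- iris %>%
  group_by(Species) %>%
  summarise(
    across(
      c(Petal.Length, Sepal.Length),
      list(total = sum, avg = mean),
      .names = "{.fn}_{.col}"
    ),
    .groups = "drop"
  )

names(res_names)
```

This should give one grouping column plus four summary columns, one for each combination of function and input column.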
Also, as @Konrad pointed out in the comments, using strings in across is allowed, but it is better to write the variables as bare names (without quotes):
c(Petal.Length, Sepal.Length)
or this way:
all_of(c("Petal.Length", "Sepal.Length"))
(still as strings, but you are telling dplyr, or rather tidyselect, that those are strings that need to be converted to names)
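all_of is also what lets you build the column names programmatically, which seems to be what the loop in the question was after. A minimal sketch, assuming the names can be derived with paste0 (the variables parts and cols are introduced here for illustration):

```r
library(dplyr)

# Hypothetical helper vectors, mirroring the values used in the question's loop.
parts <- c("Petal", "Sepal")
cols <- paste0(parts, ".Length")  # "Petal.Length" "Sepal.Length"

# all_of() tells tidyselect that `cols` holds column names as strings.
res_all_of <- iris %>%
  group_by(Species) %>%
  summarise(across(all_of(cols), sum, .names = "total_{.col}"), .groups = "drop")

res_all_of
```

No loop is needed inside summarise: the character vector does the iteration, and across applies sum to each selected column.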
Since you seem interested in the columns that end with ".Length", you can also write it this way:
iris %>%
group_by(Species) %>%
summarise(across(ends_with(".Length"), sum, .names = "total_{.col}"))
If you want to remove ".Length" from the end of the new names, my suggestion would be to do it in a second step with rename_with:
iris %>%
group_by(Species) %>%
summarise(across(ends_with(".Length"), sum, .names = "total_{.col}")) %>%
rename_with(stringr::str_remove, ends_with(".Length"), pattern = "\\.Length$")
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 3 x 3
#> Species total_Sepal total_Petal
#> <fct> <dbl> <dbl>
#> 1 setosa 250. 73.1
#> 2 versicolor 297. 213
#> 3 virginica 329. 278.
I wrote the pattern as "\\.Length$" rather than ".Length" to specify that the dot should be interpreted as a literal dot ("\\.") and that the pattern sits at the very end of the name ("$"). Note that ends_with matches literal text, not a regular expression, so the regex belongs in the pattern argument passed to str_remove.
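The effect of the escaping can be checked on plain strings, outside of any data frame. This is a small sketch; the example names are made up to show the difference, and fixed() is an alternative offered by stringr rather than something the answer above uses.

```r
library(stringr)

nms <- c("total_Sepal.Length", "total_Petal.Length")

# Escaped dot plus end anchor: removes only a literal ".Length" suffix.
escaped <- str_remove(nms, "\\.Length$")
# "total_Sepal" "total_Petal"

# An unescaped dot matches any character, so "_Length" is removed here too.
unescaped <- str_remove("total_Length", ".Length")
# "total"

# fixed() requests a literal match, so no escaping is needed.
literal <- str_remove(nms, fixed(".Length"))
# "total_Sepal" "total_Petal"
```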