I am trying to create a large number of aggregate variables with dplyr's summarise() function. I thought about using a for loop, but it doesn't work. Does anyone have an idea?
library(dplyr)
library(rlang)
iris %>%
group_by(Species) %>%
summarise(
total_Petal=sum(Petal.Length),
total_Sepal=sum(Sepal.Length)
)
# Trying the equivalent with a for loop
iris %>%
group_by(Species) %>%
summarise(
for (part in c("Petal","Sepal")) {
!!sym(paste0("total_",part)) := sum(!!sym(paste0(part,".Length")))
}
)
Many thanks in advance!
You shouldn't use a for loop inside summarise(). If you need to apply the same function to multiple columns, the way to go is across(). Look at the example below:
library(dplyr)
iris %>%
group_by(Species) %>%
summarise(across(c("Petal.Length", "Sepal.Length"), sum, .names = "total_{.col}"))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 3 x 3
#> Species total_Petal.Length total_Sepal.Length
#> <fct> <dbl> <dbl>
#> 1 setosa 73.1 250.
#> 2 versicolor 213 297.
#> 3 virginica 278. 329.
Note that the .names argument renames the output columns. It takes a glue specification in which {.col} stands for the original column name; check out the glue package for more info about that.
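As a small sketch of what the glue specification allows: when across() receives a named list of functions, {.fn} is filled with the list name and {.col} with the column name. The labels total and avg below are my own illustrative choice, not something from the question.

```r
library(dplyr)

# Named list of functions; the labels "total" and "avg" are illustrative.
res_fn <- iris %>%
  group_by(Species) %>%
  summarise(
    across(ends_with(".Length"),
           list(total = sum, avg = mean),
           .names = "{.fn}_{.col}"),
    .groups = "drop"  # return an ungrouped tibble without the message
  )

names(res_fn)
```

Each selected column yields one output column per function, so two columns and two functions give four summary columns next to Species.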
Also, as @Konrad pointed out in the comments, strings are allowed in across(), but it is better to write the variables as bare names (without quotes):
c(Petal.Length, Sepal.Length)
or this way:
all_of(c("Petal.Length", "Sepal.Length"))
(these are still strings, but you are telling dplyr, or rather tidyselect, that they should be looked up as column names)
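Since the loop in the question builds the column names with paste0(), here is a minimal sketch of how all_of() could cover that case. The variable names parts and cols are hypothetical helpers introduced for illustration.

```r
library(dplyr)

# Build the column names programmatically, as the loop in the question tried to do.
parts <- c("Petal", "Sepal")
cols <- paste0(parts, ".Length")  # "Petal.Length" "Sepal.Length"

res_all_of <- iris %>%
  group_by(Species) %>%
  summarise(
    across(all_of(cols), sum, .names = "total_{.col}"),
    .groups = "drop"
  )

res_all_of
```

all_of() raises an error if one of the names in cols is not a column of the data, which tends to be safer than silently skipping it.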
Since you seem interested in the columns that end with ".Length", you can also write it this way:
iris %>%
group_by(Species) %>%
summarise(across(ends_with(".Length"), sum, .names = "total_{.col}"))
If you want to remove ".Length" from the end of the names, my suggestion would be to do it in a second step with rename_with():
iris %>%
group_by(Species) %>%
summarise(across(ends_with(".Length"), sum, .names = "total_{.col}")) %>%
rename_with(stringr::str_remove, ends_with(".Length"), pattern = "\\.Length$")
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 3 x 3
#> Species total_Sepal total_Petal
#> <fct> <dbl> <dbl>
#> 1 setosa 250. 73.1
#> 2 versicolor 297. 213
#> 3 virginica 329. 278.
Note that ends_with() matches a literal suffix, not a regular expression, so the columns are selected with ends_with(".Length"). The regular expression goes in the pattern argument: I wrote ".Length" as "\\.Length$" so that the dot is interpreted as a literal dot ("\\.") and the match is anchored at the very end ("$").
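As a possible alternative to the separate renaming step, the glue specification in .names can itself contain an R expression, so the suffix could be stripped while the columns are created. This is only a sketch: it uses base sub() with fixed = TRUE, which removes the first literal occurrence of ".Length" rather than anchoring at the end, and that is enough for the iris column names.

```r
library(dplyr)

res_glue <- iris %>%
  group_by(Species) %>%
  summarise(
    across(
      ends_with(".Length"),
      sum,
      # The expression inside {} is evaluated for each column name (.col).
      .names = "total_{sub('.Length', '', .col, fixed = TRUE)}"
    ),
    .groups = "drop"
  )

names(res_glue)
```

Whether this reads better than rename_with() is a matter of taste; the separate step keeps the naming logic visible, while this version keeps everything in one call.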