How to use a loop in a summarise in dplyr

I am trying to create a large number of aggregate variables with dplyr's summarise() function. I thought about using a for loop, but it doesn't work. Does anyone have an idea?

library(dplyr)
library(rlang)
iris %>% 
  group_by(Species) %>% 
  summarise(
    total_Petal=sum(Petal.Length),
    total_Sepal=sum(Sepal.Length)
  )
# Trying the equivalent with a for loop (this does not work)
iris %>% 
  group_by(Species) %>% 
  summarise(
    for (part in c("Petal","Sepal")) {
      !!sym(paste0("total_",part)) := sum(!!sym(paste0(part,".Length")))
    }
  )

Many thanks in advance!

You shouldn't use a for loop inside summarise: it expects each argument to evaluate to summary values, whereas a for loop itself evaluates to NULL. If you need to apply the same function to multiple columns, the way to go is across. See the example below:

library(dplyr)

iris %>% 
 group_by(Species) %>% 
 summarise(across(c("Petal.Length", "Sepal.Length"), sum, .names = "total_{.col}"))

#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 3 x 3
#>   Species    total_Petal.Length total_Sepal.Length
#>   <fct>                   <dbl>              <dbl>
#> 1 setosa                   73.1               250.
#> 2 versicolor              213                 297.
#> 3 virginica               278.                329.

Note that .names controls how the output columns are named. It takes a glue specification in which {.col} stands for the original column name. Check out the glue package for more info about that syntax.
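
To illustrate the .names specification a bit further, here is a minimal sketch that applies two functions per column. The list names total and avg and the result name totals_and_means are my own choices for illustration, not part of the original answer; {.fn} is filled in from the names of the function list.

```r
library(dplyr)

# Two functions per column: the list names feed {.fn},
# the selected column names feed {.col}
totals_and_means <- iris %>%
  group_by(Species) %>%
  summarise(
    across(
      c(Petal.Length, Sepal.Length),
      list(total = sum, avg = mean),
      .names = "{.fn}_{.col}"
    )
  )

names(totals_and_means)
```

This should give one column per function/column pair, next to the grouping column.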

Also, as @Konrad pointed out in the comments, strings are allowed in across, but it is better to write the variables as bare names (without quotes):

c(Petal.Length, Sepal.Length)

or this way:

all_of(c("Petal.Length", "Sepal.Length"))

(still strings, but all_of tells dplyr, or rather tidyselect, that they should be treated as column names)
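
all_of is mostly useful when the column names already live in a character vector, which is close to what the loop in the question was trying to do. A minimal sketch, assuming a variable called length_cols (a name I made up for this example):

```r
library(dplyr)

# Hypothetical variable holding the column names as strings
length_cols <- c("Petal.Length", "Sepal.Length")

totals <- iris %>%
  group_by(Species) %>%
  # all_of() selects exactly the columns named in the character vector
  summarise(across(all_of(length_cols), sum, .names = "total_{.col}"))

totals
```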


Since you seem to be interested in the columns that end with ".Length", you can also write it this way:

iris %>% 
 group_by(Species) %>% 
 summarise(across(ends_with(".Length"), sum, .names = "total_{.col}"))

If you want to remove ".Length" from the end of the new names, my suggestion would be to do it in a second step with rename_with:

iris %>% 
 group_by(Species) %>% 
 summarise(across(ends_with(".Length"), sum, .names = "total_{.col}")) %>% 
 rename_with(stringr::str_remove, ends_with(".Length"), pattern = "\\.Length$")
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 3 x 3
#>   Species    total_Sepal total_Petal
#>   <fct>            <dbl>       <dbl>
#> 1 setosa            250.        73.1
#> 2 versicolor        297.       213  
#> 3 virginica         329.       278. 

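Note that ends_with matches a literal suffix, while the pattern passed to str_remove is a regular expression. I wrote the pattern as "\\.Length$" so that the dot is interpreted as a literal dot ("\\.") and the match is anchored at the very end of the name ("$").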
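
To see what the escaping and the anchor change, here is a small sketch on made-up names (none of these strings come from iris; they are only there to show the difference):

```r
library(stringr)

# Unescaped dot: "." matches any character, so "XLength" is removed too
loose <- str_remove("total_PetalXLength", ".Length")

# Escaped dot: only a literal ".Length" at the very end matches,
# so this made-up name is left untouched
strict <- str_remove("total_PetalXLength", "\\.Length$")

# The intended case: a real ".Length" suffix is stripped
suffix <- str_remove("total_Petal.Length", "\\.Length$")

# Anchoring with "$": ".Length" in the middle of a name is kept
middle <- str_remove("a.Length.b", "\\.Length$")

c(loose = loose, strict = strict, suffix = suffix, middle = middle)
```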
