group_by summarise by name prefix

My actual dataset is a bit more complex than the dummy data below. When summarising, I want R to sum every variable whose name begins with the prefix "cat_". Right now I'm listing each column individually. Any suggestions?

dput(df)
structure(list(ID = c("A", "B", "C", "D", "A", "B", "C", "D", 
"A", "B", "C", "D"), year = c(1900, 1900, 1900, 1900, 1901, 1901, 
1901, 1901, 1902, 1902, 1902, 1902), val = c(2635L, 8573L, 5942L, 
7390L, 8762L, 7871L, 7848L, 1928L, 6772L, 6487L, 6005L, 5341L
), cat_TS = c(1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), 
    cat_1 = c(0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L), 
    cat_2 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L)), row.names = c(NA, 
-12L), class = c("tbl_df", "tbl", "data.frame"))

df <- df %>% group_by(ID) %>% 
  summarise(cat_TS = sum(cat_TS), cat_1 = sum(cat_1), cat_2 = sum(cat_2))

Use dplyr::starts_with() inside dplyr::across() to select every column whose name starts with "cat", then apply sum to all of those columns in summarise().

library(dplyr)

df %>% group_by(ID) %>%
  summarise(
    across(starts_with("cat"), sum)
  )

# # A tibble: 4 × 4
#   ID    cat_TS cat_1 cat_2
#   <chr>  <int> <int> <int>
# 1 A          1     1     1
# 2 B          0     1     0
# 3 C          0     0     0
# 4 D          1     0     0
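
If the real data contains missing values, or other columns that merely start with "cat" (for example a hypothetical "category" column), a slightly stricter variant may be safer. The sketch below is an assumption on my part rather than part of the original answer: it rebuilds the dummy data with tibble(), matches the full "cat_" prefix, and passes na.rm = TRUE through a lambda.

```r
library(dplyr)

# Rebuild the dummy data from the question
df <- tibble(
  ID     = rep(c("A", "B", "C", "D"), times = 3),
  year   = rep(c(1900, 1901, 1902), each = 4),
  val    = c(2635L, 8573L, 5942L, 7390L, 8762L, 7871L,
             7848L, 1928L, 6772L, 6487L, 6005L, 5341L),
  cat_TS = c(1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  cat_1  = c(0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L),
  cat_2  = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L)
)

# Sum every column whose name starts with "cat_", ignoring NAs
result <- df %>%
  group_by(ID) %>%
  summarise(across(starts_with("cat_"), ~ sum(.x, na.rm = TRUE)))

print(result)
```

On the dummy data this should give the same table as above, since there are no missing values and every "cat" column also carries the underscore.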
