Better way to summarize multiple groups in same dataframe

I'm not sure how to phrase this better for the title, which is probably why I haven't been able to find the answer by searching.

I have a dataframe that looks like this:

example_df <- data.frame(
  ID = c('A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'),
  location = c('park 1', 'park 1', 'park 2', 'park 3', 'park 1', 'park 4', 'park 1', 'park 5'),
  sample_2000 = c(1, 5, 0, 2, 3, 1, 0, 8), 
  sample_2001 = c(2, 1, 1, 3, 5, 6, 4, 2), 
  sample_2003 = c(1, 2, 5, 8, 11, 1, 0, 7)
  )

  ID location sample_2000 sample_2001 sample_2003
1  A   park 1           1           2           1
2  A   park 1           5           1           2
3  A   park 2           0           1           5
4  B   park 3           2           3           8
5  B   park 1           3           5          11
6  C   park 4           1           6           1
7  C   park 1           0           4           0
8  C   park 5           8           2           7

I want to sum all the values for each year by location and end up with the results in the same dataframe. I'm currently calling group_by() and summarize() on each year individually and then merging everything back together:

library(dplyr)

summarize1 <- group_by(example_df, location) %>% dplyr::summarize(sample_2000 = sum(sample_2000))
summarize2 <- group_by(example_df, location) %>% dplyr::summarize(sample_2001 = sum(sample_2001))
summarize3 <- group_by(example_df, location) %>% dplyr::summarize(sample_2003 = sum(sample_2003))

all_summarized <- Reduce(function(x, y) merge(x, y, all=TRUE), list(summarize1, summarize2, summarize3))

Desired output (which I receive from the above) looks like this:


  location sample_2000 sample_2001 sample_2003
1   park 1           9          12          14
2   park 2           0           1           5
3   park 3           2           3           8
4   park 4           1           6           1
5   park 5           8           2           7

Surely there's a better method. My attempt at a for-loop returns the following:

'Error in sum(paste0("sample_", i)): invalid 'type' (character) of argument'


year_list <- c(2000, 2001, 2003)

for (i in year_list) {

  test <- group_by(example_df, location) %>% dplyr::summarize(paste0("sample_", i)) = sum(paste0("sample_", i))

}

Thank you!

If we want to use an approach similar to Reduce/merge, we can make use of map/reduce from purrr:

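library(dplyr)
library(purrr)
# summarise one sample column at a time, then join the pieces on location
map(names(example_df)[3:5], ~
    example_df %>%
        # all_of() makes it explicit that .x holds a column name
        select(location, all_of(.x)) %>%
        group_by(location) %>%
        summarise_at(vars(starts_with('sample')), sum)) %>%
    reduce(full_join, by = 'location')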

Or, with summarise()/across() (available from dplyr 1.0.0), we can get the same output in a single step (though I'm not sure whether the example is meant for a general case or for sum only):

example_df %>%
      group_by(location) %>% 
      summarise(across(starts_with('sample'), sum))
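
If the years of interest are held in a vector, as with year_list in the question, the same across() call can be pointed at exactly those columns. This is a minimal sketch, assuming dplyr 1.0.0 or later; year_cols and all_summarized are just illustrative names:

```r
library(dplyr)

example_df <- data.frame(
  ID = c('A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'),
  location = c('park 1', 'park 1', 'park 2', 'park 3', 'park 1', 'park 4', 'park 1', 'park 5'),
  sample_2000 = c(1, 5, 0, 2, 3, 1, 0, 8),
  sample_2001 = c(2, 1, 1, 3, 5, 6, 4, 2),
  sample_2003 = c(1, 2, 5, 8, 11, 1, 0, 7)
)

# build the column names from the years, as the loop in the question tried to do
year_list <- c(2000, 2001, 2003)
year_cols <- paste0("sample_", year_list)

# all_of() selects the columns named in the character vector
all_summarized <- example_df %>%
  group_by(location) %>%
  summarise(across(all_of(year_cols), sum))

all_summarized
```

This should give one row per location with the three summed columns, the same shape as the desired output in the question.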

Or with summarise_at() from earlier dplyr releases (superseded by across() as of dplyr 1.0.0, though it still works):

example_df %>%
    group_by(location) %>%
    summarise_at(vars(starts_with('sample')), sum)
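
As for why the for-loop in the question fails: sum(paste0("sample_", i)) hands sum() the character string "sample_2000" rather than the column of that name, which is what the "invalid 'type' (character)" error is reporting. The left-hand side of = inside summarize() also cannot be a function call, and test is overwritten on every pass of the loop. Below is a sketch of one way to make the loop work, assuming a dplyr version that supports the .data pronoun and the := operator (col, results and all_summarized_loop are illustrative names, not from the question):

```r
library(dplyr)

example_df <- data.frame(
  ID = c('A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'),
  location = c('park 1', 'park 1', 'park 2', 'park 3', 'park 1', 'park 4', 'park 1', 'park 5'),
  sample_2000 = c(1, 5, 0, 2, 3, 1, 0, 8),
  sample_2001 = c(2, 1, 1, 3, 5, 6, 4, 2),
  sample_2003 = c(1, 2, 5, 8, 11, 1, 0, 7)
)

year_list <- c(2000, 2001, 2003)

# keep every iteration's result instead of overwriting a single object
results <- list()

for (i in year_list) {
  col <- paste0("sample_", i)
  results[[col]] <- example_df %>%
    group_by(location) %>%
    # .data[[col]] looks the column up by its name;
    # !!col := uses the string as the name of the output column
    summarise(!!col := sum(.data[[col]]))
}

# merge the per-year summaries back together on location
all_summarized_loop <- Reduce(
  function(x, y) merge(x, y, by = "location", all = TRUE),
  results
)

all_summarized_loop
```

The across() version does the same work without the loop or the merge, so the loop is mainly useful for seeing where the original attempt went wrong.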
