
summarise and then summarise_at in one dplyr chain?

I have a data frame of clustered data that I'm aggregating by cluster to produce summary statistics.

I would like to create a new column holding the cluster count, n(), and also compute the mean and sum over a list of variables:

# works fine
nums <- c("mpg", "disp", "cyl")
mtcars %>% group_by(carb) %>% summarise(cnt = n())

The output looks like this:

# A tibble: 6 x 2
   carb   cnt
  <dbl> <int>
1     1     7
2     2    10
3     3     3
4     4    10
5     6     1
6     8     1

# does not work
nums <- c("mpg", "disp", "cyl")
mtcars %>%
  group_by(carb) %>%
  summarise(cnt = n()) %>%
  summarise_at(.vars = nums, funs(mean, sum))

It returns this error message:

> Error in summarise_impl(.data, dots) :    Evaluation error: object
> 'disp' not found. In addition: Warning message: In mean.default(mpg) :
> argument is not numeric or logical: returning NA

The goal is the summarised tbl from summarise_at, plus a new column cnt holding the count of observations in each group.

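The first summarise() keeps only 'carb' and 'cnt', so the columns listed in nums no longer exist when summarise_at() runs. Instead, we can use mutate() to create 'cnt' by 'carb', then add 'cnt' as a grouping variable before calling summarise_at():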

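mtcars %>% 
   group_by(carb) %>% 
   # mutate() keeps every row and column, so mpg, disp and cyl stay available
   mutate(cnt = n()) %>%
   # cnt is constant within each carb group, so grouping by it only carries it through
   # (in dplyr >= 1.0.0 this argument is spelled .add = TRUE)
   group_by(cnt, add = TRUE) %>% 
   # funs() is deprecated since dplyr 0.8.0; list(mean = mean, sum = sum) replaces it
   summarise_at(.vars = nums, funs(mean, sum))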
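
On newer versions of dplyr, the count and the per-variable summaries can probably be computed in a single summarise() call, which avoids the mutate-and-regroup step. The sketch below is not part of the original answer. It assumes dplyr 1.0.0 or later, where across() and all_of() are available, and `result` is just an illustrative name. The column names should match the summarise_at version (mpg_mean, mpg_sum, and so on), though the column order may differ.

```r
library(dplyr)

nums <- c("mpg", "disp", "cyl")

# Assumption: dplyr >= 1.0.0 (across() does not exist in earlier versions).
result <- mtcars %>%
  group_by(carb) %>%
  summarise(
    cnt = n(),  # number of rows in each carb group
    # apply mean and sum to every column named in nums;
    # the named list yields columns such as mpg_mean and mpg_sum
    across(all_of(nums), list(mean = mean, sum = sum)),
    .groups = "drop"
  )

print(result)
```

Because cnt is computed inside the same summarise() as the other statistics, the original columns are still present when across() is evaluated, which is exactly what the failing chain was missing.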
