summarise and then summarise_at in one dplyr chain?

Question

I have a data frame of clustered data that I'm aggregating by cluster to produce summary statistics.

I would like to add a column holding the cluster count, n(), and then compute the mean and sum over a list of variables:

# works fine
nums <- c("mpg", "disp", "cyl")
mtcars %>% group_by(carb) %>% summarise(cnt = n())

Looks like this:

# A tibble: 6 x 2
   carb   cnt
  <dbl> <int>
1     1     7
2     2    10
3     3     3
4     4    10
5     6     1
6     8     1

# does not work:
nums <- c("mpg", "disp", "cyl")
mtcars %>% group_by(carb) %>% summarise(cnt = n()) %>% summarise_at(.vars = nums,
                                                                    funs(mean, sum))

# returns error message:

> Error in summarise_impl(.data, dots) :    Evaluation error: object
> 'disp' not found. In addition: Warning message: In mean.default(mpg) :
> argument is not numeric or logical: returning NA

The goal is a table with the mean and sum of each variable per carb group, plus a column cnt holding the number of observations in each group.

Answer 1

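The chain in the question fails because summarise(cnt = n()) keeps only 'carb' and 'cnt', so 'mpg', 'disp' and 'cyl' no longer exist when summarise_at runs. Instead, we can use mutate to create 'cnt' by 'carb', then add 'cnt' as a grouping variable before doing the summarise_at: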

mtcars %>% 
   group_by(carb) %>% 
   mutate(cnt = n()) %>%
   group_by(cnt, add = TRUE) %>% 
   summarise_at(.vars = nums, funs(mean, sum))
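
Newer dplyr releases deprecate funs() and rename the add argument of group_by to .add. As a hedged alternative, assuming dplyr 1.0.0 or later is installed, the count and the per-variable summaries can be computed in a single summarise call with across(). The object name res is only illustrative, and this is a sketch of one possible approach, not the only way to do it:

```r
library(dplyr)

nums <- c("mpg", "disp", "cyl")

res <- mtcars %>%
  group_by(carb) %>%
  summarise(
    cnt = n(),  # number of rows in each carb group
    # apply mean and sum to every column named in nums;
    # result columns are named <column>_<function>, e.g. mpg_mean
    across(all_of(nums), list(mean = mean, sum = sum))
  )

res
```

Because the count is computed in the same summarise call as the other statistics, the original columns are still available when across() runs, and no extra grouping step is needed.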
