I have a data frame of clustered data that I am aggregating by cluster to produce summary statistics.
I would like to create a new column holding the per-cluster count, n(), and then compute the mean and sum over a list of variables:
# works fine
nums <- c("mpg", "disp", "cyl")
mtcars %>% group_by(carb) %>% summarise(cnt = n())
The output looks like this:
# A tibble: 6 x 2
   carb   cnt
  <dbl> <int>
1     1     7
2     2    10
3     3     3
4     4    10
5     6     1
6     8     1
# does not work
nums <- c("mpg", "disp", "cyl")
mtcars %>% group_by(carb) %>% summarise(cnt = n()) %>%
  summarise_at(.vars = nums, funs(mean, sum))
It returns this error message:
> Error in summarise_impl(.data, dots) : Evaluation error: object
> 'disp' not found. In addition: Warning message: In mean.default(mpg) :
> argument is not numeric or logical: returning NA
The goal is a table with one row per carb group that holds the mean and sum of each variable, plus a column cnt giving the number of observations in that group.
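The first summarise() collapses the data to just 'carb' and 'cnt', so 'mpg', 'disp', and 'cyl' no longer exist when summarise_at() runs.
Instead, we can use mutate to create 'cnt' by 'carb', then add 'cnt' as a grouping variable before calling summarise_at: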
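mtcars %>%
  group_by(carb) %>%
  # mutate keeps every row, so the original columns survive
  mutate(cnt = n()) %>%
  # cnt is constant within carb, so grouping by it too keeps it in the result
  group_by(cnt, add = TRUE) %>%
  summarise_at(.vars = nums, funs(mean, sum))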
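For readers on a newer dplyr, the same result can be sketched in a single summarise() call. This assumes dplyr 1.0.0 or later, where across() is available and funs() is deprecated. The name `result` is only illustrative and not from the original post.

```r
library(dplyr)

nums <- c("mpg", "disp", "cyl")

# Assumes dplyr >= 1.0.0; `result` is a hypothetical name for illustration.
result <- mtcars %>%
  group_by(carb) %>%
  summarise(
    cnt = n(),                                           # observations per carb group
    across(all_of(nums), list(mean = mean, sum = sum)),  # mpg_mean, mpg_sum, ...
    .groups = "drop"                                     # return an ungrouped tibble
  )

print(result)
```

Because n() and across() are evaluated in the same summarise(), the count is computed before the rows are collapsed, so there is no need for the mutate-then-regroup step.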