summarise and then summarise_at in one dplyr chain?
I have a data frame of clustered data that I'm aggregating by cluster to produce summary statistics.
I would like to create a new column holding the per-cluster count, n(), and then compute the mean and sum over a list of variables:
# works fine
nums <- c("mpg", "disp", "cyl")
mtcars %>% group_by(carb) %>% summarise(cnt = n())
Looks like this:
# A tibble: 6 x 2
   carb   cnt
  <dbl> <int>
1     1     7
2     2    10
3     3     3
4     4    10
5     6     1
6     8     1
# does not work
nums <- c("mpg", "disp", "cyl")
mtcars %>%
  group_by(carb) %>%
  summarise(cnt = n()) %>%
  summarise_at(.vars = nums, funs(mean, sum))

It returns this error message:
Error in summarise_impl(.data, dots) : Evaluation error: object 'disp' not found.
In addition: Warning message:
In mean.default(mpg) : argument is not numeric or logical: returning NA
The goal is to get the mean and sum summaries for each carb group, together with a column cnt holding the count of observations in each group.
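The first summarise() collapses the data to just carb and cnt, so the columns named in nums no longer exist when summarise_at runs. Instead, we can use mutate to create 'cnt' by 'carb', then add 'cnt' as a grouping variable as well before doing the summarise_at: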
mtcars %>%
  group_by(carb) %>%
  mutate(cnt = n()) %>%
  group_by(cnt, add = TRUE) %>%
  summarise_at(.vars = nums, funs(mean, sum))
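
On newer dplyr releases (1.0.0 and later, which is an assumption about the installed version), funs() and the add argument of group_by() are deprecated. A minimal sketch of the same idea using across(), with the count and the summaries computed in a single summarise() call, might look like this. The name res is only illustrative.

```r
library(dplyr)

nums <- c("mpg", "disp", "cyl")

# One summarise() call: n() gives the group size, and across() applies
# mean and sum to each column named in nums.
res <- mtcars %>%
  group_by(carb) %>%
  summarise(
    cnt = n(),
    across(all_of(nums), list(mean = mean, sum = sum))
  )

res
```

With the default naming, the summary columns come out as mpg_mean, mpg_sum, disp_mean and so on, next to carb and cnt. Because the count is computed in the same call, there is no need for the extra mutate and second group_by.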