How can I group_by and summarize using a list of column names?
Basically, I want to loop through list.group, group the data by each column in it, and then create summary statistics for every column in list.avg, list.max, and list.min, so that the resulting columns are mpg_avg, wt_avg, hp_avg, mpg_max, hp_max, ..., mpg_min, hp_min, etc.
data("mtcars")
list.avg <- list("mpg","wt","hp")
list.max <- list("mpg","hp","wt","qsec")
list.min <- list("mpg","hp","wt","qsec")
list.group <- list("cyl","vs","am","gear","carb")
So I should end up with a separate table for each column in list.group.
First, it's helpful to have all the avg/max/min variables in a single named list, where each name is the summary function to apply.
to_summarise <-
  list(mean = c("mpg","wt","hp"),
       max = c("mpg","hp","wt","qsec"),
       min = c("mpg","hp","wt","qsec"))
Now we can map over list.group, and within each list.group value, imap over to_summarise and then merge all the results together.
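library(tidyverse)
map(list.group, ~{
  # outer .x is one grouping column name, e.g. "cyl"
  grouped <-
    mtcars %>%
    group_by_at(.x)
  out <-
    # inner .x is the vector of columns, inner .y is the function name ("mean", "max", "min")
    imap(to_summarise, ~{
      grouped %>%
        summarise_at(.x, setNames(list(get(.y)), .y))
    })
  # back in the outer lambda, so .x is the grouping column again
  out %>%
    reduce(merge, by = .x)
})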
Output
# [[1]]
# cyl mpg_mean wt_mean hp_mean mpg_max hp_max wt_max qsec_max mpg_min hp_min wt_min
# 1 4 26.66364 2.285727 82.63636 33.9 113 3.190 22.90 21.4 52 1.513
# 2 6 19.74286 3.117143 122.28571 21.4 175 3.460 20.22 17.8 105 2.620
# 3 8 15.10000 3.999214 209.21429 19.2 335 5.424 18.00 10.4 150 3.170
# qsec_min
# 1 16.7
# 2 15.5
# 3 14.5
#
# [[2]]
# vs mpg_mean wt_mean hp_mean mpg_max hp_max wt_max qsec_max mpg_min hp_min wt_min
# 1 0 16.61667 3.688556 189.72222 26.0 335 5.424 18.0 10.4 91 2.140
# 2 1 24.55714 2.611286 91.35714 33.9 123 3.460 22.9 17.8 52 1.513
# qsec_min
# 1 14.5
# 2 16.9
#
# [[3]]
# am mpg_mean wt_mean hp_mean mpg_max hp_max wt_max qsec_max mpg_min hp_min wt_min
# 1 0 17.14737 3.768895 160.2632 24.4 245 5.424 22.9 10.4 62 2.465
# 2 1 24.39231 2.411000 126.8462 33.9 335 3.570 19.9 15.0 52 1.513
# qsec_min
# 1 15.41
# 2 14.50
#
# [[4]]
# gear mpg_mean wt_mean hp_mean mpg_max hp_max wt_max qsec_max mpg_min hp_min wt_min
# 1 3 16.10667 3.892600 176.1333 21.5 245 5.424 20.22 10.4 97 2.465
# 2 4 24.53333 2.616667 89.5000 33.9 123 3.440 22.90 17.8 52 1.615
# 3 5 21.38000 2.632600 195.6000 30.4 335 3.570 16.90 15.0 91 1.513
# qsec_min
# 1 15.41
# 2 16.46
# 3 14.50
#
# [[5]]
# carb mpg_mean wt_mean hp_mean mpg_max hp_max wt_max qsec_max mpg_min hp_min wt_min
# 1 1 25.34286 2.4900 86.0 33.9 110 3.460 20.22 18.1 65 1.835
# 2 2 22.40000 2.8628 117.2 30.4 175 3.845 22.90 15.2 52 1.513
# 3 3 16.30000 3.8600 180.0 17.3 180 4.070 18.00 15.2 180 3.730
# 4 4 15.79000 3.8974 187.0 21.0 264 5.424 18.90 10.4 110 2.620
# 5 6 19.70000 2.7700 175.0 19.7 175 2.770 15.50 19.7 175 2.770
# 6 8 15.00000 3.5700 335.0 15.0 335 3.570 14.60 15.0 335 3.570
# qsec_min
# 1 18.61
# 2 16.70
# 3 17.40
# 4 14.50
# 5 15.50
# 6 14.60
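The same map / imap / merge idea can also be written with across(), which newer dplyr versions offer in place of the _at verbs. This is only a sketch, assuming dplyr 1.0 or later; summarise_by, group_cols and results are names made up for this illustration, and inner_join stands in for merge.

```r
library(dplyr)
library(purrr)

# Same inputs as above, with the grouping columns as a character vector
group_cols <- c("cyl", "vs", "am", "gear", "carb")
to_summarise <- list(mean = c("mpg", "wt", "hp"),
                     max  = c("mpg", "hp", "wt", "qsec"),
                     min  = c("mpg", "hp", "wt", "qsec"))

# Hypothetical helper: summarise `data` by one grouping column,
# applying each function named in `spec` to its columns
summarise_by <- function(data, group_col, spec) {
  pieces <- imap(spec, function(cols, fn_name) {
    data %>%
      group_by(across(all_of(group_col))) %>%
      summarise(across(all_of(cols), match.fun(fn_name),
                       .names = paste0("{.col}_", fn_name)),
                .groups = "drop")
  })
  # join the per-function tables on the grouping column
  reduce(pieces, inner_join, by = group_col)
}

# One table per grouping column, named after that column
results <- map(set_names(group_cols), ~ summarise_by(mtcars, .x, to_summarise))
```

results$cyl then holds the table grouped by cyl, results$vs the one grouped by vs, and so on.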
'avg' is not a function in R; the function for the average is mean. So, rename the object from list.avg to list.mean and put the list.* objects into a named list. Then loop through the named list with imap, remove the prefix list. from each name with str_remove, group by all the grouping columns together with group_by_at, and use summarise_at on the columns in the current list element, applying the function we get from the prefix-removed name.
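library(tidyverse)
# list.max, list.min and list.group are the objects defined in the question
list.mean <- list("mpg","wt","hp")
lst(list.mean, list.max, list.min) %>%
  imap(~ {
    # .y is the element name, e.g. "list.mean" becomes "mean"
    func <- str_remove(.y, '^list\\.')
    vars1 <- unlist(.x)
    mtcars %>%
      group_by_at(unlist(list.group)) %>%
      # pass the column names as a character vector and the function itself
      summarise_at(vars1, get(func))
  })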
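To see why the prefix trick works, it may help to look at the names that lst() assigns. A small sketch, using only two of the lists; named and funcs are illustrative names, not part of the answer above.

```r
library(tibble)
library(stringr)

list.mean <- list("mpg", "wt", "hp")
list.max  <- list("mpg", "hp", "wt", "qsec")

# lst() names each element after the expression that was passed in
named <- lst(list.mean, list.max)
names(named)

# stripping the "list." prefix leaves the name of the function to apply
funcs <- str_remove(names(named), "^list\\.")
funcs
```

Each entry of funcs can then be turned into a function with get() or match.fun().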
Use map to loop through list.group. Because its elements are strings, use group_by_at to group by each one, then summarise the required columns with summarise_at and finally bind the results together.
library(purrr)
library(dplyr)
map(list.group, ~ mtcars %>%
  # .x will be "cyl", "vs" ... etc
  group_by_at(.x) %>%
  {
    bind_cols(
      summarise_at(., unlist(list.avg), list(avg = mean)),
      # drop the grouping column from the later pieces so it appears only once
      summarise_at(., unlist(list.min), list(min = min))[-1],
      summarise_at(., unlist(list.max), list(max = max))[-1]
    )
  }
)