
How can I group_by and summarize using a list of column names?

Basically, I want to loop through the columns in "list.group", group the data by each one, and then compute summary statistics for every column in "list.avg", "list.max", and "list.min", so that the resulting columns are named mpg_avg, wt_avg, hp_avg, mpg_max, hp_max, ... mpg_min, hp_min, etc.

data("mtcars")
list.avg <- list("mpg","wt","hp")
list.max <- list("mpg","hp","wt","qsec")
list.min <- list("mpg","hp","wt","qsec")
list.group <- list("cyl","vs","am","gear","carb")

So I should end up with a separate table for each column in list.group.

First, it's helpful to have all the avg/max/min variables in a single list.

to_summarise <- 
  list(mean = c("mpg","wt","hp"),
       max = c("mpg","hp","wt","qsec"),
       min = c("mpg","hp","wt","qsec"))
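The same list can also be derived from the question's objects instead of being retyped. The sketch below assumes list.avg, list.max and list.min are defined as in the question; new_names is a hypothetical helper variable used only to show what imap passes to its function (.x is the element, .y is its name).

```r
library(purrr)

# Objects from the question
list.avg <- list("mpg", "wt", "hp")
list.max <- list("mpg", "hp", "wt", "qsec")
list.min <- list("mpg", "hp", "wt", "qsec")

# Name each entry after the R function to apply; "avg" becomes "mean"
to_summarise <- list(
  mean = unlist(list.avg),
  max  = unlist(list.max),
  min  = unlist(list.min)
)

# imap() calls the function once per element:
# .x is the character vector of columns, .y is the element name
new_names <- imap(to_summarise, ~ paste(.x, .y, sep = "_"))
new_names$mean
```

The element names double as function names, which is what lets a later step look the function up by name.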

Now we can map over list.group and, for each list.group value, imap over to_summarise and then merge all the results together.

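library(tidyverse)

map(list.group, ~{
  # Outer .x is one grouping column name, e.g. "cyl"
  grouped <- 
    mtcars %>% 
      group_by_at(.x) 
  # Inner .x is a vector of column names; .y is the function name ("mean", "max", "min")
  out <- 
    imap(to_summarise, ~{
      grouped %>% 
        summarise_at(.x, setNames(list(get(.y)), .y))
    })
  # Back in the outer function, .x is the grouping column again
  out %>% 
    reduce(merge, by = .x)
})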

Output

# [[1]]
#   cyl mpg_mean  wt_mean   hp_mean mpg_max hp_max wt_max qsec_max mpg_min hp_min wt_min
# 1   4 26.66364 2.285727  82.63636    33.9    113  3.190    22.90    21.4     52  1.513
# 2   6 19.74286 3.117143 122.28571    21.4    175  3.460    20.22    17.8    105  2.620
# 3   8 15.10000 3.999214 209.21429    19.2    335  5.424    18.00    10.4    150  3.170
#   qsec_min
# 1     16.7
# 2     15.5
# 3     14.5
# 
# [[2]]
#   vs mpg_mean  wt_mean   hp_mean mpg_max hp_max wt_max qsec_max mpg_min hp_min wt_min
# 1  0 16.61667 3.688556 189.72222    26.0    335  5.424     18.0    10.4     91  2.140
# 2  1 24.55714 2.611286  91.35714    33.9    123  3.460     22.9    17.8     52  1.513
#   qsec_min
# 1     14.5
# 2     16.9
# 
# [[3]]
#   am mpg_mean  wt_mean  hp_mean mpg_max hp_max wt_max qsec_max mpg_min hp_min wt_min
# 1  0 17.14737 3.768895 160.2632    24.4    245  5.424     22.9    10.4     62  2.465
# 2  1 24.39231 2.411000 126.8462    33.9    335  3.570     19.9    15.0     52  1.513
#   qsec_min
# 1    15.41
# 2    14.50
# 
# [[4]]
#   gear mpg_mean  wt_mean  hp_mean mpg_max hp_max wt_max qsec_max mpg_min hp_min wt_min
# 1    3 16.10667 3.892600 176.1333    21.5    245  5.424    20.22    10.4     97  2.465
# 2    4 24.53333 2.616667  89.5000    33.9    123  3.440    22.90    17.8     52  1.615
# 3    5 21.38000 2.632600 195.6000    30.4    335  3.570    16.90    15.0     91  1.513
#   qsec_min
# 1    15.41
# 2    16.46
# 3    14.50
# 
# [[5]]
#   carb mpg_mean wt_mean hp_mean mpg_max hp_max wt_max qsec_max mpg_min hp_min wt_min
# 1    1 25.34286  2.4900    86.0    33.9    110  3.460    20.22    18.1     65  1.835
# 2    2 22.40000  2.8628   117.2    30.4    175  3.845    22.90    15.2     52  1.513
# 3    3 16.30000  3.8600   180.0    17.3    180  4.070    18.00    15.2    180  3.730
# 4    4 15.79000  3.8974   187.0    21.0    264  5.424    18.90    10.4    110  2.620
# 5    6 19.70000  2.7700   175.0    19.7    175  2.770    15.50    19.7    175  2.770
# 6    8 15.00000  3.5700   335.0    15.0    335  3.570    14.60    15.0    335  3.570
#   qsec_min
# 1    18.61
# 2    16.70
# 3    17.40
# 4    14.50
# 5    15.50
# 6    14.60
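In dplyr 1.0.0 and later, group_by_at and summarise_at are superseded by across. The following is a minimal sketch of the same idea in that style, not the answer author's code. It assumes dplyr 1.0.0 or newer; summarise_by is a hypothetical helper name, and the three statistics are written out explicitly rather than looked up by name.

```r
library(dplyr)
library(purrr)

list.group <- list("cyl", "vs", "am", "gear", "carb")
to_summarise <- list(
  mean = c("mpg", "wt", "hp"),
  max  = c("mpg", "hp", "wt", "qsec"),
  min  = c("mpg", "hp", "wt", "qsec")
)

# Hypothetical helper: one summary table for a single grouping column
summarise_by <- function(data, group_col, spec) {
  data %>%
    group_by(across(all_of(group_col))) %>%
    summarise(
      # .names appends the statistic to each source column name
      across(all_of(spec$mean), mean, .names = "{.col}_mean"),
      across(all_of(spec$max),  max,  .names = "{.col}_max"),
      across(all_of(spec$min),  min,  .names = "{.col}_min"),
      .groups = "drop"
    )
}

# One table per grouping column, in the order of list.group
results <- map(list.group, ~ summarise_by(mtcars, .x, to_summarise))
results[[1]]
```

Because all three statistics are computed in a single summarise call, no merge step is needed afterwards.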

'avg' is not a function in R; the equivalent is mean. So, rename the object from list.avg to list.mean and put the list.* objects into a named list. Then loop through the named list with imap, remove the prefix list. with str_remove, group by the common grouping elements with group_by_at, and use summarise_at on the columns of the current element, applying the function that we get from the prefix-removed name.

library(tidyverse)
list.mean <- list("mpg","wt","hp")
lst(list.mean, list.max, list.min) %>% 
  imap(~ {
    # .y is the element name, e.g. "list.mean"; drop the prefix to get "mean"
    func <- str_remove(.y, '^list\\.')
    vars1 <- unlist(.x)

    mtcars %>%
      group_by_at(unlist(list.group)) %>%
      summarise_at(vars1, ~ get(func)(.))
  })
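The name-to-function lookup used in the answer above can be tried in isolation. This is a small sketch assuming the tibble and stringr packages; named, func_names and f are hypothetical variable names, and mode = "function" is added here so that get only matches functions.

```r
library(tibble)
library(stringr)

list.max <- list("mpg", "hp", "wt", "qsec")
list.min <- list("mpg", "hp", "wt", "qsec")

# lst() names each element after the expression that produced it
named <- lst(list.max, list.min)
names(named)

# Strip the "list." prefix to recover the function names "max" and "min"
func_names <- str_remove(names(named), "^list\\.")

# Look the function up by name and apply it to a column
f <- get(func_names[1], mode = "function")
f(mtcars$hp)
```

This only works when the part after the prefix is the name of an existing function, which is why list.avg had to be renamed to list.mean.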

Use map to loop through list.group. Because its elements are strings, use group_by_at to group by each one, then summarise the required columns with summarise_at, and finally bind the results together with bind_cols.

library(purrr)
library(dplyr)
map(list.group, ~mtcars %>% 
          #.x will be "cyl", "vs" ... etc 
          group_by_at(.x) %>% 
          {bind_cols(summarise_at(.,unlist(list.avg), list(avg=mean)),
                     summarise_at(.,unlist(list.min), list(min=min)),
                     summarise_at(.,unlist(list.max), list(max=max))
                     )
          }
    )
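Since bind_cols places the three summaries side by side, the grouping column is likely to appear once per summary. One possible variation, sketched below, joins the summaries on the grouping column instead and names each table after its grouping column. It assumes the question's lists are defined as shown; tables is a hypothetical variable name.

```r
library(dplyr)
library(purrr)

list.avg   <- list("mpg", "wt", "hp")
list.max   <- list("mpg", "hp", "wt", "qsec")
list.min   <- list("mpg", "hp", "wt", "qsec")
list.group <- list("cyl", "vs", "am", "gear", "carb")

# set_names() makes the result a named list: tables$cyl, tables$vs, ...
tables <- map(set_names(unlist(list.group)), function(group_col) {
  grouped <- group_by_at(mtcars, group_col)
  list(
    summarise_at(grouped, unlist(list.avg), list(avg = mean)),
    summarise_at(grouped, unlist(list.min), list(min = min)),
    summarise_at(grouped, unlist(list.max), list(max = max))
  ) %>%
    # Join on the grouping column so it appears only once in the result
    reduce(function(a, b) inner_join(a, b, by = group_col))
})

tables$cyl
```

Each element of tables can then be retrieved by the name of its grouping column, which matches the question's request for one table per column in list.group.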


 