How to use vector of column names as input into dplyr::group_by()?

I want to create a function based on dplyr that performs certain operations on subsets of data. The subsets are defined by the values of one or more key columns in the dataset. When only one column is used to identify subsets, my code works fine:

library(dplyr)  # provides tibble(), group_by(), summarize(), %>% and re-exports sym()

set.seed(1)
df <- tibble(
  g1 = c(1, 1, 2, 2, 2),
  g2 = c(1, 2, 1, 2, 1),
  a = sample(5)
)
group_key <- "g1"
aggregate <- function(df, by) {
  df %>% group_by(!!sym(by)) %>% summarize(a = mean(a))
}
aggregate(df, by = group_key)

This works as expected and returns something like this:

# A tibble: 2 x 2
     g1     a
  <dbl> <dbl>
1     1  2.5 
2     2  3.33

Unfortunately, everything breaks down if I change group_key to a vector of two column names:

group_key <- c("g1", "g2")
aggregate(df, by = group_key)

I get an error: Only strings can be converted to symbols, which I think comes from rlang::sym(). Replacing it with syms() does not work either, since I get a list of names, on which group_by() chokes.

Any suggestions would be appreciated!

You need to use the unquote-splice operator !!! together with syms():

aggregate <- function(df, by) {
  df %>% group_by(!!!syms(by)) %>% summarize(a = mean(a))
}

group_key <- c("g1", "g2")

aggregate(df, by = group_key)
#> # A tibble: 4 x 3
#> # Groups:   g1 [2]
#>      g1    g2     a
#>   <dbl> <dbl> <dbl>
#> 1     1     1   1  
#> 2     1     2   4  
#> 3     2     1   2.5
#> 4     2     2   5
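
To make the splicing step more concrete, here is a minimal sketch of what syms() and !!! do in this call. The data frame uses fixed values for a instead of sample(5), which is an assumption made only so the example does not depend on the random number generator; key_syms, spliced and explicit are illustrative names.

```r
library(dplyr)

# Fixed values for `a` (an assumption, replacing sample(5)) so the result is reproducible.
df <- tibble(
  g1 = c(1, 1, 2, 2, 2),
  g2 = c(1, 2, 1, 2, 1),
  a  = c(1, 4, 3, 5, 2)
)
by <- c("g1", "g2")

# syms() turns a character vector into a list of symbols, one per column name.
key_syms <- rlang::syms(by)

# !!! splices that list into separate arguments of group_by() ...
spliced <- df %>% group_by(!!!key_syms)

# ... so the result is grouped by the same columns as the hand-written call.
explicit <- df %>% group_by(g1, g2)

identical(group_vars(spliced), group_vars(explicit))
```

A single !! cannot do this because it unquotes exactly one expression, whereas !!! unpacks each element of the list as its own grouping argument.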

Alternatively, you can use dplyr::group_by_at():

library(dplyr)

agg <- function(df, by) {
  # vars(one_of(by)) selects the grouping columns by name from a character vector
  df %>% group_by_at(vars(one_of(by))) %>% summarize(a = mean(a))
}

group_key <- "g1"
group_keys <- c("g1","g2")

agg(df, by = group_key)
#> # A tibble: 2 x 2
#>      g1     a
#>   <dbl> <dbl>
#> 1     1  2.5 
#> 2     2  3.33

agg(df, by = group_keys)
#> # A tibble: 4 x 3
#> # Groups:   g1 [2]
#>      g1    g2     a
#>   <dbl> <dbl> <dbl>
#> 1     1     1   1  
#> 2     1     2   4  
#> 3     2     1   2.5
#> 4     2     2   5

Update with dplyr 1.0.0

The new across() accepts tidyselect helpers such as all_of(), which replaces the quote-and-unquote procedure of non-standard evaluation (NSE). The code looks a bit simpler with that:

aggregate <- function(df, by) {
  df %>% 
    group_by(across(all_of(by))) %>% 
    summarize(a = mean(a))
}

df %>% aggregate(group_key)
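
As a hedged, self-contained sketch of the across() approach, the following runs the same aggregation for one and for two keys. The name aggregate_by is hypothetical (chosen here to avoid masking stats::aggregate()), the values of a are fixed rather than sampled, and .groups = "drop" is an addition that returns an ungrouped result. The last call shows that all_of() is strict and raises an error when a requested column does not exist.

```r
library(dplyr)

# Fixed values for `a` (an assumption, replacing sample(5)) so the result is reproducible.
df <- tibble(
  g1 = c(1, 1, 2, 2, 2),
  g2 = c(1, 2, 1, 2, 1),
  a  = c(1, 4, 3, 5, 2)
)

# aggregate_by is a hypothetical name; `by` is a character vector of column names.
aggregate_by <- function(df, by) {
  df %>%
    group_by(across(all_of(by))) %>%
    summarize(a = mean(a), .groups = "drop")  # drop grouping in the result
}

one_key  <- aggregate_by(df, "g1")            # 2 rows, one per value of g1
two_keys <- aggregate_by(df, c("g1", "g2"))   # 4 rows, one per (g1, g2) pair

# all_of() fails loudly on a column that is not in the data.
bad <- tryCatch(
  aggregate_by(df, c("g1", "missing_col")),
  error = function(e) "error"
)
```

If silently ignoring unknown column names is preferable, any_of() can be used in place of all_of().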
