dplyr: How to use group_by inside a function?

Question

I want to use the dplyr::group_by function inside another function, but I don't know how to pass the arguments to it.

Can someone provide a working example?

library(dplyr)
data(iris)
iris %.% group_by(Species) %.% summarise(n = n()) # 
## Source: local data frame [3 x 2]
##      Species  n
## 1  virginica 50
## 2 versicolor 50
## 3     setosa 50

mytable0 <- function(x, ...) x %.% group_by(...) %.% summarise(n = n())
mytable0(iris, "Species") # OK
## Source: local data frame [3 x 2]
##      Species  n
## 1  virginica 50
## 2 versicolor 50
## 3     setosa 50

mytable1 <- function(x, key) x %.% group_by(as.name(key)) %.% summarise(n = n())
mytable1(iris, "Species") # Wrong!
# Error: unsupported type for column 'as.name(key)' (SYMSXP)

mytable2 <- function(x, key) x %.% group_by(key) %.% summarise(n = n())
mytable2(iris, "Species") # Wrong!
# Error: index out of bounds

Answer 1

For programming, group_by_ is the counterpart to group_by:

library(dplyr)

mytable <- function(x, ...) x %>% group_by_(...) %>% summarise(n = n())
mytable(iris, "Species")
# or iris %>% mytable("Species")

This gives:

     Species  n
1     setosa 50
2 versicolor 50
3  virginica 50

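Update: At the time this was written, dplyr used %.%, which is what the code above originally used. %>% is now favored, so the code above has been changed to keep it relevant.

Update 2: regroup is now deprecated; use group_by_ instead.

Update 3: Per Roberto's comment, group_by_(list(...)) becomes group_by_(...) in newer versions of dplyr.

Update 4: Added a minor variation suggested in the comments.

Update 5: With rlang/tidyeval it is now possible to do this: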

library(rlang)
mytable <- function(x, ...) {
  group_ <- syms(c(...))  # turn the column-name strings into symbols; c() lets several names be passed
  x %>% 
    group_by(!!!group_) %>% 
    summarise(n = n())
}
mytable(iris, "Species")

Or pass Species unevaluated, i.e. without quotes around it:

library(rlang)
mytable <- function(x, ...) {
  group_ <- enquos(...)
  x %>% 
    group_by(!!!group_) %>% 
    summarise(n = n())
}
mytable(iris, Species)
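
As a possible simplification (a sketch, not part of the original answer): when the function only needs to hand the unquoted column names on to group_by(), the dots can be forwarded directly, without enquos() and !!!. The name mytable_dots is hypothetical.

```r
library(dplyr)

# Hypothetical helper: forwards unquoted column names straight to group_by()
mytable_dots <- function(x, ...) {
  x %>%
    group_by(...) %>%
    summarise(n = n())
}

res_dots <- mytable_dots(iris, Species)
```

Capturing with enquos() is still useful when the function has to inspect or modify the grouping expressions before using them.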

Update 6: If there is only one grouping variable, the {{...}} notation now works:

mytable <- function(x, group) {
  x %>% 
    group_by({{group}}) %>% 
    summarise(n = n())
}
mytable(iris, Species)
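
For several grouping variables given as strings, one option is across() with all_of(). This is a minimal sketch that assumes dplyr 1.0.0 or later; the helper name mytable_keys, its keys argument, and the use of mtcars are illustrative and not from the original answer.

```r
library(dplyr)

# Hypothetical helper: `keys` is a character vector of column names
mytable_keys <- function(x, keys) {
  x %>%
    group_by(across(all_of(keys))) %>%
    summarise(n = n(), .groups = "drop")  # return an ungrouped result
}

res_keys <- mytable_keys(mtcars, c("cyl", "gear"))
```

all_of() raises an error if any name in keys is not a column of x, which tends to surface typos early.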

Answer 2

Update: As of dplyr 0.7.0 you can use tidy eval to accomplish this.

For more details, see http://dplyr.tidyverse.org/articles/programming.html.

library(tidyverse)
data("iris")

my_table <- function(df, group_var) {
  group_var <- enquo(group_var)      # Create quosure
  df %>% 
    group_by(!!group_var) %>%        # Use !! to unquote the quosure
    summarise(n = n())
}

my_table(iris, Species)

> my_table(iris, Species)
# A tibble: 3 x 2
     Species     n
      <fctr> <int>
1     setosa    50
2 versicolor    50
3  virginica    50

Answer 3

Ugly as they come, but it works:

mytable3 <- function(x, key) {
  my.call <- bquote(summarise(group_by(.(substitute(x)), NULL), n = n()))
  my.call[[2]][[3]] <- as.name(key)
  eval(my.call, parent.frame())
} 
mytable3(iris, "Species")
# Source: local data frame [3 x 2]
#
#      Species  n
# 1  virginica 50
# 2 versicolor 50
# 3     setosa 50

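There are almost certainly cases that will cause this to break, but you get the idea. I don't think you can avoid manipulating the call. Another thing that does work, but is even uglier, is: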

mytable4 <- function(x, key) summarise(group_by(x, x[[key]]), n = n())

Answer 4

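As a complement to Update 6 in @G. Grothendieck's answer: if you want to pass a string as the argument instead of embracing the argument with double braces ( {{ ), you should use the .data pronoun, as described in the programming vignette under "Loop over multiple variables":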

mytable <- function( x, group ) {
  x %>% 
    group_by( .data[[group]] ) %>% 
    summarise( n = n() )
}

group_string <- 'Species'

mytable( iris, group_string )

`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 3 x 2
  Species        n
  <fct>      <int>
1 setosa        50
2 versicolor    50
3 virginica     50
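
The message on the first line of that output comes from summarise(). A minimal sketch, assuming dplyr 1.0.0 or later, that states the grouping of the result explicitly through the .groups argument named in the message; the name mytable_quiet is hypothetical.

```r
library(dplyr)

# Hypothetical variant: `group` is a single column name given as a string
mytable_quiet <- function(x, group) {
  x %>%
    group_by(.data[[group]]) %>%
    summarise(n = n(), .groups = "drop")  # explicit: return an ungrouped tibble
}

res_quiet <- mytable_quiet(iris, "Species")
```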

Answer 1
72, accepted, 2014-02-16 21:45:01

Answer 2
11, 2017-06-23 20:26:26

Answer 3
2, 2014-02-16 20:27:57

Answer 4
2, 2021-03-17 07:33:29
