Passing a custom function's argument to group_by does not work

Question

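I can't figure out why passing a custom function's argument to group_by does not work. I am simply passing a colName from the dataset, and when I run my function I get the error: Must group by variables found in `.data`. Column `colName` is not found. In the example below I use the quakes dataset that ships with R: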

foo <- function(data, colName) {
  
  result <- data %>%
   group_by(colName) %>%
   summarise(count = n()) 

  return(result)
}

foo(quakes, "stations")

# I also tried passing the name without quotes, but that does not work either:
# foo(quakes, stations)

I noticed that it works when I pass the column name to group_by explicitly:

group_by(stations) %>%

However, hard-coding the column name inside the function defeats the purpose.

Answer 1

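I believe you just need to wrap the variable name in get. Note that the grouping column in the result is then labelled `get(colName)` rather than stations.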

foo <- function(data, colName) {
  
  result <- data %>%
   dplyr::group_by(get(colName)) %>%
   dplyr::summarise(count = n()) 

  return(result)
}

> foo(quakes, "stations")
# A tibble: 102 x 2
   `get(colName)` count
            <int> <int>
 1             10    20
 2             11    28
 3             12    25
 4             13    21
 5             14    39
 6             15    34
 7             16    35
 8             17    38
 9             18    33
10             19    29
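
The output above labels the grouping column `get(colName)`. One possible way to keep the original column name is to name the grouping expression with `:=`. This is only a sketch, assuming dplyr is attached; the function name foo_named is hypothetical and not part of the original answer.

```r
library(dplyr)

# Hypothetical variant of foo() that keeps the original column name.
foo_named <- function(data, colName) {
  data %>%
    # `!!colName :=` uses the string stored in colName as the column's name
    group_by(!!colName := get(colName)) %>%
    summarise(count = n())
}

result <- foo_named(quakes, "stations")
names(result)
```

The grouping values are unchanged; only the label of the first column differs from the version above.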

Answer 2

Here is another way to make it work. You can use the .data[[var]] construct for a column name stored as a string:

foo <- function(data, colName) {
  
  result <- data %>%
    group_by(.data[[colName]]) %>%
    summarise(count = n()) 
  
  return(result)
}

foo(quakes, "stations")

# A tibble: 102 x 2
   stations count
      <int> <int>
 1       10    20
 2       11    28
 3       12    25
 4       13    21
 5       14    39
 6       15    34
 7       16    35
 8       17    38
 9       18    33
10       19    29
# ... with 92 more rows

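If you would rather not pass colName as a string, you can wrap it in double curly braces ({{ }}) inside the function and pass the bare column name to get the same result: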

foo <- function(data, colName) {
  
  result <- data %>%
    group_by({{ colName }}) %>%
    summarise(count = n()) 
  
  return(result)
}

foo(quakes, stations)

# A tibble: 102 x 2
   stations count
      <int> <int>
 1       10    20
 2       11    28
 3       12    25
 4       13    21
 5       14    39
 6       15    34
 7       16    35
 8       17    38
 9       18    33
10       19    29
# ... with 92 more rows
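
To make the difference between the two variants concrete, the sketch below puts them side by side and compares the results. The helper names count_by_string and count_by_column are hypothetical, chosen here only so that both versions can coexist in one session; dplyr is assumed to be installed.

```r
library(dplyr)

# Hypothetical helper: the column name arrives as a string.
count_by_string <- function(data, colName) {
  data %>%
    group_by(.data[[colName]]) %>%
    summarise(count = n())
}

# Hypothetical helper: the column arrives as a bare (unquoted) name.
count_by_column <- function(data, colName) {
  data %>%
    group_by({{ colName }}) %>%
    summarise(count = n())
}

by_string <- count_by_string(quakes, "stations")
by_column <- count_by_column(quakes, stations)

# Compare the two results as plain data frames.
identical(as.data.frame(by_string), as.data.frame(by_column))
```

Which form to choose mostly depends on the caller: strings are convenient when column names come from a vector or from user input, while bare names read more like ordinary dplyr code.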

Answer 3

Try this with dplyr:

library(dplyr)

foo <- function(data, colName) {

  # Convert the string to a symbol, then unquote it with !!
  colName <- sym(colName)

  result <- data %>%
    group_by(!!colName) %>%
    summarise(count = n())
  
  return(result)
}


foo(quakes, "stations")
#> # A tibble: 102 x 2
#>    stations count
#>       <int> <int>
#>  1       10    20
#>  2       11    28
#>  3       12    25
#>  4       13    21
#>  5       14    39
#>  6       15    34
#>  7       16    35
#>  8       17    38
#>  9       18    33
#> 10       19    29
#> # ... with 92 more rows

Created on 2021-05-04 by the reprex package (v2.0.0)

Answer 4

Another option is to use ensym and unquote the result with !!, so that the function accepts both quoted and unquoted arguments:

foo <- function(data, colName) {
  data %>%
    dplyr::group_by(!!rlang::ensym(colName)) %>%
    dplyr::summarise(count = n())
}

foo(quakes, stations)
foo(quakes, "stations")
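
As a rough check that the two call styles really behave the same, the sketch below runs both and compares the results. It assumes dplyr and rlang are installed and attaches dplyr so that %>% and n() are available. Keep in mind that ensym accepts only a bare name or a string, not a more complex expression.

```r
library(dplyr)

foo <- function(data, colName) {
  data %>%
    group_by(!!rlang::ensym(colName)) %>%
    summarise(count = n())
}

unquoted <- foo(quakes, stations)    # bare column name
quoted   <- foo(quakes, "stations")  # column name as a string

# Compare the two results as plain data frames.
identical(as.data.frame(unquoted), as.data.frame(quoted))
```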

4 answers

Answer 1
3 2021-05-04 09:35:32

Answer 2
3 accepted 2021-05-04 09:54:07

Answer 3
2 2021-05-04 09:45:44

Answer 4
2 2021-05-04 16:33:10
