Using a function inside a dplyr filter

Question

I want to define a helper function that lets me compose some Boolean filters more clearly.

Here is a working example that uses the iris dataset:

library(tidyverse)


sepal_config = function(length, width, species, .data) {
  .data$Sepal.Length > length & .data$Sepal.Width < width & .data$Species == species
}

iris %>% 
  filter(
      sepal_config(length = 4, width = 3, species = "versicolor", .data = .data) |  # 34 rows
      sepal_config(length = 3, width = 3, species = "virginica",  .data = .data)    # 21 rows
    )                                                                               # 55 rows

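I would like to do this without having to pass in .data, ideally with the column names evaluated in the scope of the data frame (that is, avoiding the error below):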

sepal_config = function(length, width, species) {
  Sepal.Length > length & Sepal.Width < width & Species == species
}

iris %>% 
  filter(
      sepal_config(length = 4, width = 3, species = "versicolor") |
      sepal_config(length = 3, width = 3, species = "virginica")
    )

Error: Problem with `filter()` input `..1`.
ℹ Input `..1` is `|...`.
x object 'Sepal.Length' not found

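Unfortunately, I don't understand NSE (non-standard evaluation) well enough to know whether this is an option. I have tried various techniques from the "Programming with dplyr" how-to guide, but this footnote makes me think I'm looking in the wrong place:

dplyr's filter() is inspired by base R's subset(). subset() provides data masking, but not with tidy evaluation, so the techniques described in this chapter don't apply to it.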

Thanks, Akhil

Answer 1

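You can wrap the expression in quo() inside your function and unquote it with the !! operator in the filter() call. quo() captures the expression together with the function's environment, so length, width and species are still found when filter() evaluates the column names against the data.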

library(dplyr)

sepal_config = function(length, width, species) {
  quo(Sepal.Length > length & Sepal.Width < width & Species == species)
}

iris %>% 
  filter(!!sepal_config(length = 4, width = 3, species = "versicolor") |
         !!sepal_config(length = 3, width = 3, species = "virginica"))


   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1           5.5         2.3          4.0         1.3 versicolor
2           6.5         2.8          4.6         1.5 versicolor
3           5.7         2.8          4.5         1.3 versicolor
4           4.9         2.4          3.3         1.0 versicolor
5           6.6         2.9          4.6         1.3 versicolor
6           5.2         2.7          3.9         1.4 versicolor
7           5.0         2.0          3.5         1.0 versicolor
8           6.0         2.2          4.0         1.0 versicolor
9           6.1         2.9          4.7         1.4 versicolor
10          5.6         2.9          3.6         1.3 versicolor
...
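
One caveat with the approach above: inside the quosure, bare names are looked up in the data first, so an argument such as length would be shadowed if the data frame ever had a column with the same name. A minimal sketch of one way around this, assuming the same iris example, is to inline the argument values with !! when the quosure is built. This is a variant for illustration, not part of the original answer.

```r
library(dplyr, warn.conflicts = FALSE)

# Variant of sepal_config(): the argument values are inlined with !!,
# so only the column names are left to be resolved in the data.
sepal_config <- function(length, width, species) {
  quo(Sepal.Length > !!length & Sepal.Width < !!width & Species == !!species)
}

res <- iris %>%
  filter(!!sepal_config(length = 4, width = 3, species = "versicolor") |
         !!sepal_config(length = 3, width = 3, species = "virginica"))

nrow(res)
```

The row count should match the 55 rows reported for the working example in the question.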

Answer 2

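dplyr provides a function for this kind of thing, cur_data(). Using it as the default value of .data lets the helper pick up the current data frame on its own (note that cur_data() has been deprecated since dplyr 1.1.0 in favor of pick()):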

library(dplyr, warn.conflicts = FALSE)

sepal_config <- function(length, width, species, .data = cur_data()) {
  .data$Sepal.Length > length & .data$Sepal.Width < width & .data$Species == species
}

iris %>% 
  as_tibble() %>% 
  filter(
    sepal_config(length = 4, width = 3, species = "versicolor") |  # 34 rows
      sepal_config(length = 3, width = 3, species = "virginica")    # 21 rows
  )     
#> # A tibble: 55 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species   
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>     
#>  1          5.5         2.3          4           1.3 versicolor
#>  2          6.5         2.8          4.6         1.5 versicolor
#>  3          5.7         2.8          4.5         1.3 versicolor
#>  4          4.9         2.4          3.3         1   versicolor
#>  5          6.6         2.9          4.6         1.3 versicolor
#>  6          5.2         2.7          3.9         1.4 versicolor
#>  7          5           2            3.5         1   versicolor
#>  8          6           2.2          4           1   versicolor
#>  9          6.1         2.9          4.7         1.4 versicolor
#> 10          5.6         2.9          3.6         1.3 versicolor
#> # ... with 45 more rows

Created on 2021-10-12 by the reprex package (v2.0.0)
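
A side effect of making cur_data() a default argument is that the helper can also be called outside a dplyr verb, as long as .data is supplied explicitly. R evaluates default arguments lazily, so cur_data() is never called in that case. The sketch below assumes a dplyr version in which cur_data() is still available (it may emit a deprecation warning in dplyr 1.1.0 and later).

```r
library(dplyr, warn.conflicts = FALSE)

sepal_config <- function(length, width, species, .data = cur_data()) {
  .data$Sepal.Length > length & .data$Sepal.Width < width & .data$Species == species
}

# Standalone call: .data is given, so the cur_data() default is never evaluated.
keep <- sepal_config(length = 4, width = 3, species = "versicolor", .data = iris)
sum(keep)

# Inside filter(): .data is omitted, so the default picks up the current data.
n_filtered <- iris %>%
  filter(sepal_config(length = 4, width = 3, species = "versicolor")) %>%
  nrow()
n_filtered
```

Both counts should agree with the 34 versicolor rows noted in the question's comments.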

Answer 1
4, accepted, 2021-10-12 10:42:49

Answer 2
4, 2021-10-12 11:22:16
