Why doesn't dplyr filter() work within a function (i.e. using a variable for the column name)?

Question

I have a function that filters, groups and mutates data with dplyr functions. The basic pipe sequence works fine outside a function, where I use the actual column names. Inside a function, where the column name is held in a variable, some of the dplyr functions work but others don't, most notably dplyr::filter(). For example:

var1 <- c('yes', NA, NA, 'yes', 'yes', NA, NA, NA, 'yes', NA, 'no', 'no', 'no', 'maybe', NA, 'maybe', 'maybe', 'maybe')

var2 <- c(1:18)

df <- data.frame(var1, var2)

This works fine (i.e. it filters out the NAs):

df%>%filter(!is.na(var1))

...but this doesn't:

x <- "var1"

df%>%filter(!is.na(x))

...but this does:

df%>%select(x)

It is specifically the NAs that need to be filtered out.

I tried get("x"), which didn't work, and base subsetting:

df[!is.na(x),]

...no good, either.

Any ideas on how to pass a variable to filter() inside (or outside) a function, and why a variable works with other dplyr functions such as select()?

Answer 1

We can use sym() to convert the string to a symbol and then unquote it with !! (the operator that replaces the older UQ() spelling) so that it is evaluated as a column:

library(rlang)
library(dplyr)
df %>%
   filter(!is.na(!!sym(x)))
#     var1 var2
#1    yes    1
#2    yes    4
#3    yes    5
#4    yes    9
#5     no   11
#6     no   12
#7     no   13
#8  maybe   14
#9  maybe   16
#10 maybe   17
#11 maybe   18
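
To make the reason for the original failure concrete, here is a minimal sketch that rebuilds the question's data and compares the two calls. The names kept_all and kept_non_na are only illustrative and do not appear in the question.

```r
library(dplyr)
library(rlang)

# Same data as in the question
df <- data.frame(
  var1 = c("yes", NA, NA, "yes", "yes", NA, NA, NA, "yes", NA,
           "no", "no", "no", "maybe", NA, "maybe", "maybe", "maybe"),
  var2 = 1:18
)
x <- "var1"

# x holds the string "var1", so is.na(x) is a single FALSE and
# !is.na(x) is a single TRUE, which filter() recycles: every row is kept.
kept_all <- df %>% filter(!is.na(x))

# sym() turns the string into the symbol var1; !! unquotes it, so
# filter() sees !is.na(var1) and looks var1 up in the data frame.
kept_non_na <- df %>% filter(!is.na(!!sym(x)))

nrow(kept_all)     # 18
nrow(kept_non_na)  # 11
```

In other words, filter(!is.na(x)) does run without error; it just tests the string itself rather than the column it names.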

Answer 2

Since my reputation is not high enough to comment above... I would suggest taking a look at my answer here: https://stackoverflow.com/a/45265617/6238025

If you want to make a function with dplyr, you need to follow the instructions at this webpage: https://rpubs.com/hadley/dplyr-programming .

library(tidyverse)
var1 <- c('yes', NA, NA, 'yes', 
          'yes', NA, NA, NA, 'yes', NA, 'no', 
          'no', 'no', 'maybe', NA, 'maybe', 
          'maybe', 'maybe')
var2 <- c(1:18)

df <- tibble(var1, var2)  # tibble() replaces the deprecated data_frame()

your_function <- function(df, filter) {
      # Make filter a quosure
      filter = enquo(filter)

      df %>% 
        filter(!is.na(!!filter)) -> new_df

      return(new_df)
}
new_df <- your_function(df = df, filter = var1)

You could also skip the filter = enquo(filter) inside the function and then your call would be:

your_function(df=df, filter=quo(var1))

However, the first way makes for nicer function calls, since callers don't need to remember quo().

That should work!
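
As a side note, the enquo() and !! pair can usually be written more compactly with the embrace operator {{ }}. The sketch below assumes rlang 0.4.0 or later is installed; the function name drop_na_rows is hypothetical and not part of this answer.

```r
library(dplyr)

df <- data.frame(
  var1 = c("yes", NA, NA, "yes", "yes", NA, NA, NA, "yes", NA,
           "no", "no", "no", "maybe", NA, "maybe", "maybe", "maybe"),
  var2 = 1:18
)

# Hypothetical helper: {{ col }} captures the expression the caller
# supplied for col and injects it into filter(), which is what
# enquo() followed by !! does in two steps.
drop_na_rows <- function(data, col) {
  data %>% filter(!is.na({{ col }}))
}

# The column is passed unquoted, as in your_function() above
result <- drop_na_rows(df, var1)
nrow(result)  # 11
```

This only covers the case where the caller types the column name bare; if the name arrives as a string, sym() or the .data pronoun is the better fit.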

Answer 3

There is a new package, seplyr, that provides standard-evaluation interfaces to dplyr. Give it a try. You can pass ordinary quoted code (strings) through it to dplyr, which makes passing parameters and writing functions with dplyr easier.

For your case:

install.packages("seplyr")
library(seplyr)
x<-"var1"
df%>%filter_se(paste0("!is.na(", x , ")"))

Answer 4

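This would also work, and it's a bit simpler: refer to the variable containing the column name between double square brackets, and use (.) to refer to the input df. Double brackets extract the column as a vector; single brackets would hand filter() a one-column data frame, which newer dplyr versions may warn about.

> df %>% filter(!is.na((.)[[x]]))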
    var1 var2
1    yes    1
2    yes    4
3    yes    5
4    yes    9
5     no   11
6     no   12
7     no   13
8  maybe   14
9  maybe   16
10 maybe   17
11 maybe   18

Note that this would also work within a function:

myfun <- function(df, var) {
  # [[var]] extracts the column named by the string in var as a vector
  df %>% filter(!is.na((.)[[var]]))
}

x <- "var1"
myfun(df, x)

    var1 var2
1    yes    1
2    yes    4
3    yes    5
4    yes    9
5     no   11
6     no   12
7     no   13
8  maybe   14
9  maybe   16
10 maybe   17
11 maybe   18
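
Two closely related variants, sketched under the assumption that a reasonably recent dplyr is installed: the .data pronoun, which avoids relying on the pipe's dot, and plain base R subsetting. The helper name keep_non_na is hypothetical.

```r
library(dplyr)

df <- data.frame(
  var1 = c("yes", NA, NA, "yes", "yes", NA, NA, NA, "yes", NA,
           "no", "no", "no", "maybe", NA, "maybe", "maybe", "maybe"),
  var2 = 1:18
)

# Hypothetical helper: .data[[var]] looks up the column whose name is
# stored in the string var, inside the data frame being filtered.
keep_non_na <- function(data, var) {
  data %>% filter(!is.na(.data[[var]]))
}

x <- "var1"
via_dplyr <- keep_non_na(df, x)

# Base R equivalent: index the data frame by name first, then test for NA.
# The question's df[!is.na(x), ] tested the string "var1" itself, which is
# never NA, so it selected every row.
via_base <- df[!is.na(df[[x]]), ]

nrow(via_dplyr)  # 11
nrow(via_base)   # 11
```

Both forms take the column name as a string, so they work unchanged inside a function.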

Answer 5

Using rlang::parse_quo() you can filter using a character variable.

See two reproducible examples below:

# Create DF
df <- data.frame(
  var1 = c("yes", NA, NA, "yes", "yes", NA, NA, NA, "yes", NA, "no", "no", "no", "maybe", NA, "maybe", "maybe", "maybe"),
  var2 = c(1:18)
)

Using x to define a variable

x <- "var1"
FILTER <- paste0("!is.na(", x, ")")
df |> dplyr::filter(!!rlang::parse_quo(FILTER, env = parent.frame()))
#>     var1 var2
#> 1    yes    1
#> 2    yes    4
#> 3    yes    5
#> 4    yes    9
#> 5     no   11
#> 6     no   12
#> 7     no   13
#> 8  maybe   14
#> 9  maybe   16
#> 10 maybe   17
#> 11 maybe   18

Using FILTER to create a full filter statement

FILTER <- "!is.na(var1)"
df |> dplyr::filter(!!rlang::parse_quo(FILTER, env = parent.frame()))
#>     var1 var2
#> 1    yes    1
#> 2    yes    4
#> 3    yes    5
#> 4    yes    9
#> 5     no   11
#> 6     no   12
#> 7     no   13
#> 8  maybe   14
#> 9  maybe   16
#> 10 maybe   17
#> 11 maybe   18

Created on 2022-09-14 by the reprex package (v2.0.1)
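
The same string-parsing idea can be wrapped in a function. The sketch below uses rlang::parse_expr() instead of parse_quo(), so no environment has to be supplied; the helper name filter_by_string is hypothetical. Because the string is parsed and evaluated as R code, this should only be used with trusted input.

```r
library(dplyr)
library(rlang)

df <- data.frame(
  var1 = c("yes", NA, NA, "yes", "yes", NA, NA, NA, "yes", NA,
           "no", "no", "no", "maybe", NA, "maybe", "maybe", "maybe"),
  var2 = 1:18
)

# Hypothetical helper: parse_expr() turns the string into an R expression
# and !! injects that expression into filter() as the condition.
filter_by_string <- function(data, condition) {
  data %>% filter(!!parse_expr(condition))
}

x <- "var1"
result <- filter_by_string(df, paste0("!is.na(", x, ")"))
nrow(result)  # 11
```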

solution1 3 ACCEPTED 2017-07-23 04:01:21

solution2 2 2017-07-23 15:12:26

solution3 1 2017-07-24 13:50:40

solution4 0 2017-07-23 08:29:07

solution5 0 2022-09-14 10:45:46
