I'm writing a function that filters, groups and mutates data with dplyr functions. The basic pipe sequence works fine outside a function, where I use the literal column names. Inside a function, where the column name is held in a variable, some of the dplyr functions work but others don't, most notably dplyr::filter(). For example:
var1 <- c('yes', NA, NA, 'yes', 'yes', NA, NA, NA, 'yes', NA, 'no', 'no', 'no', 'maybe', NA, 'maybe', 'maybe', 'maybe')
var2 <- c(1:18)
df <- data.frame(var1, var2)
This works fine (i.e. it filters out the NAs):
df%>%filter(!is.na(var1))
...but this doesn't:
x <- "var1"
df%>%filter(!is.na(x))
...but this does:
df%>%select(x)
Specifically, it is the NAs that need to be filtered out.
I tried get("x"), which didn't work, and also slicing:
df[!is.na(x),]
...no good, either.
Any ideas on how to pass a variable to filter() inside (or outside) a function, and why a variable works with other dplyr functions?
We can use sym() to convert the string to a symbol and then unquote it with UQ() (equivalently, the !! operator) so that it is evaluated as a column:
library(rlang)
library(dplyr)
df %>%
filter(!is.na(UQ(sym(x))))
# var1 var2
#1 yes 1
#2 yes 4
#3 yes 5
#4 yes 9
#5 no 11
#6 no 12
#7 no 13
#8 maybe 14
#9 maybe 16
#10 maybe 17
#11 maybe 18
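To see why the original attempt keeps every row, it may help to evaluate the pieces separately. The sketch below rebuilds the question's data and compares the failing call with two working spellings: `!!rlang::sym(x)` (the operator form of `UQ(sym(x))`) and the `.data` pronoun that dplyr exports. The names `kept_all`, `via_sym` and `via_data` are only illustrative.

```r
library(dplyr)

df <- data.frame(
  var1 = c("yes", NA, NA, "yes", "yes", NA, NA, NA, "yes", NA,
           "no", "no", "no", "maybe", NA, "maybe", "maybe", "maybe"),
  var2 = 1:18
)
x <- "var1"

# x is not a column of df, so inside filter() it resolves to the string "var1".
# is.na("var1") is a single FALSE, so !is.na(x) is TRUE for every row.
is.na(x)
kept_all <- df %>% filter(!is.na(x))   # nothing is dropped

# Two ways to make the string refer to the column it names
via_sym  <- df %>% filter(!is.na(!!rlang::sym(x)))  # string -> symbol -> unquote
via_data <- df %>% filter(!is.na(.data[[x]]))       # look the column up by name
```

Both working versions should return the same 11 rows; which one to use is mostly a matter of taste, although `.data[[x]]` avoids the extra conversion step.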
Since my reputation is not high enough to comment above... I would suggest taking a look at my answer here: https://stackoverflow.com/a/45265617/6238025
If you want to make a function with dplyr, you need to follow the instructions at this webpage: https://rpubs.com/hadley/dplyr-programming .
library(tidyverse)
var1 <- c('yes', NA, NA, 'yes',
'yes', NA, NA, NA, 'yes', NA, 'no',
'no', 'no', 'maybe', NA, 'maybe',
'maybe', 'maybe')
var2 <- c(1:18)
df <- tibble(var1, var2)  # data_frame() is deprecated in favour of tibble()
your_function <- function(df, filter) {
  # Capture the unquoted column name as a quosure
  filter <- enquo(filter)
  # Unquote the quosure with !! inside dplyr's filter()
  new_df <- df %>%
    filter(!is.na(!!filter))
  return(new_df)
}
new_df <- your_function(df = df, filter = var1)
You could also skip the filter = enquo(filter) line inside the function, and then your call would be:
your_function(df=df, filter=quo(var1))
However, the first way is nicer for making function calls, because you won't need to remember quo().
That should work!
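For comparison, here is a minimal sketch of the same idea written with the embrace operator `{{ }}`, which combines `enquo()` and `!!` in one step and is available in rlang 0.4.0 and later. A second helper takes the column name as a string, as in the question. The function names `drop_na_bare` and `drop_na_string` are hypothetical and not part of dplyr.

```r
library(dplyr)

df <- tibble(
  var1 = c("yes", NA, NA, "yes", "yes", NA, NA, NA, "yes", NA,
           "no", "no", "no", "maybe", NA, "maybe", "maybe", "maybe"),
  var2 = 1:18
)

# Hypothetical helper: the caller passes a bare column name
drop_na_bare <- function(data, col) {
  data %>% filter(!is.na({{ col }}))
}

# Hypothetical helper: the caller passes the column name as a string
drop_na_string <- function(data, col_name) {
  data %>% filter(!is.na(.data[[col_name]]))
}

res_bare   <- drop_na_bare(df, var1)
x <- "var1"
res_string <- drop_na_string(df, x)
```

The bare-name version reads more like ordinary dplyr code, while the string version is usually easier when the column name comes from user input or a loop.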
There is a new package, seplyr, that provides standard-evaluation interfaces to dplyr. Give it a try. You can pass ordinary quoted code (character strings) through it to dplyr, which makes passing parameters and writing functions with dplyr easier.
For your case:
install.packages("seplyr")
library(seplyr)
x<-"var1"
df%>%filter_se(paste0("!is.na(", x , ")"))
This would also work, and it's a bit simpler: refer to the variable containing the column name in square brackets, and use (.) to refer to the input df:
> df %>% filter(!is.na((.)[x]))
var1 var2
1 yes 1
2 yes 4
3 yes 5
4 yes 9
5 no 11
6 no 12
7 no 13
8 maybe 14
9 maybe 16
10 maybe 17
11 maybe 18
Note that this would also work within a function:
myfun <- function(df, var) {
df %>% filter(!is.na((.)[var]))
}
x <- "var1"
myfun(df, x)
var1 var2
1 yes 1
2 yes 4
3 yes 5
4 yes 9
5 no 11
6 no 12
7 no 13
8 maybe 14
9 maybe 16
10 maybe 17
11 maybe 18
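The same bracket idea also explains why the slicing attempt in the question fails: `is.na(x)` tests the string itself, not the column it names. A base R sketch, assuming the question's data and a hypothetical wrapper called `drop_na_base`:

```r
df <- data.frame(
  var1 = c("yes", NA, NA, "yes", "yes", NA, NA, NA, "yes", NA,
           "no", "no", "no", "maybe", NA, "maybe", "maybe", "maybe"),
  var2 = 1:18
)
x <- "var1"

# df[[x]] extracts the column named by x as a vector,
# so is.na() returns one logical value per row
keep <- !is.na(df[[x]])
base_result <- df[keep, ]

# Hypothetical wrapper doing the same inside a function
drop_na_base <- function(data, col_name) {
  data[!is.na(data[[col_name]]), , drop = FALSE]
}
fun_result <- drop_na_base(df, x)
```

Note that `[[` returns the column as a plain vector, whereas `[` returns a one-column data frame. Base subsetting also keeps the original row names, so the printed row labels differ from the dplyr output even though the rows are the same.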
Using rlang::parse_quo() you can filter using a character variable.
See the two reproducible examples below:
# Create DF
df <- data.frame(
var1 = c("yes", NA, NA, "yes", "yes", NA, NA, NA, "yes", NA, "no", "no", "no", "maybe", NA, "maybe", "maybe", "maybe"),
var2 = c(1:18)
)
x <- "var1"
FILTER <- paste0("!is.na(", x, ")")
df |> dplyr::filter(!!rlang::parse_quo(FILTER, env = parent.frame()))
#> var1 var2
#> 1 yes 1
#> 2 yes 4
#> 3 yes 5
#> 4 yes 9
#> 5 no 11
#> 6 no 12
#> 7 no 13
#> 8 maybe 14
#> 9 maybe 16
#> 10 maybe 17
#> 11 maybe 18
FILTER <- "!is.na(var1)"
df |> dplyr::filter(!!rlang::parse_quo(FILTER, env = parent.frame()))
#> var1 var2
#> 1 yes 1
#> 2 yes 4
#> 3 yes 5
#> 4 yes 9
#> 5 no 11
#> 6 no 12
#> 7 no 13
#> 8 maybe 14
#> 9 maybe 16
#> 10 maybe 17
#> 11 maybe 18
Created on 2022-09-14 by the reprex package (v2.0.1)
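If several conditions arrive as strings, one possible extension is to parse each with `rlang::parse_expr()` and splice the resulting list into `filter()` with `!!!`. This is only a sketch: the second condition, `var2 > 5`, is made up for illustration, and parsing strings into code should be limited to trusted input, since the parsed expressions are evaluated like any other R code.

```r
library(dplyr)

df <- data.frame(
  var1 = c("yes", NA, NA, "yes", "yes", NA, NA, NA, "yes", NA,
           "no", "no", "no", "maybe", NA, "maybe", "maybe", "maybe"),
  var2 = 1:18
)

# Conditions held as character strings (the second one is illustrative)
conditions <- c("!is.na(var1)", "var2 > 5")

# Parse each string into an expression, one string at a time
parsed <- lapply(conditions, rlang::parse_expr)

# Splice the list into filter(); multiple conditions are combined with AND
result <- df %>% filter(!!!parsed)
```

Because `filter()` combines its arguments with a logical AND, the result keeps only rows where `var1` is not NA and `var2` is greater than 5.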