
How to pass a variable to the filter function within an R function

I am fairly new to R. I wrote the function below, which tries to summarise a data frame based on a feature variable (passed to the function as variable) and a target variable (passed as target_var). I also pass it a value (target_val) on which to filter.

The function falls over on the filter line (filter(target_var == target_val)). I think it has something to do with quo, quosures and so on, but I can't figure out how to fix it. The following code should be ready to run: if you exclude the filter line it works, and if you include it, it falls over.

library(dplyr)
target <- c('good', 'good', 'bad', 'good', 'good', 'bad')
var_1 <- c('debit_order', 'other', 'other', 'debit_order','debit_order','debit_order')

dset <- data.frame(target, var_1)
odds_by_var <- function(dataframe, variable, target_var, target_val){

  df_name <- paste('odds', deparse(substitute(variable)), sep = "_")
  variable_string <- deparse(substitute(variable))
  target_string <- deparse(substitute(target_var))

  temp_df1 <- dataframe %>%
    group_by_(variable_string, target_string) %>%
    summarise(cnt = n()) %>%
    group_by_(variable_string) %>%
    mutate(total = sum(cnt)) %>%
    mutate(rate = cnt / total) %>%
    filter(target_var == target_val) 

  assign(df_name, temp_df1, envir=.GlobalEnv)

}

odds_by_var(dset, var_1, target, 'bad')

I assume you want to keep only the rows where target is 'good' or 'bad'. In my understanding, you should always filter() before you group_by(), as you may otherwise omit your filter variables. I restructured your function a little:

dset <- data.frame(target, var_1)

odds_by_var <- function(dataframe, variable, target_var, target_val){

  # Turn the unquoted arguments into strings
  variable_string <- deparse(substitute(variable))
  target_string <- deparse(substitute(target_var))
  df_name <- paste('odds', variable_string, sep = "_")

  # Count per (variable, target) pair, then compute each pair's share within its variable group
  # (group_by_() is deprecated in current dplyr but still works)
  temp_df1 <- dataframe %>%
    group_by_(variable_string, target_string) %>%
    summarise(cnt = n()) %>%
    mutate(total = sum(cnt),
           rate = cnt / total)

  # Filter with base R, looking the target column up by its name
  temp_df1 <- temp_df1[temp_df1[[target_string]] == target_val, ]

  # Store the result in the global environment as odds_<variable>
  assign(df_name, temp_df1, envir = .GlobalEnv)

}

odds_by_var(dset, var_1, target, "bad")

Result:

> odds_var_1
# A tibble: 2 x 5
# Groups:   var_1 [2]
  var_1       target   cnt total  rate
  <chr>       <chr>  <int> <int> <dbl>
1 debit_order bad        1     4  0.25
2 other       bad        1     2  0.5 
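As for why the original filter line fails: inside filter(), dplyr does not substitute the column name you passed in. It looks for a column literally called target_var and, finding none, falls back to the value of the function argument. If your dplyr is 1.0 or later (an assumption about your setup), the embrace operator {{ }} is one way to pass the column through directly. The following is only a sketch: odds_by_var2 is a name I made up, and it returns the result instead of assigning it to the global environment.

```r
library(dplyr)

target <- c('good', 'good', 'bad', 'good', 'good', 'bad')
var_1 <- c('debit_order', 'other', 'other', 'debit_order', 'debit_order', 'debit_order')
dset <- data.frame(target, var_1)

# Hypothetical variant of odds_by_var using {{ }} (needs dplyr >= 1.0)
odds_by_var2 <- function(dataframe, variable, target_var, target_val) {
  dataframe %>%
    # one row per (variable, target) pair, with the row count in cnt
    count({{ variable }}, {{ target_var }}, name = "cnt") %>%
    group_by({{ variable }}) %>%
    mutate(total = sum(cnt),
           rate = cnt / total) %>%
    ungroup() %>%
    # {{ target_var }} refers to the column; target_val is the ordinary argument value
    filter({{ target_var }} == target_val)
}

odds_var_1 <- odds_by_var2(dset, var_1, target, "bad")
print(odds_var_1)
```

Returning the data frame and assigning it at the call site keeps the function free of side effects. If you still want the odds_<variable> naming, you can wrap the call in assign() yourself.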

