Function using sym() and deparse(substitute()) not working as expected

I'm trying to build a function that takes two kinds of input, either numeric or character, converts them or leaves them as they are depending on their class, and then filters a dataframe by those arguments.

library(tidyverse)

fun1 = function(df,filt_col,filt_term_1,filt_term_2){
  
# changing filt_col to a symbol, which is needed to correctly parse things
  filt_col = sym(filt_col)  
  
# if statement that checks whether the filtering term is numeric or not
# if it is numeric it leaves as is, whilst if not it deparse(substitutes) (i.e. makes into quoted text)
  if (!is.numeric(filt_term_1)) {filt_term_1 = deparse(substitute(filt_term_1))}
  if (!is.numeric(filt_term_2)) {filt_term_2 = deparse(substitute(filt_term_2))}
  
# doing one of two things depending on filtering terms that have been provided as arguments  
# if numeric, then filter < and > than numbers provided
# if character, then filter == to argument provided
  if(is.numeric(filt_term_1) & is.numeric(filt_term_2)) {
  
    group1 = df %>% filter(!!filt_col < filt_term_1)
    
    group2 = df %>% filter(!!filt_col > filt_term_2)
    
    
  } else {
    
    group1 = df %>% filter(!!filt_col == filt_term_1)
    
    group2 = df %>% filter(!!filt_col == filt_term_2)
    
  }

# put two groups in a list
  grouped_list = list(group1,group2)
  
  return(grouped_list)
  
}



# trying function which runs well with numeric args
fun1(iris,"Sepal.Length",4.9,4.9)

# but does not run with character args
fun1(iris,"Species",versicolor,virginica)

Firstly, I'm not sure what the error is about. Secondly, how can I make this more efficient? Ideally I would want to enter all arguments as non-quoted text.

Thank you.

Normally character values are not passed using NSE, only column names.

Pass versicolor and virginica as "versicolor" and "virginica" and use S3 to handle the difference between numeric and character/factor columns. Note how much simpler it is now. (If for some reason you don't like S3 you could use an if statement, but S3 gives more modular code.)

fun2 <- function(df, filt_col, filt_term_1, filt_term_2, ...) {
  UseMethod("fun2", df[[filt_col]])
}

fun2.default <- function(df, filt_col, filt_term_1, filt_term_2, ...) {
  group1 <- df %>% filter(.data[[filt_col]] < filt_term_1)
  group2 <- df %>% filter(.data[[filt_col]] > filt_term_2)
  list(group1, group2)
}

fun2.factor <- 
fun2.character <- function(df, filt_col, filt_term_1, filt_term_2, ...) {
  group1 <- df %>% filter(.data[[filt_col]] == filt_term_1)
  group2 <- df %>% filter(.data[[filt_col]] == filt_term_2)
  list(group1, group2)
}

fun2(iris,"Sepal.Length", 4.9, 4.9)

fun2(iris, "Species", "versicolor", "virginica")
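
For comparison, the if-statement alternative mentioned above might look roughly like the following. This is only a sketch and not part of the original answer: the name fun3 is hypothetical, and it assumes dplyr is installed and that filt_col is passed as a string, exactly as in fun2.

```r
library(dplyr)

# fun3 is a hypothetical name used for illustration; it mirrors fun2
# but branches on the column type with if/else instead of S3 dispatch.
fun3 <- function(df, filt_col, filt_term_1, filt_term_2) {
  if (is.numeric(df[[filt_col]])) {
    # numeric column: keep rows below the first term and above the second
    group1 <- df %>% filter(.data[[filt_col]] < filt_term_1)
    group2 <- df %>% filter(.data[[filt_col]] > filt_term_2)
  } else {
    # character or factor column: keep rows equal to each term
    group1 <- df %>% filter(.data[[filt_col]] == filt_term_1)
    group2 <- df %>% filter(.data[[filt_col]] == filt_term_2)
  }
  list(group1, group2)
}

res_num <- fun3(iris, "Sepal.Length", 4.9, 4.9)
res_chr <- fun3(iris, "Species", "versicolor", "virginica")
```

The behaviour should match fun2; the trade-off is that supporting a new column type means editing the body of fun3 rather than adding a separate method.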

Update

As pointed out in the comments, I had missed that you want an equality comparison for character and factor columns and an inequality comparison for numeric ones. This is now fixed.

The problem lies in the following three conditions when unquoted expressions are passed as filt_term_1 and filt_term_2:

  • if (!is.numeric(filt_term_1))
  • if (!is.numeric(filt_term_2))
  • if(is.numeric(filt_term_1) & is.numeric(filt_term_2))

If filt_term_* is a numeric or character value, these expressions can be evaluated because the value is an ordinary atomic vector. When a bare name such as the unquoted versicolor is passed, evaluation fails: no object with that name exists, so it cannot be evaluated outside a data context.
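
The failure can be reproduced without any data frame. The sketch below uses only base R; check_numeric and capture_name are hypothetical helpers written for illustration, and it assumes no object called versicolor exists in the calling environment.

```r
# Hypothetical helper: is.numeric() forces evaluation of its argument
check_numeric <- function(x) is.numeric(x)

# Ordinary values evaluate without trouble
num_ok <- check_numeric(4.9)
chr_ok <- check_numeric("versicolor")

# A bare name forces R to look up an object that does not exist,
# so the call errors; tryCatch() captures the message instead of stopping
err_msg <- tryCatch(
  check_numeric(versicolor),
  error = function(e) conditionMessage(e)
)

# substitute() captures the expression without evaluating it,
# which is why it has to run before anything touches the argument
capture_name <- function(x) deparse(substitute(x))
captured <- capture_name(versicolor)
```

This is why the is.numeric() checks in the original function fail before deparse(substitute()) gets a chance to run.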

A possible fix of your code:

We could think of various workarounds, but to avoid an XY problem, in your case I'd propose letting the type of the variable in the dataset, not the type of the input, determine how the inputs should be treated.

library(tidyverse)

fun1 = function(df, filt_col, filt_term_1, filt_term_2){
  
  # changing filt_col to a symbol, which is needed to correctly parse things
  filt_col = sym(filt_col)  
  
  # if statement that checks whether the filtering term is numeric or not
  # if it is numeric it leaves as is, whilst if not it deparse(substitutes) (i.e. makes into quoted text)
  if (!is.numeric(pull(df, {{filt_col}}))) {filt_term_1 = deparse(substitute(filt_term_1))}
  if (!is.numeric(pull(df, {{filt_col}}))) {filt_term_2 = deparse(substitute(filt_term_2))}
  
  # doing one of two things depending on filtering terms that have been provided as arguments  
  # if numeric, then filter < and > than numbers provided
  # if character, then filter == to argument provided
  if(is.numeric(pull(df, {{filt_col}}))) {
    
    group1 = df %>% filter(!!filt_col < filt_term_1)
    
    group2 = df %>% filter(!!filt_col > filt_term_2)
    
    
  } else {
    
    group1 = df %>% filter(!!filt_col == filt_term_1)
    
    group2 = df %>% filter(!!filt_col == filt_term_2)
    
  }
  
  # put two groups in a list
  grouped_list = list(group1,group2)
  
  return(grouped_list)
  
}

A simpler solution in your spirit:

You might want to explore the {{ }} syntax that I used above to simplify your code even more. The chunk below accepts inputs like fun1(iris, "Species", versicolor, virginica) and fun1(iris, Species, versicolor, virginica). However, you should think carefully about which inputs to accept and why.

library(tidyverse)

fun1 = function(df, filt_col, filt_term_1, filt_term_2){
  
  if(is.numeric(pull(df, {{filt_col}}))) {
    
    group1 = df %>% filter({{filt_col}} < filt_term_1)
    group2 = df %>% filter({{filt_col}} > filt_term_2)
    
  } else {
    
    filt_term_1 <- deparse(substitute(filt_term_1))
    filt_term_2 <- deparse(substitute(filt_term_2))
    
    # We need the if_any (or similar hack) to accept both quoted and unquoted column names.
    group1 = df %>% filter(if_any({{filt_col}}, ~ . == filt_term_1))
    group2 = df %>% filter(if_any({{filt_col}}, ~ . == filt_term_2))
    
  }
  
  # put two groups in a list
  grouped_list = list(group1,group2)
  
  return(grouped_list)
  
}

A tidyverse-spirit solution:

However, as pointed out by @Limey, it would probably be more in line with the spirit of tidyverse to take input columns as objects and values as character/numeric constants: (*)

fun1(iris, Species, "versicolor", "virginica")

fun1 <- function(df, filt_col, filt_term_1, filt_term_2) {
  
  if (is.numeric(pull(df, {{filt_col}}))) {
    
    group1 <- filter(df, {{filt_col}} < filt_term_1)
    group2 <- filter(df, {{filt_col}} > filt_term_2)
    
  } else {
    
    group1 <- filter(df, {{filt_col}} == filt_term_1)
    group2 <- filter(df, {{filt_col}} == filt_term_2)
    
  }
  
  list(group1, group2)
  
}

(*) As G. Grothendieck also pointed out, character values are normally not passed using NSE, only column names.

