Passing argument into function for group_by in dplyr

I am trying to use group_by within a function call in dplyr (R) and I am getting unexpected results. Here is an example of what I am trying to do:

df = data.frame(a = c(0,0,1,1), b = c(0,1,0,1), c = c(1,2,3,4))

result1 = df %>%
  group_by(a,b) %>%
  mutate(d = sum(c))
result1$d

myFunc <- function(df, var) {
  output = df %>%
    group_by(a,!!var) %>%
    mutate(d = sum(c))
  return(output)
}

result2 = myFunc(df,"b")
result2$d

result1$d yields [1,2,3,4], which is what I expected. result2$d yields [3,3,7,7], which I do not want, and I am not sure what is going on.

It works if I pass b (without quotes) as the function argument and use {{var}} in place of !!var. Unfortunately, in my case, my column names are in string format (but maybe there is a way to transform the string beforehand so that it will work with the {{}} notation?)

If you want to pass a character object that refers to a column of a data frame, you should use !!sym(var):

myFunc <- function(df, var) {
  output = df %>%
    group_by(a, !!sym(var)) %>%
    mutate(d = sum(c))
  return(output)
}

myFunc(df, "b")
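For context on why the original !!var version returned 3 3 7 7: unquoting a string inserts the literal "b" into the call, so group_by() groups on a constant rather than on the column b. A minimal sketch of the difference, assuming dplyr is attached and reusing the question's df (the names bad and good are only illustrative):

```r
library(dplyr)

df <- data.frame(a = c(0, 0, 1, 1), b = c(0, 1, 0, 1), c = c(1, 2, 3, 4))
var <- "b"

# Unquoting the string directly: group_by() sees the constant "b",
# so the rows are effectively grouped by `a` alone.
bad <- df %>%
  group_by(a, !!var) %>%
  mutate(d = sum(c))
group_vars(bad)   # the real column `b` is not among the grouping variables
bad$d

# Converting the string to a symbol first makes it refer to the column.
good <- df %>%
  group_by(a, !!sym(var)) %>%
  mutate(d = sum(c))
group_vars(good)
good$d
```

Checking group_vars() on the result is a quick way to confirm which columns a function actually grouped by.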

If you want to pass a data-masked argument, you should use {{ var }} or equivalently !!enquo(var):

myFunc <- function(df, var) {
  output = df %>%
    group_by(a, {{ var }}) %>%
    mutate(d = sum(c))
  return(output)
}

myFunc(df, b)

Note that I pass "b" and b, respectively, into the function in the two cases.
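On the question of whether a string can be transformed beforehand to work with the {{ }} version: one option, sketched here under the assumption that dplyr is attached, is to convert the string to a symbol and inject it at the call site with !!sym(). The variable col and the result names are hypothetical, introduced only for this illustration.

```r
library(dplyr)

df <- data.frame(a = c(0, 0, 1, 1), b = c(0, 1, 0, 1), c = c(1, 2, 3, 4))

# The data-masked version from above, using {{ var }}.
myFunc <- function(df, var) {
  df %>%
    group_by(a, {{ var }}) %>%
    mutate(d = sum(c))
}

col <- "b"  # column name held as a string

# Inject the symbol built from the string when calling the function.
from_string <- myFunc(df, !!sym(col))

# The usual bare-name call, for comparison.
from_name <- myFunc(df, b)

identical(from_string$d, from_name$d)
```

This keeps a single function definition while still allowing callers that only have the column name as a string.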

If we want to use quoting and unquoting instead of curly-curly {{}}, then we should consider this basic procedure: https://tidyeval.tidyverse.org/dplyr.html

Creating a function around dplyr pipelines involves three steps: abstraction, quoting, and unquoting.

1. Abstraction step:

  • Here we identify the parts that vary. In our case, var in group_by.

2. Quoting step:

  • Identify all the arguments where the user is allowed to refer to data frame columns directly.
  • The function can't evaluate these arguments right away.
  • Instead they should be automatically quoted. Apply enquo() to these arguments.

3. Unquoting step:

  • Identify where these variables are passed to other quoting functions and unquote with !!.
  • In this case we pass var to group_by():
myFunc <- function(df, var) {
  var <- enquo(var)
  output = df %>%
    group_by(a,!!var) %>%
    mutate(d = sum(c))
  return(output)
}

result2 = myFunc(df,b)

Output of result2$d:

[1] 1 2 3 4

Just as I posted the question, I came across something that works...

myFunc <- function(df, var) {
  output = df %>%
    group_by_at(.vars = c("a",var)) %>%
    mutate(d = sum(c))
  return(output)
}

result2 = myFunc(df,"b")
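Since group_by_at() has been superseded as of dplyr 1.0.0, the same string-based grouping can also be written with across() and all_of(). This is a sketch of that alternative, not part of the original answer; the function name myFuncAcross is hypothetical.

```r
library(dplyr)

df <- data.frame(a = c(0, 0, 1, 1), b = c(0, 1, 0, 1), c = c(1, 2, 3, 4))

# Group by "a" plus whatever column name is supplied as a string.
myFuncAcross <- function(df, var) {
  df %>%
    group_by(across(all_of(c("a", var)))) %>%
    mutate(d = sum(c))
}

result3 <- myFuncAcross(df, "b")
result3$d
```

all_of() raises an error if a requested column is missing, which helps catch misspelled column names early.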
