
How do I pass a column name to a function involving mutate?

I am trying to write a function that takes a string and uses it as a column name in dplyr::mutate() on both sides of the equals sign. Here is an example of what I'd like to automate:

cars %>% 
  mutate(
    new.speed = speed + 5,
    improved.speed = case_when(new.speed < 12 ~ 0,
                               new.speed == 12 ~ 1,
                               new.speed > 12 ~ 1/new.speed)
  )

To automate this process for any dataset, I need to 1) attach the prefix "new" to whichever column name I specify, and 2) create an additional column, prefixed with "improved", that depends on the values of the first new column.

The function should look something like this, where each ** ** placeholder is replaced with the proper syntax:

insert_names <- function(df, oldname, prefix_1, prefix_2){
  df %>% mutate(
    **prefix_1.oldname** = oldname + 5,
    **prefix_2.oldname** = case_when(**prefix_1.oldname** < 12 ~ 0,
                                     **prefix_1.oldname** == 12 ~ 1,
                                     **prefix_1.oldname** > 12 ~ 1/**prefix_1.oldname**),
    
  )
}

The correct function should reproduce the original output when called like this:

insert_names(cars, oldname = "speed", prefix_1 = "new", prefix_2 = "improved")

though I could leave speed unquoted if that's easier.

We can use:
library(dplyr)
library(data.table)

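insert_names <- function(df, oldname, prefix_1, prefix_2){
    # Build the final column names, e.g. "new.speed" and "improved.speed"
    pre1_old <- paste0(prefix_1 , "." , oldname)
    pre2_old <- paste0(prefix_2 , "." , oldname)
    # Compute under temporary names x and y (assumes df has no such columns)
    d <- df %>% mutate(
        x = !!sym(oldname) + 5,
        y = case_when(x < 12 ~ 0,
                      x == 12 ~ 1,
                      x > 12 ~ 1/x)
    )
    # data.table::setnames() renames in place (by reference)
    setnames(d, c("x" , "y") , c(pre1_old , pre2_old))
    d
}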

insert_names(cars, oldname = "speed", prefix_1 = "new", prefix_2 = "improved")
Output:
  speed dist new.speed improved.speed
1      4    2         9     0.00000000
2      4   10         9     0.00000000
3      7    4        12     1.00000000
4      7   22        12     1.00000000
5      8   16        13     0.07692308
6      9   10        14     0.07142857
7     10   18        15     0.06666667
8     10   26        15     0.06666667
9     10   34        15     0.06666667
10    11   17        16     0.06250000
11    11   28        16     0.06250000
12    12   14        17     0.05882353
13    12   20        17     0.05882353
14    12   24        17     0.05882353
15    12   28        17     0.05882353
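
If pulling in data.table only for setnames() feels heavy, the renaming step could probably be done with dplyr::rename() instead. The sketch below is a hypothetical variant, not part of the answer above: the function name insert_names_rename and the lookup vector are my own. Like the original, it assumes df has no existing columns called x or y.

```r
library(dplyr)

# Hypothetical variant: same logic, but renaming with dplyr instead of data.table.
insert_names_rename <- function(df, oldname, prefix_1, prefix_2) {
  pre1_old <- paste0(prefix_1, ".", oldname)
  pre2_old <- paste0(prefix_2, ".", oldname)

  # Named vector of the form c(new_name = "old_name"), as rename() expects.
  lookup <- setNames(c("x", "y"), c(pre1_old, pre2_old))

  df %>%
    mutate(
      x = .data[[oldname]] + 5,        # look up the source column by its string name
      y = case_when(x < 12 ~ 0,
                    x == 12 ~ 1,
                    x > 12 ~ 1 / x)
    ) %>%
    rename(all_of(lookup))             # x -> pre1_old, y -> pre2_old
}

res_rename <- insert_names_rename(cars, "speed", "new", "improved")
head(res_rename, 3)
```

This keeps everything in one pipeline and returns a new data frame rather than renaming in place.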

A nice case for using rlang:

library(dplyr)
library(rlang)

insert_names <- function(df, oldname, prefix_1, prefix_2){
  
  new_name_1 <- paste(prefix_1, oldname, sep = ".")
  new_name_2 <- paste(prefix_2, oldname, sep = ".")
  
  df %>% mutate(
    !!new_name_1 := !!sym(oldname) + 5,
    !!new_name_2 := case_when(!!sym(new_name_1) < 12 ~ 0,
                                     !!sym(new_name_1) == 12 ~ 1,
                                     !!sym(new_name_1) > 12 ~ 1/!!sym(new_name_1)),
  )
}

insert_names(cars, "speed", "new", "newer")
#>    speed dist new.speed newer.speed
#> 1      4    2         9  0.00000000
#> 2      4   10         9  0.00000000
#> 3      7    4        12  1.00000000
#> 4      7   22        12  1.00000000
#> 5      8   16        13  0.07692308
#> 6      9   10        14  0.07142857
#> 7     10   18        15  0.06666667
#> 8     10   26        15  0.06666667
#> 9     10   34        15  0.06666667
#> 10    11   17        16  0.06250000
...

Edit

I do see that the other answer, posted at about the same time, uses the same method. The minor difference is where the new columns are named: either when they are created or just before the data frame is returned.
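
As a variation on the rlang approach above, the repeated !!sym() calls could likely be replaced by the .data pronoun, which looks up a column from a string, and the new names could be written as glue-style strings on the left of ":=". This is only a sketch under assumptions: it needs a reasonably recent dplyr (1.0 or later), and the name insert_names_data is hypothetical.

```r
library(dplyr)

# Hypothetical variant using .data[[ ]] for lookups and "{name}" := for naming.
insert_names_data <- function(df, oldname, prefix_1, prefix_2) {
  new_name_1 <- paste(prefix_1, oldname, sep = ".")
  new_name_2 <- paste(prefix_2, oldname, sep = ".")

  df %>%
    mutate(
      # "{new_name_1}" interpolates the string held in new_name_1 as the column name
      "{new_name_1}" := .data[[oldname]] + 5,
      # the column created on the previous line is already visible via .data
      "{new_name_2}" := case_when(
        .data[[new_name_1]] < 12 ~ 0,
        .data[[new_name_1]] == 12 ~ 1,
        .data[[new_name_1]] > 12 ~ 1 / .data[[new_name_1]]
      )
    )
}

res_data <- insert_names_data(cars, "speed", "new", "newer")
head(res_data, 3)
```

One possible advantage is that .data[[name]] only ever refers to a column of the data frame, so a variable of the same name in the calling environment cannot be picked up by accident.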

For this, you need to use the forcing and defusing operators. The double curly braces defuse and then force what they wrap, which lets you (1) use a string as a column name and (2) force the function argument. When a name on the left-hand side is injected this way, you must use ":=" instead of "=" as the assignment operator. I also use get() to fetch the column values from the string holding the column name. I'm not sure this is the most efficient way, and someone may well have better code, but it works.

(Note: !!, or "bang-bang", is a forcing operator; enquo() defuses; {{ }} does both and is equivalent to !!enquo(). I'm not sure {{ }} is needed in every place I put it in this code.)

Here is working code:

library(dplyr)

insert_names <- function(df, oldname, prefix_1, prefix_2){
  # New column names, e.g. "new_speed" and "improved_speed"
  col_name1 <- paste0(prefix_1, "_", oldname)
  col_name2 <- paste0(prefix_2, "_", oldname)
  df %>% mutate(
    {{col_name1}} := get(!!oldname) + 5,
    # TRUE ~ ... is the catch-all branch, here covering values above 12
    {{col_name2}} := case_when(get(!!col_name1) < 12 ~ 0,
                               get(!!col_name1) == 12 ~ 1,
                               TRUE ~ 1/get(!!col_name1))
  )
}

insert_names(cars, oldname = "speed", prefix_1 = "new", prefix_2 = "improved")
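
For comparison, {{ }} is mainly meant for column arguments passed unquoted, which the question mentions as an acceptable alternative. The sketch below assumes the caller writes speed rather than "speed"; the function name insert_names_bare is hypothetical, and the "_" separator simply follows the code above. Here rlang's ensym() captures the argument as a symbol and as_name() turns it into a string for building the new names.

```r
library(dplyr)
library(rlang)

# Hypothetical variant that takes the column unquoted, e.g. insert_names_bare(cars, speed, ...).
insert_names_bare <- function(df, oldname, prefix_1, prefix_2) {
  old   <- as_name(ensym(oldname))       # "speed": the argument's name as a string
  new_1 <- paste0(prefix_1, "_", old)
  new_2 <- paste0(prefix_2, "_", old)

  df %>%
    mutate(
      "{new_1}" := {{ oldname }} + 5,    # {{ }} injects the unquoted column
      "{new_2}" := case_when(
        .data[[new_1]] < 12 ~ 0,
        .data[[new_1]] == 12 ~ 1,
        TRUE ~ 1 / .data[[new_1]]        # catch-all branch, as in the code above
      )
    )
}

res_bare <- insert_names_bare(cars, speed, "new", "improved")
head(res_bare, 3)
```

Note that this version would fail if the column were passed as a string, because {{ oldname }} would then inject the string itself rather than the column.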
