Refer to quoted column name in a function in R

I want to use the na_omit function from the collapse package inside a user-defined function. na_omit takes the column name as a quoted string in its cols argument. If I didn't need the column name in quotes, I could refer to it with double braces, {{col}}, as described in the vignette "Programming with dplyr". If I build the name with the glue package instead, as in glue::glue("{col}"), I receive errors.

Here is a reprex:

my_df <-
  data.frame(
    matrix(
      c(
        "V9G","Blue",
        NA,"Red",
        "J4C","White",
        NA,"Brown",
        "F7B","Orange",
        "G3V","Green"
      ),
      nrow = 6,
      ncol = 2,
      byrow = TRUE,
      dimnames = list(NULL,
                      c("color_code", "color"))
    ),
    stringsAsFactors = FALSE
  )

library(collapse)
library(dplyr)
library(glue)

my_func <- function(df, col){
  df %>% 
    collapse::na_omit(cols = c(glue("{col}"))) #Here is the code that fails
}

my_func(my_df, color_code)

The expected output can be generated with the following:

my_df %>% 
  collapse::na_omit(cols = c("color_code")) 

and should produce:

#  color_code  color
#1        V9G   Blue
#2        J4C  White
#3        F7B Orange
#4        G3V  Green

How should I refer to a column name inside a user-defined function in R when the column is passed as a parameter and the function I call expects the name as a quoted string?

In general, collapse mostly uses standard evaluation, and its NSE features are built on base R, so most of the rlang and glue machinery won't work, but you will have simpler and faster code. As suggested, for a single column a solution is function(df, col) { col <- as.character(substitute(col)); ...; }, i.e. use substitute() to capture the expression and as.character() or all.vars() to extract the variable names. For multiple columns, a general solution is to wrap fselect, e.g.

library(collapse)
my_func <- function(df, ...) {
  cols <- fselect(df, ..., return = "indices")
  na_omit(df, cols = cols) 
}

my_func(wlddev, PCGDP:GINI, POP) |> head()
#>   country iso3c       date year decade                region
#> 1 Albania   ALB 1997-01-01 1996   1990 Europe & Central Asia
#> 2 Albania   ALB 2003-01-01 2002   2000 Europe & Central Asia
#> 3 Albania   ALB 2006-01-01 2005   2000 Europe & Central Asia
#> 4 Albania   ALB 2009-01-01 2008   2000 Europe & Central Asia
#> 5 Albania   ALB 2013-01-01 2012   2010 Europe & Central Asia
#> 6 Albania   ALB 2015-01-01 2014   2010 Europe & Central Asia
#>                income  OECD    PCGDP LIFEEX GINI       ODA     POP
#> 1 Upper middle income FALSE 1869.866 72.495 27.0 294089996 3168033
#> 2 Upper middle income FALSE 2572.721 74.579 31.7 453309998 3051010
#> 3 Upper middle income FALSE 3062.674 75.228 30.6 354950012 3011487
#> 4 Upper middle income FALSE 3775.581 75.912 30.0 338510010 2947314
#> 5 Upper middle income FALSE 4276.608 77.252 29.0 335769989 2900401
#> 6 Upper middle income FALSE 4413.297 77.813 34.6 260779999 2889104

Created on 2022-02-03 by the reprex package (v2.0.1)
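
The single-column version is only outlined above (the function body is elided with "..."), so here is a minimal sketch of one way it might be filled in. The names col_name, col_names, and my_func_one are hypothetical, and the small data frame is made up for illustration.

```r
library(collapse)

# Hypothetical helpers: turn unquoted arguments into character names.
col_name  <- function(col) as.character(substitute(col))
col_names <- function(...) all.vars(substitute(list(...)))

# Hypothetical wrapper: capture the bare name, then pass the string to na_omit().
my_func_one <- function(df, col) {
  col <- as.character(substitute(col))
  na_omit(df, cols = col)
}

# Illustrative data only.
df <- data.frame(
  color_code = c("V9G", NA, "J4C"),
  color      = c("Blue", "Red", "White"),
  stringsAsFactors = FALSE
)

col_name(color_code)          # a single name as a string
col_names(color_code, color)  # several names as a character vector
my_func_one(df, color_code)   # drops the row where color_code is NA
```

Note that as.character(substitute(col)) only gives a single string when col is a bare name; for a compound expression such as df$color_code it returns several pieces, which is where all.vars() can be the safer choice.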

You have to provide the column name as a character string, like:

my_func <- function(df, col){
  df %>% 
    collapse::na_omit(cols = c(glue("{col}"))) # col is a string here, so glue() yields the column name
}

my_func(my_df, col = "color_code")
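
Once the column name arrives as a string, glue is arguably not needed at all. The following is a minimal sketch under that assumption; the function name my_func_chr, the input check, and the small data frame are illustrative additions, not part of the answer above.

```r
library(collapse)

# Hypothetical variant: col is expected to be a character vector of column names.
my_func_chr <- function(df, col) {
  # Fail early if the caller passed something other than existing column names.
  stopifnot(is.character(col), all(col %in% names(df)))
  na_omit(df, cols = col)
}

# Illustrative data only.
df <- data.frame(
  color_code = c("V9G", NA, "J4C", NA),
  color      = c("Blue", "Red", "White", "Brown"),
  stringsAsFactors = FALSE
)

my_func_chr(df, "color_code")
```

Passing the string straight through keeps the function in standard evaluation, which matches how collapse expects column names to be supplied.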

It's important to first determine which environment you're programming in: dplyr or base R. If dplyr, refer to the documentation for programming with dplyr, rlang, and glue, and this Stack Overflow answer. If base R, refer to the documentation on non-standard evaluation. This question arises from confusing the two. Here are some of the different approaches in a reprex.

Data

my_df <-
  data.frame(
    matrix(
      c(
        "V9G","Blue",
        NA,"Red",
        "J4C","White",
        NA,"Brown",
        "F7B","Orange",
        "G3V","Green"
      ),
      nrow = 6,
      ncol = 2,
      byrow = TRUE,
      dimnames = list(NULL,
                      c("color_code", "color"))
    ),
    stringsAsFactors = FALSE
  )

Packages

library(collapse)
library(dplyr)
library(stringr)
library(glue)

Functional programming in base R
with a quoted column name:

my_func <- function(df, col) {
  col_char_ref <- deparse(substitute(col)) #Use deparse(substitute()) to refer to a quoted column name
  df %>% 
    collapse::na_omit(cols = col_char_ref) 
}

my_func(my_df, color_code)

#Should generate output below
my_df %>% 
  collapse::na_omit(cols = "color_code")
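
One caveat with deparse(substitute(col)) is that a caller who passes "color_code" as a string gets the quotes deparsed into the name as well. The sketch below shows one possible way to accept both forms; my_func_either is a hypothetical name and the data frame is made up for illustration.

```r
library(collapse)

# Hypothetical variant that accepts either a bare name or a string literal.
my_func_either <- function(df, col) {
  expr <- substitute(col)
  # A string literal is captured as a character constant; a bare name as a symbol.
  col_chr <- if (is.character(expr)) expr else deparse(expr)
  na_omit(df, cols = col_chr)
}

# Illustrative data only.
df <- data.frame(
  color_code = c("V9G", NA, "J4C"),
  color      = c("Blue", "Red", "White"),
  stringsAsFactors = FALSE
)

my_func_either(df, color_code)
my_func_either(df, "color_code")
```

This still does not cover a variable that holds the name (for example x <- "color_code" followed by my_func_either(df, x)), because substitute() would capture the symbol x rather than its value.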

and with a non-quoted column name:

my_func <- function(df, col){
  env <- list2env(df, parent = parent.frame())
  col <- substitute(col)
  df %>%  
    collapse::ftransform(count = stringr::str_length(eval(col, env)))
}

my_func(my_df, color)

#Should generate output below
my_df %>%  
  collapse::ftransform(count = stringr::str_length(color))
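
For comparison, the same capture-then-evaluate idea can be sketched with base R alone, letting eval() look up the captured expression in the data frame directly instead of building an environment with list2env(). The function name add_count and the use of nchar() in place of stringr::str_length() are assumptions made for this illustration.

```r
# Hypothetical base R variant: evaluate the captured expression inside df.
add_count <- function(df, col) {
  col <- substitute(col)
  # eval() searches the columns of df first, then the caller's frame.
  df$count <- nchar(eval(col, df, parent.frame()))
  df
}

# Illustrative data only.
df <- data.frame(
  color = c("Blue", "Red", "White"),
  stringsAsFactors = FALSE
)

add_count(df, color)
```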

Functional programming in dplyr
with a quoted column name using glue and dplyr functions:

my_func <- function(df, col1, col2) {
  df %>%
    mutate(description := glue("color code: {pull(., {{col1}})}; color: {pull(., {{col2}})}"))
}

my_func(my_df, color_code, color)

#Should generate output below
my_df %>%
  mutate(description = glue("color code: {color_code}; color: {color}"))

or with a quoted column name using sprintf(), a wrapper for the C function of the same name:

my_func <- function(df, col1, col2) {
  df %>%
    mutate(description := sprintf("color code: %s; color: %s", {{col1}}, {{col2}}))
}

my_func(my_df, color_code, color)

#Should generate output below
my_df %>%
  mutate(description = glue("color code: {color_code}; color: {color}"))

and with a non-quoted column name:

my_func <- function(df, col){
  df %>%  
    dplyr::mutate(count = stringr::str_length({{ col }}))
}

my_func(my_df, color)

#Should generate output below
my_df %>% 
  dplyr::mutate(count = stringr::str_length(color))
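
If the column name is held as a string rather than passed as a bare name, dplyr's .data pronoun is one documented way to refer to it inside a data-masking verb. The sketch below applies it to the NA-filtering task from the question; my_func_str is a hypothetical name and the data frame is made up for illustration.

```r
library(dplyr)

# Hypothetical dplyr variant: col is a string naming the column to check.
my_func_str <- function(df, col) {
  df %>%
    filter(!is.na(.data[[col]]))
}

# Illustrative data only.
df <- data.frame(
  color_code = c("V9G", NA, "J4C", NA),
  color      = c("Blue", "Red", "White", "Brown"),
  stringsAsFactors = FALSE
)

my_func_str(df, "color_code")
```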
