Pass column names as function arguments in formula

I want to write a reusable function that runs a t-test repeatedly, with the column names passed in as arguments and used in the formula. However, I cannot get it to work. The following code shows the idea:

library(dplyr)
library(rstatix)
do.function <- function(table, column, category) {
  column = sym(column)
  category = sym(category)
  
  stat.test <- table %>%
    group_by(subset) %>%
    t_test(column ~ category)
  
  return(stat.test)
}
tmp = data.frame(id=seq(1:100), value = rnorm(100), subset = rep(c("Set1", "Set2"),each=50,2),categorical_value= rep(c("A", "B"),each=25,4))
do.function(table= tmp, column = "value", category = "categorical_value")

The current error that I get is the following:

Error: Can't extract columns that don't exist.
x Column `category` doesn't exist.
Run `rlang::last_error()` to see where the error occurred. 

Does anyone know how to solve this?

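A formula quotes its contents, so column ~ category is taken literally and t_test looks for columns named column and category. Build the formula from the strings instead of wrapping them in sym():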

library(dplyr)
library(rstatix)
do.function <- function(table, column, category) {
  formula <- paste0(column, '~', category) %>% 
    as.formula()
  
  table %>%
    group_by(subset) %>%
    t_test(formula)
}
tmp = data.frame(id=seq(1:100), value = rnorm(100), subset = rep(c("Set1", "Set2"),each=50,2),categorical_value= rep(c("A", "B"),each=25,4))
do.function(table= tmp, column = "value", category = "categorical_value")
# A tibble: 2 x 9
  subset .y.   group1 group2    n1    n2 statistic    df     p
* <chr>  <chr> <chr>  <chr>  <int> <int>     <dbl> <dbl> <dbl>
1 Set1   value A      B         50    50     0.484  94.3 0.63 
2 Set2   value A      B         50    50    -2.15   97.1 0.034
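
To see why the literal formula fails, it can help to inspect which variable names each formula actually contains. The following is a minimal sketch in base R only (no rstatix needed); the object names literal and built are made up for illustration and are not part of the answer above.

```r
# The strings mirror the arguments passed in the question.
column <- "value"
category <- "categorical_value"

# A literal formula quotes its contents: the argument names are kept as-is,
# no matter what the variables hold.
literal <- column ~ category
all.vars(literal)   # "column" "category"

# Pasting the strings together first puts the real column names in the formula.
built <- as.formula(paste0(column, "~", category))
all.vars(built)     # "value" "categorical_value"
```

The first result matches the error message: t_test is asked for a column called category, which the table does not have.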

Since the column names are passed as strings, we can use reformulate() to build the formula:

do.function <- function(table, column, category) {
  stat.test <- table %>%
    group_by(subset) %>%
    t_test(reformulate(category, response = column ))
  
  return(stat.test)
}

Testing:

> do.function(table= tmp, column = "value", category = "categorical_value")
# A tibble: 2 × 9
  subset .y.   group1 group2    n1    n2 statistic    df      p
* <chr>  <chr> <chr>  <chr>  <int> <int>     <dbl> <dbl>  <dbl>
1 Set1   value A      B         50    50     1.66   97.5 0.0993
2 Set2   value A      B         50    50     0.448  92.0 0.655 
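
As a quick check, reformulate() and the paste0() plus as.formula() approach should produce the same formula for these inputs. This is a small base R sketch; the object names pasted and reformulated are illustrative only.

```r
# The strings mirror the arguments passed in the question.
column <- "value"
category <- "categorical_value"

# Approach 1: paste the strings together and convert to a formula.
pasted <- as.formula(paste0(column, "~", category))

# Approach 2: let reformulate() assemble the formula from its parts.
reformulated <- reformulate(category, response = column)

deparse(pasted)         # "value ~ categorical_value"
deparse(reformulated)   # "value ~ categorical_value"
```

Either version can be passed to t_test; the outputs shown in the two answers differ only because tmp is generated with rnorm() and no seed is set.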

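rstatix::t_test already takes a formula, so we need to get the variables by their names. Note that this version omits group_by(subset), so the test runs on the whole table, and the .y. column reports the placeholder name column rather than value: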

do.function <- function(table, column, category) {
  stat.test <- table  %>%
    mutate(column=get(column), 
           category=get(category)) %>%
    rstatix::t_test(column ~ category)
  return(stat.test)
}

do.function(table=tmp, column="value", category="categorical_value")
# # A tibble: 1 × 8
# .y.    group1 group2    n1    n2 statistic    df     p
# * <chr>  <chr>  <chr>  <int> <int>     <dbl> <dbl> <dbl>
# 1 column A      B        100   100     0.996  197.  0.32
