
Writing function around statistical tests in R

I'm writing a function for my (working) R script in order to clean up my code. I have no experience writing functions, but decided I should invest some time in it. The goal of the function is to run multiple statistical tests while passing the data frame, the quantitative variable and the grouping variable only once. However, I cannot get this to work. For reference, I'll use the ToothGrowth data frame to illustrate my problem.

Say I want to run a Kruskal-Wallis test and a one-way ANOVA on len to compare the groups defined by supp. I can do this separately with

kruskal.test(len ~ supp, data = ToothGrowth)
aov(len ~ supp, data = ToothGrowth)

Now I want to write a function that performs both tests. This is what I had thought should work:

stat_test <- function(mydata, quantvar, groupvar) {
  kruskal.test(quantvar ~ groupvar, data = mydata)
  aov(quantvar ~ groupvar, data = mydata)
}

But if I then run stat_test(ToothGrowth, "len", "supp"), I get the error

Error in kruskal.test.default("len", "supp") : 
  all observations are in the same group 

What am I doing wrong? Any help would be much appreciated!

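You need to convert your variable arguments, which are given as text strings, into a formula. A formula is not evaluated, so quantvar ~ groupvar refers to the names quantvar and groupvar themselves, not to the columns len and supp. You can build the formula you want by concatenating the strings with paste() and passing the result to formula(). You will also need to wrap print() around both statistical tests inside the function, otherwise only the last one is displayed.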

Here is the modified function:

stat_test <- function(mydata, quantvar, groupvar) {
  model_formula <- formula(paste(quantvar, '~', groupvar))
  print(kruskal.test(model_formula, data = mydata))
  print(aov(model_formula, data = mydata))
}
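As a possible variation on the function above, base R's reformulate() builds the same formula without paste(), and returning the results in a list avoids relying on print(). This is only a sketch: the name stat_test_list and the choice to wrap aov() in summary() are assumptions made for illustration, not part of the original answer.

```r
# Hypothetical variant: returns both results instead of printing them.
stat_test_list <- function(mydata, quantvar, groupvar) {
  # reformulate() assembles `response ~ term` from character strings.
  model_formula <- reformulate(groupvar, response = quantvar)
  list(
    kruskal = kruskal.test(model_formula, data = mydata),
    anova   = summary(aov(model_formula, data = mydata))
  )
}

results <- stat_test_list(ToothGrowth, "len", "supp")
results$kruskal$p.value  # p-value of the Kruskal-Wallis test
results$anova            # ANOVA table including the F test
```

Returning a list keeps both results available for later steps, such as collecting p-values across several variables.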

You can use deparse(substitute(quantvar)) to capture the name of the column you pass to the function as a string, which lets you build the formula with paste. The columns can then be passed as bare, unquoted names, which is a more idiomatic way of working in R.

Here's a reproducible example:

stat_test <- function(mydata, quantvar, groupvar) {
  A <- as.formula(paste(deparse(substitute(quantvar)), "~", 
                        deparse(substitute(groupvar))))
  print(kruskal.test(A, data = mydata))
  cat("\n--------------------------------------\n\n")
  aov(A, data = mydata)
}

stat_test(ToothGrowth, len, supp)
#> 
#>  Kruskal-Wallis rank sum test
#> 
#> data:  len by supp
#> Kruskal-Wallis chi-squared = 3.4454, df = 1, p-value = 0.06343
#> 
#> 
#> --------------------------------------
#> Call:
#>    aov(formula = A, data = mydata)
#> 
#> Terms:
#>                     supp Residuals
#> Sum of Squares   205.350  3246.859
#> Deg. of Freedom        1        58
#> 
#> Residual standard error: 7.482001
#> Estimated effects may be unbalanced

Created on 2020-03-30 by the reprex package (v0.3.0)
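To see what deparse(substitute()) contributes on its own, the formula-building step can be pulled out into a small helper. This is a minimal sketch; build_formula is a hypothetical name introduced here for illustration.

```r
# Hypothetical helper: turn two bare column names into a formula.
build_formula <- function(quantvar, groupvar) {
  lhs <- deparse(substitute(quantvar))  # the argument as typed, e.g. "len"
  rhs <- deparse(substitute(groupvar))  # e.g. "supp"
  as.formula(paste(lhs, "~", rhs))
}

f <- build_formula(len, supp)
f
kw <- kruskal.test(f, data = ToothGrowth)
kw$p.value
```

One caveat with this approach: substitute() captures whatever expression the caller typed, so the helper is best called directly with column names rather than from inside another function that forwards its own arguments.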

For reference, if you use rstatix (a tidy version of the statistical functions) and pass column names as strings, you need sym() and !! for functions that expect a bare column name, and formula() for functions that expect a formula.

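make_kruskal_test <- function(data, quantvar, groupvar) {
  library(rstatix, quietly = TRUE)
  library(rlang, quietly = TRUE)

  # Formula for rstatix functions that take one (not used by shapiro_test below)
  formula_expression <- formula(paste(quantvar, "~", groupvar))
  # Symbol for functions that take a bare column name
  quantvar_sym <- sym(quantvar)

  # !! unquotes the symbol so shapiro_test() sees the column itself
  shapiro <- shapiro_test(data, !!quantvar_sym) %>% print()
}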

sample_data <- tibble::tibble(sample = letters[1:5], mean = 1:5)

make_kruskal_test(sample_data, "mean", "sample")

#> # A tibble: 1 x 3
#>   variable statistic     p
#>   <chr>        <dbl> <dbl>
#> 1 mean         0.987 0.967
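The example above builds a formula but only runs the Shapiro-Wilk test. A sketch of how a formula built from strings could be passed to rstatix's kruskal_test() follows. It assumes the rstatix package is installed; run_kruskal is a hypothetical name, and ToothGrowth is used instead of sample_data so that each group has several observations.

```r
# Hypothetical wrapper; assumes the rstatix package is installed.
run_kruskal <- function(data, quantvar, groupvar) {
  # Build e.g. len ~ supp from the two column-name strings.
  model_formula <- stats::reformulate(groupvar, response = quantvar)
  rstatix::kruskal_test(data, model_formula)
}

# Only run the example when rstatix is available.
if (requireNamespace("rstatix", quietly = TRUE)) {
  kw_tidy <- run_kruskal(ToothGrowth, "len", "supp")
  print(kw_tidy)
}
```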
