I'm writing a function for my (working) R script in order to clean up my code. I have no experience writing functions, but decided I should invest some time in it. The goal of my function is to perform multiple statistical tests while passing the required data frame, quantitative variable and grouping variable only once. However, I cannot get this to work. For reference, I'll use the ToothGrowth data frame to illustrate my problem.
Say I want to run a Kruskal-Wallis test and a one-way ANOVA on len to compare the groups defined by supp, for whatever reason. I can do this separately with
kruskal.test(len ~ supp, data = ToothGrowth)
aov(len ~ supp, data = ToothGrowth)
Now I want to write a function that performs both tests. This is what I had thought should work:
stat_test <- function(mydata, quantvar, groupvar) {
  kruskal.test(quantvar ~ groupvar, data = mydata)
  aov(quantvar ~ groupvar, data = mydata)
}
But if I then run stat_test(ToothGrowth, "len", "supp"), I get the error
Error in kruskal.test.default("len", "supp") :
all observations are in the same group
What am I doing wrong? Any help would be much appreciated!
It looks like you need to convert your variable arguments, which are given as text strings, into a formula. You can do this by concatenating the strings with paste() and converting the result with formula(). Also, you will need to wrap print() around both statistical tests within the function; otherwise only the last one, which is the function's return value, is displayed.
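To see why the original version fails, it helps to inspect which variables each formula actually refers to. In the sketch below, literal_formula() and pasted_formula() are hypothetical helpers written only for this illustration. A formula typed as quantvar ~ groupvar inside a function names the arguments themselves, so when it is evaluated R presumably cannot find columns called quantvar or groupvar in ToothGrowth and falls back to the function's own arguments, the single strings "len" and "supp". That would most likely explain the "all observations are in the same group" error.

```r
# A formula typed literally inside a function names the argument variables,
# not the columns whose names those arguments hold.
literal_formula <- function(quantvar, groupvar) quantvar ~ groupvar
f1 <- literal_formula("len", "supp")
all.vars(f1)
#> [1] "quantvar" "groupvar"

# Pasting the strings together and converting the result gives the intended formula.
pasted_formula <- function(quantvar, groupvar) formula(paste(quantvar, "~", groupvar))
f2 <- pasted_formula("len", "supp")
all.vars(f2)
#> [1] "len"  "supp"
```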
Here is the modified function:
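stat_test <- function(mydata, quantvar, groupvar) {
  # Build e.g. len ~ supp from the two strings
  model_formula <- formula(paste(quantvar, '~', groupvar))
  # print() makes both results show up, not only the last one
  print(kruskal.test(model_formula, data = mydata))
  print(aov(model_formula, data = mydata))
}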
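If you would rather keep the results than only print them, one option is to return them in a list. The sketch below is a hypothetical variant (stat_test_list is not from the answer above) that swaps paste() plus formula() for stats::reformulate(), which builds the same len ~ supp formula from character strings, and wraps aov() in summary() so the ANOVA table with its F test and p-value is what gets stored.

```r
# Hypothetical variant: build the formula with reformulate() and return both
# results so the caller decides what to print or extract.
stat_test_list <- function(mydata, quantvar, groupvar) {
  model_formula <- reformulate(groupvar, response = quantvar)
  list(
    kruskal = kruskal.test(model_formula, data = mydata),
    anova   = summary(aov(model_formula, data = mydata))
  )
}

res <- stat_test_list(ToothGrowth, "len", "supp")
res$kruskal          # the Kruskal-Wallis test, printed as usual
res$anova            # the one-way ANOVA table
res$kruskal$p.value  # individual values can be pulled out directly
```

Returning a list means nothing is printed inside the function, so you no longer need print() to see both tests; auto-printing at the top level takes care of whichever element you ask for.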
You can use deparse(substitute(quantvar)) to get the name of the column you pass to the function as a string, which lets you build the formula with paste(). The function can then be called with bare (unquoted) column names, just as you would write len ~ supp directly, which is arguably the more idiomatic style in R.
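What deparse(substitute()) returns can be checked on its own with a tiny helper; arg_name() is a hypothetical name used only here. substitute() returns the unevaluated expression supplied for an argument, and deparse() turns that expression into a character string. A consequence worth knowing is that this approach expects bare names: if you pass a quoted string, the quotes end up inside the result.

```r
# substitute(x) captures the expression the caller supplied for x,
# without evaluating it; deparse() converts that expression to text.
arg_name <- function(x) deparse(substitute(x))

arg_name(len)    # bare name: no object called len needs to exist
#> [1] "len"
arg_name("len")  # a quoted string keeps its quotes, which would break paste()
#> [1] "\"len\""
```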
Here's a reproducible example:
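stat_test <- function(mydata, quantvar, groupvar) {
  # deparse(substitute(x)) turns the bare argument (e.g. len) into "len"
  A <- as.formula(paste(deparse(substitute(quantvar)), "~",
                        deparse(substitute(groupvar))))
  print(kruskal.test(A, data = mydata))
  cat("\n--------------------------------------\n\n")
  # The last value is returned and auto-printed at the top level
  aov(A, data = mydata)
}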
stat_test(ToothGrowth, len, supp)
#>
#> Kruskal-Wallis rank sum test
#>
#> data: len by supp
#> Kruskal-Wallis chi-squared = 3.4454, df = 1, p-value = 0.06343
#>
#>
#> --------------------------------------
#> Call:
#> aov(formula = A, data = mydata)
#>
#> Terms:
#>                     supp Residuals
#> Sum of Squares   205.350  3246.859
#> Deg. of Freedom        1        58
#>
#> Residual standard error: 7.482001
#> Estimated effects may be unbalanced
Created on 2020-03-30 by the reprex package (v0.3.0)
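One cosmetic drawback visible in the output above is that the ANOVA prints Call: aov(formula = A, data = mydata), because aov() records the expression it was called with. A possible workaround, sketched below under the assumption that bare column names are passed, is to assemble the formula as an unevaluated expression with call() and splice it into the aov() call with bquote() before evaluating. stat_test2 is a hypothetical name; the technique here is base R metaprogramming, not something from the answer above.

```r
stat_test2 <- function(mydata, quantvar, groupvar) {
  # substitute() yields the bare symbols (e.g. len, supp); call() builds the
  # unevaluated expression len ~ supp from them.
  formula_call <- call("~", substitute(quantvar), substitute(groupvar))
  # eval() turns the expression into an actual formula object
  print(kruskal.test(eval(formula_call), data = mydata))
  cat("\n--------------------------------------\n\n")
  # bquote() splices the expression into the aov() call before it is evaluated,
  # so the recorded call reads aov(formula = len ~ supp, data = mydata).
  eval(bquote(aov(.(formula_call), data = mydata)))
}

fit <- stat_test2(ToothGrowth, len, supp)
fit$call
#> aov(formula = len ~ supp, data = mydata)
```

Like the deparse(substitute()) version, this expects bare names; calling it with "len" and "supp" would produce the formula "len" ~ "supp", which is not what you want.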
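For reference, if you use rstatix (pipe-friendly, tidyverse-style wrappers around common statistical tests), its functions take bare column names, so a column name held in a string must be converted to a symbol with sym() and unquoted with !!. Functions that take a formula still need one built with formula().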
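library(rstatix, quietly = TRUE)
library(rlang, quietly = TRUE)

make_kruskal_test <- function(data, quantvar, groupvar) {
  # Formula interface (e.g. kruskal_test): build "mean ~ sample" from the strings
  formula_expression <- formula(paste(quantvar, "~", groupvar))
  # Tidy-eval interface (e.g. shapiro_test): string -> symbol, unquoted with !!
  quantvar_sym <- sym(quantvar)
  shapiro <- shapiro_test(data, !!quantvar_sym)
  print(shapiro)
  kruskal <- kruskal_test(data, formula_expression)
  invisible(list(shapiro = shapiro, kruskal = kruskal))
}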
sample_data <- tibble::tibble(sample = letters[1:5], mean = 1:5)
make_kruskal_test(sample_data, "mean", "sample")
#> # A tibble: 1 x 3
#> variable statistic p
#> <chr> <dbl> <dbl>
#> 1 mean 0.987 0.967
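The sym() and !! step used in the rstatix example can be seen in isolation with rlang alone. In this sketch, sym() converts the string "len" into the symbol len, expr() builds an expression, and !! inserts the symbol into it; eval_tidy() then evaluates the expression with the columns of ToothGrowth in scope. The variable names col and col_sym are only for illustration.

```r
library(rlang)

col <- "len"               # column name held in a string
col_sym <- sym(col)        # converted to the symbol len
expr(mean(!!col_sym))      # !! unquotes the symbol into the expression
#> mean(len)

# eval_tidy() evaluates the expression with the data frame's columns in scope;
# the result is the same as mean(ToothGrowth$len).
eval_tidy(expr(mean(!!col_sym)), data = ToothGrowth)
```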