Using a for loop for statistics by data frame column in R

Question

I have been trying to write a loop that takes a vector of column names and runs a statistical test on each of those columns against a column that identifies the sample's group. Here is what I have so far.

sink('df_statistics.txt')

df <- `df.tsv`

columns <- c("column1" , "column2" , "column3" , "column4")

for (x in columns) {
    wilcox.test(formula = x ~ Group, data = df)
}

sink()

When I run it I get this error:

Error in model.frame.default(formula = data ~ Group, data = df) :
variable lengths differ (found for 'Group')

My groups are determined by the numbers 1 and 2. I also tried naming them control and experimental but I keep getting the same error as above. Any suggestions?

Answer 1

We can use lapply to apply the test to each selected column; it returns a list with one test result per column.

lapply(df[columns], function(x) wilcox.test(x~df$Group))

data

columns <- c("column1" , "column2")
set.seed(24)
df <- data.frame(Group = rep(1:2, each=5), column1 = rnorm(10), column2 = rnorm(10))
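If only the p-values are of interest, the list returned by lapply can be reduced to a named vector. The following is a minimal sketch using the sample data above; the names results and p_values are chosen here for illustration and are not part of the original answer.

```r
# Sample data from the answer above
columns <- c("column1", "column2")
set.seed(24)
df <- data.frame(Group = rep(1:2, each = 5),
                 column1 = rnorm(10),
                 column2 = rnorm(10))

# One Wilcoxon test per selected column, returned as a named list
results <- lapply(df[columns], function(x) wilcox.test(x ~ df$Group))

# Pull the p-value out of each "htest" object (illustrative helper step)
p_values <- sapply(results, function(res) res$p.value)
print(p_values)
```

Each element of results is a regular htest object, so other components such as res$statistic can be extracted the same way.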

Answer 2

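You can't programmatically access the columns with the original notation. Inside a formula, x is taken as the literal variable named x, not as the column whose name it holds; x is a single string while Group has one value per row, hence the "variable lengths differ" error. Use the [[ ]] notation to select the desired column with a variable. Also note that results are not printed automatically inside a for loop, so wrap the call in print() if the output should reach sink().
Try:

columns <- c("column1" , "column2" , "column3" , "column4")

for (x in columns) {
    # df[[x]] looks up the column named by the string in x
    print(wilcox.test(formula = df[[x]] ~ df$Group))
}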
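An alternative that keeps the data = df argument from the question is to build the formula from the column name with reformulate() from the stats package. This is only a sketch, not part of the original answer; it assumes a small made-up data frame with two groups, and the variable f is an illustrative name.

```r
# Hypothetical sample data with two groups (assumption for illustration)
set.seed(24)
df <- data.frame(Group = rep(1:2, each = 5),
                 column1 = rnorm(10),
                 column2 = rnorm(10))
columns <- c("column1", "column2")

for (x in columns) {
  # Build the formula `<column> ~ Group` from the string held in x
  f <- reformulate("Group", response = x)
  # print() is needed because results are not auto-printed inside a for loop
  print(wilcox.test(f, data = df))
}
```

With this approach the printed output names the actual column in its "data:" line, which can make a sink() log easier to read.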
