
Using a for loop to run statistical tests by data frame column in R

I have been trying to write a loop that takes a vector of column names and runs a statistical test on each of those columns against the column that determines the sample's group. Here is what I have so far:

sink('df_statistics.txt')

df <- `df.tsv`

columns <- c("column1" , "column2" , "column3" , "column4")

for (x in columns) {
    wilcox.test(formula = x ~ Group, data = df)
}

sink()

When I run it I get this error:

Error in model.frame.default(formula = data ~ Group, data = df) :
variable lengths differ (found for 'Group')

My groups are coded as the numbers 1 and 2. I also tried naming them control and experimental, but I keep getting the same error. Any suggestions?

We can use lapply to run the test on each selected column:

lapply(df[columns], function(x) wilcox.test(x~df$Group))

Sample data:

columns <- c("column1" , "column2")
set.seed(24)
df <- data.frame(Group = rep(1:2, each=5), column1 = rnorm(10), column2 = rnorm(10))
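If only the p-values are needed, the list returned by lapply can be reduced to a named vector. This is a minimal sketch on the sample data above; the names `results` and `p_values` are introduced here for illustration and are not part of the original answer.

```r
columns <- c("column1", "column2")
set.seed(24)
df <- data.frame(Group = rep(1:2, each = 5), column1 = rnorm(10), column2 = rnorm(10))

# Run the test once per column; lapply keeps the column names on the result list
results <- lapply(df[columns], function(x) wilcox.test(x ~ df$Group))

# Extract the p-value from each test result into a named numeric vector
p_values <- vapply(results, function(res) res$p.value, numeric(1))
print(p_values)
```

Because the list is named after the columns, a single result can still be inspected in full with `results[["column1"]]`.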

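You can't refer to a column through a variable with the original notation. In x ~ Group, R looks for a column literally named x in df, does not find one, and falls back to the loop variable x itself, a character vector of length 1. Its length does not match Group, hence "variable lengths differ". Use the [[ ]] notation to select the desired column with a variable.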
Try:

columns <- c("column1" , "column2" , "column3" , "column4")

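for (x in columns) {
    # print() is needed: results are not auto-printed inside a for loop,
    # so without it nothing reaches the console or a sink() file
    print(wilcox.test(df[[x]] ~ df$Group))
}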
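An alternative to the loop above is to build the formula from the column name, so that `data = df` can still be used and the printed output shows the real column name. This is a sketch, assuming the same sample data as in the other answer; `fml` and `results` are illustrative names, and `reformulate()` comes from the base stats package.

```r
set.seed(24)
df <- data.frame(Group = rep(1:2, each = 5), column1 = rnorm(10), column2 = rnorm(10))
columns <- c("column1", "column2")

results <- list()
for (x in columns) {
    # Build the formula <column> ~ Group from the name stored in x
    fml <- reformulate("Group", response = x)
    results[[x]] <- wilcox.test(fml, data = df)
    # Results are not auto-printed inside a for loop, so print explicitly
    print(results[[x]])
}
```

Keeping the results in a list makes it possible to reuse them later, for example `results[["column1"]]$p.value`, rather than relying only on the printed text.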

