
Apply t.test to a dataframe in R

I am new to programming and have little idea how to approach this; any help would be much appreciated.

I must use a two-sample t-test to compare two sets of data, c and t, each of which is divided into 6 sub-columns in the same dataframe. In Excel the data looks something like this:

name|c1|c2|c3|c4|c5|c6|t1|t2|t3|t4|t5|t6

"name" is the same within each row, but differs between rows. The columns c1-t6 contain numeric values, which differ between each row and column.

Each row must be tested individually, comparing the c subgroup to the t subgroup.

How would I go about doing this? I'm guessing a loop will be needed?

Using @thelatemail's input, you would most likely do the following, illustrated here with a reproducible example. df is your data.frame, and since I work with dplyr, I'll use it here too.

library(dplyr)
# Toy data: one row per name, two control (c) and two treatment (t) columns
df <- data.frame(
    name = sample(letters[1:10]),
    c1 = sample(1:10),
    c2 = sample(1:10),
    t1 = sample(1:10),
    t2 = sample(1:10))
df
   name c1 c2 t1 t2
1     i  7  3  8  2
2     h  6  4  4  8
3     g  4  6  6  5
4     b  5  1  9 10
5     a  9  5  3  7
6     j  8  9  5  3
7     d 10  8 10  4
8     c  2  2  2  1
9     e  1 10  7  6
10    f  3  7  1  9
# Split into control and treatment columns
df1 <- df %>% select(starts_with("c"))
df2 <- df %>% select(starts_with("t"))
# Column-wise tests: c1 vs t1, c2 vs t2
Map(t.test, df1, df2)

But I'm not entirely sure that's what you want to do, as this loops the function over columns and not rows. So, as a bit of a hacky solution (please someone show me an easier way), I would do the following:

library(tidyr)
# Long format: one row per (name, condition) pair
df2 <- gather(df, condition, measurement, c1:t2)
# Wide again, now with one column per name and one row per condition
df3 <- spread(df2, name, measurement)
df3$condition2 <- ifelse(grepl("c", df3$condition), "c", "t")
# check dimensions of new df3: columns 2:11 hold the ten names
for (i in 2:11) {
  cat(colnames(df3)[i], '\n')
  y <- df3[, i]
  res <- t.test(y ~ df3$condition2, var.equal = TRUE)
  print(res)
}

Note: I've added var.equal = TRUE assuming you want the classic pooled-variance two-sample t-test. Without it, t.test() runs Welch's two-sample test, which does not assume equal variances.
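To make the effect of var.equal concrete, here is a small sketch on made-up vectors (the values in ctrl and treat are purely illustrative, not from the question). It runs the same comparison with and without var.equal = TRUE. With equal group sizes the t statistic is identical; what changes is the degrees of freedom, and therefore the p-value.

```r
# Hypothetical measurements, six per group as in the question
ctrl  <- c(5.1, 4.8, 6.0, 5.5, 5.2, 4.9)
treat <- c(6.3, 7.9, 5.8, 8.4, 6.1, 7.2)

welch  <- t.test(ctrl, treat)                    # default: Welch, unequal variances
pooled <- t.test(ctrl, treat, var.equal = TRUE)  # Student, pooled variance

# Degrees of freedom used by each variant
c(welch = unname(welch$parameter), pooled = unname(pooled$parameter))
```

The pooled test always uses n1 + n2 - 2 degrees of freedom, while Welch's approximation is smaller whenever the two sample variances differ.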

I believe this gives you the t-tests you want for your data.
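If reshaping feels heavy, a row-wise test can also be sketched in base R without gather/spread. The example below is only one possible approach, on a made-up data frame (the values, the seed, and the helper names c_cols, t_cols and row_p are assumptions for illustration). For each row it pulls the c values and the t values out as plain vectors and passes them to t.test().

```r
set.seed(1)  # only so the sketch is repeatable
df <- data.frame(
  name = letters[1:5],
  c1 = rnorm(5), c2 = rnorm(5), c3 = rnorm(5),
  t1 = rnorm(5, 1), t2 = rnorm(5, 1), t3 = rnorm(5, 1)
)

# Column names of the two subgroups (assumes the c1, c2, ... / t1, t2, ... naming)
c_cols <- grep("^c[0-9]+$", names(df), value = TRUE)
t_cols <- grep("^t[0-9]+$", names(df), value = TRUE)

# One Welch t-test per row: that row's c values against its t values
row_p <- vapply(seq_len(nrow(df)), function(i) {
  t.test(unlist(df[i, c_cols]), unlist(df[i, t_cols]))$p.value
}, numeric(1))

result <- data.frame(name = df$name, p.value = row_p)
result
```

Add var.equal = TRUE inside the t.test() call if the pooled-variance version is wanted. Anchoring the patterns with ^ and $ keeps the name column out of either subgroup.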

Assuming an unpaired two-group t-test, consider using the mapply function, the multivariate version of sapply, which applies FUN to the first elements of each argument, then the second elements, the third elements, and so on. Here each argument is a data frame, so the elements are its columns: c1 is tested against t1, c2 against t2, and so on.

# DF SPLIT BETWEEN EACH CONTROL AND TREATMENT
controls <- df[grep("^c", names(df))]          # ALL C COLS
treatments <- df[grep("^t", names(df))]        # ALL T COLS

# MAPPLY USING T.TEST DIRECTLY
tstats_m <- mapply(t.test, x=controls, y=treatments)
tstats_m <- as.data.frame(tstats_m)

# MAPPLY USING DEFINED FUNCTION TFUNC
tfunc <- function(var1, var2){
            t.test(var1, var2)
          }
tstats_m <- mapply(tfunc, var1=controls, var2=treatments)
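The matrix that mapply returns by default can be awkward to read. As a hedged sketch, one option is to keep each test as a list element with SIMPLIFY = FALSE and then pull out only the fields of interest. The data frame, the seed, and the names tests and summary_df below are made up for illustration.

```r
set.seed(42)  # only so the sketch is repeatable
df <- data.frame(
  name = letters[1:10],
  c1 = rnorm(10), c2 = rnorm(10),
  t1 = rnorm(10, 0.5), t2 = rnorm(10, 0.5)
)
controls   <- df[grep("^c", names(df))]
treatments <- df[grep("^t", names(df))]

# One htest object per column pair (c1 vs t1, c2 vs t2)
tests <- mapply(function(x, y) t.test(x, y),
                controls, treatments, SIMPLIFY = FALSE)

# Collect the t statistic and p-value of each test in a small table
summary_df <- data.frame(
  pair      = paste(names(controls), names(treatments), sep = " vs "),
  statistic = sapply(tests, function(res) unname(res$statistic)),
  p.value   = sapply(tests, function(res) res$p.value)
)
summary_df
```

Each element of tests is an ordinary htest object, so other fields such as conf.int or estimate can be extracted the same way.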

Alternatively, below is the traditional for loop, which prints the result of each test in turn:

for (i in seq_len(ncol(controls))) {
  print(paste0("Two-sample t-test c", i, " = t", i))
  # [[ ]] extracts each column as a plain numeric vector
  print(t.test(controls[[paste0("c", i)]], treatments[[paste0("t", i)]]))
}
