
Performing rowwise t.test on a dataframe with unequal replicates for different observations

Say, for example, I have a dataframe with eleven columns (example screenshot attached). The first column lists the genes, and the next ten columns are measurements for control (C1-C5) and treated (T1-T5) samples. The measurements are not paired.
I want to perform a rowwise t.test and add the p-value for each gene as the last column of the dataframe. However, as you can see in my data, because of the way the experiment was performed I don't have measurements for every replicate (in both the control and treatment conditions) for every gene, so many rows contain several NA values.
How do I perform a rowwise t.test on this dataframe without it failing because of the NA values? Thanks!

Example data (screenshot)

t.test drops NA values by itself, so missing values alone do not break it. It stops only when one of the groups is left with fewer than two non-missing values. Take this example:

Input <- "GeneID  C1  C2  C3  C4  C5  T1  T2  T3  T4  T5
          Gene1    5  1   7   9   2   7   5   4   4   3
          Gene2    3  6   5   NA  NA  5   1   3   NA  NA
          Gene3    2  3   NA  NA  NA  NA  1   6   NA  NA
          Gene4    3  4   5   6   NA  3   4   5   NA  NA"

# GeneID becomes the row names, so C1-C5 are columns 1:5 and T1-T5 are columns 6:10
df <- read.table(textConnection(Input), header = TRUE, row.names = 1)
df$pval <- apply(df, 1, function(x) t.test(x[1:5], x[6:10])$p.value)

This runs on the example, because every gene still has at least two control and two treated values. As soon as a row has fewer than two values in either group, or the column indices do not line up with the groups (for instance 2:6 and 7:11 on this data frame, where the gene names are row names rather than a column), t.test stops with an error such as not enough 'x' observations.

There are two options. First, you can simply ignore the NAs, so that for Gene2 the comparison is C1, C2, C3 vs T1, T2, T3, because these are the only observations available. Second, you can use a non-parametric test, which has less power but is more 'flexible'. The t-test is nice, but it comes with assumptions: the groups should be roughly normally distributed, ideally with a reasonable number of samples and similar group sizes, and for the classic Student version (var.equal = TRUE) also similar variances. R's default is the Welch version, which does not assume equal variances. If the assumptions are not met, the result can be distorted. I'd recommend something like this:

df$pval <- apply(df, 1, function(x) wilcox.test(x[1:5], x[6:10])$p.value)

For rows with tied values R warns that it cannot compute an exact p-value and falls back to the normal approximation. With the columns aligned as above, the result is (p-values rounded to three decimals):

      C1 C2 C3 C4 C5 T1 T2 T3 T4 T5  pval
Gene1  5  1  7  9  2  7  5  4  4  3 1.000
Gene2  3  6  5 NA NA  5  1  3 NA NA 0.369
Gene3  2  3 NA NA NA NA  1  6 NA NA 1.000
Gene4  3  4  5  6 NA  3  4  5 NA NA 0.716

Have a look at the wilcox.test() documentation (?wilcox.test) and check which of the available arguments suit your data. Keep in mind, though, that the fewer measurements you have, the lower the accuracy and power of the test.
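If you would rather stay with the t-test (the first option: ignore the NAs), the rows that are too sparse can be skipped instead of letting them abort the whole run. The following is only a minimal sketch on the example data from this answer: `safe_t_pval` is a hypothetical helper name, not a base R function, and the threshold of two non-missing values per group is simply what the default Welch t.test needs in order to run.

```r
# Rebuild the example data (GeneID becomes the row names).
Input <- "GeneID  C1  C2  C3  C4  C5  T1  T2  T3  T4  T5
Gene1    5  1   7   9   2   7   5   4   4   3
Gene2    3  6   5   NA  NA  5   1   3   NA  NA
Gene3    2  3   NA  NA  NA  NA  1   6   NA  NA
Gene4    3  4   5   6   NA  3   4   5   NA  NA"
df <- read.table(text = Input, header = TRUE, row.names = 1)

# Select the groups by column name rather than by position.
ctrl <- paste0("C", 1:5)
trt  <- paste0("T", 1:5)

# Hypothetical helper: Welch t-test p-value for one row, or NA when either
# group has fewer than two non-missing values or t.test stops for another
# reason (for example, constant data).
safe_t_pval <- function(x, y) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  if (length(x) < 2 || length(y) < 2) return(NA_real_)
  tryCatch(t.test(x, y)$p.value, error = function(e) NA_real_)
}

# Work on a numeric matrix of the measurement columns only.
m <- as.matrix(df[, c(ctrl, trt)])
df$pval <- vapply(
  seq_len(nrow(m)),
  function(i) safe_t_pval(m[i, ctrl], m[i, trt]),
  numeric(1)
)
```

Genes without enough measurements end up with NA in pval instead of stopping the computation. Swapping t.test for wilcox.test inside the helper should give the non-parametric variant with the same guard.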


 