Performing rowwise t.test on a dataframe with unequal replicates for different observations

Say, for example, that I have a dataframe with eleven columns (example screenshot attached). The first column lists all the genes, and the next ten columns are measurements for control (C1-C5) and treated (T1-T5) samples. The measurements are not paired.
I want to perform a rowwise t.test and add the p-value for each gene as the last column of the dataframe. However, as you can see in my data, not every gene has measurements for all replicates (in both the control and the treated condition) because of the way the experiment was performed, so many rows contain several NA values.
How do I perform a rowwise t.test on this dataframe without it failing because of the NA values? Thanks!

[Screenshot: example data]

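t.test() itself removes NA values from each group, but with the default (Welch) test it stops with an error as soon as a group has fewer than two non-missing values. So if we do something like: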

Input = ("GeneID  C1  C2  C3  C4  C5  T1  T2  T3  T4  T5
          Gene1    5  1   7   9   2   7   5   4   4   3  
          Gene2    3  6   5   NA  NA  5   1   3   NA  NA
          Gene3    2  3   NA  NA  NA  NA  1   6   NA  NA
          Gene4    3  4   5   6   NA  3   4   5   NA  NA")

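df <- read.table(textConnection(Input), header = TRUE, row.names = 1)
# GeneID becomes the row names, so columns 1:5 are C1-C5 and 6:10 are T1-T5
df$pval <- apply(df, 1, function(x) t.test(x[1:5], x[6:10])$p.value)

This runs on the four genes above, because each of them keeps at least two values per group. But as soon as any gene has fewer than two non-missing values in a group, the whole apply() call stops with an error such as "not enough 'x' observations". There are two options. First, you can ignore the NAs, so for Gene2 we would compare C1, C2, C3 vs T1, T2, T3, because only these observations exist, and return NA for genes that have too few values. Second, you can perform a non-parametric test such as the Wilcoxon rank-sum (Mann-Whitney) test, which usually has less power but makes fewer assumptions. The t-test is nice, but several assumptions must be met: the values in each group should be roughly normally distributed, which is hard to check with so few replicates, and the classic Student t-test also assumes equal variances in both groups (R's t.test() uses the Welch version by default, var.equal = FALSE, which relaxes this). If these assumptions are badly violated, the p-values will be distorted; larger, balanced groups make the test more robust. I'd recommend something like this: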

# wilcox.test() drops NAs itself; with tied values it warns and uses a normal approximation
df$pval <- apply(df, 1, function(x) wilcox.test(x[1:5], x[6:10])$p.value)

      C1 C2 C3 C4 C5 T1 T2 T3 T4 T5      pval
Gene1  5  1  7  9  2  7  5  4  4  3 1.0000000
Gene2  3  6  5 NA NA  5  1  3 NA NA 0.3686883
Gene3  2  3 NA NA NA NA  1  6 NA NA 1.0000000
Gene4  3  4  5  6 NA  3  4  5 NA NA 0.7162898
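If you prefer to stay with the t-test (the first option above), here is a minimal sketch that returns NA for a gene instead of stopping the whole apply() call. It assumes, as in the example data, that control column names start with C and treated column names start with T; the helper safe_t_pval is made up for this illustration and is not a base R function.

```r
# Hypothetical helper (not part of base R): Welch t-test p-value for one gene,
# returning NA instead of an error when a group has fewer than two values
safe_t_pval <- function(ctrl, trt) {
  ctrl <- ctrl[!is.na(ctrl)]
  trt  <- trt[!is.na(trt)]
  if (length(ctrl) < 2 || length(trt) < 2) return(NA_real_)
  # t.test() can still stop, e.g. with "data are essentially constant"
  tryCatch(t.test(ctrl, trt)$p.value, error = function(e) NA_real_)
}

# Example data from the question; GeneID becomes the row names
Input <- "GeneID C1 C2 C3 C4 C5 T1 T2 T3 T4 T5
Gene1     5  1  7  9  2  7  5  4  4  3
Gene2     3  6  5 NA NA  5  1  3 NA NA
Gene3     2  3 NA NA NA NA  1  6 NA NA
Gene4     3  4  5  6 NA  3  4  5 NA NA"
df <- read.table(text = Input, header = TRUE, row.names = 1)

# Select columns by name (assumes C* = control, T* = treated) so that an
# extra p-value column can never end up inside either group
ctrl_cols <- grep("^C", names(df))
trt_cols  <- grep("^T", names(df))

df$t_pval <- apply(df, 1, function(x) safe_t_pval(x[ctrl_cols], x[trt_cols]))
print(df)

# A gene with a single control value gets NA instead of an error
print(safe_t_pval(c(1, NA, NA), c(2, 3, 4)))
```

Wrapping the call in tryCatch() also covers other failures of t.test(), such as a gene whose values do not vary within either group.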

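Have a look at the documentation (?wilcox.test) and check which of the available arguments (for example exact, correct, alternative) suit your data. Keep in mind, though, that the fewer measurements you have, the lower the power of the test: with only two values per group and no ties, as for Gene3, the exact test cannot return a two-sided p-value below 1/3.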
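To see how much the missing replicates limit each gene, a small sketch (assuming the same C/T column naming as the example data) counts the non-missing values per group and computes the smallest two-sided p-value an exact Wilcoxon rank-sum test could return, 2 / choose(n_c + n_t, n_c). This bound assumes no ties; with ties, wilcox.test() switches to a normal approximation and the bound no longer applies exactly. min_wilcox_p is a hypothetical helper, not a base R function.

```r
# Hypothetical helper: smallest two-sided p-value of an exact Wilcoxon
# rank-sum test with group sizes n_c and n_t (no ties). The most extreme
# rank arrangement has probability 1 / choose(n_c + n_t, n_c), doubled for two sides.
min_wilcox_p <- function(n_c, n_t) pmin(1, 2 / choose(n_c + n_t, n_c))

Input <- "GeneID C1 C2 C3 C4 C5 T1 T2 T3 T4 T5
Gene1     5  1  7  9  2  7  5  4  4  3
Gene2     3  6  5 NA NA  5  1  3 NA NA
Gene3     2  3 NA NA NA NA  1  6 NA NA
Gene4     3  4  5  6 NA  3  4  5 NA NA"
df <- read.table(text = Input, header = TRUE, row.names = 1)

# Number of non-missing values per gene in each group
n_c <- rowSums(!is.na(df[, grep("^C", names(df))]))
n_t <- rowSums(!is.na(df[, grep("^T", names(df))]))

res <- data.frame(n_c, n_t, min_p = min_wilcox_p(n_c, n_t))
print(res)
```

Under this no-ties bound, only Gene1 (5 vs 5 values) in the example data could reach p < 0.05; Gene4 (4 vs 3) bottoms out at 2/35, just above it.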
