Say, for example, I have a dataframe that has eleven columns (example screenshot attached). The first column lists all the genes and the next ten columns are measurements for control (C1-C5) and treated (T1-T5) samples. The measurements are not paired.
I want to perform rowwise t.test and add a column with p-value for each gene as a last column of the dataframe. However, as you can see in my data, I don't have all measurements for all replicates (both in control and treatment conditions) for every gene because of the way the experiment was performed. So I have several NA values in many rows.
How do I perform rowwise t.test in this dataframe without it failing because of the NA values? Thanks!
There are two options. First, you can simply ignore the NA's, which t.test() does on its own as long as at least two non-missing observations remain in each group. So if we do something like:
Input = ("GeneID C1 C2 C3 C4 C5 T1 T2 T3 T4 T5
Gene1 5 1 7 9 2 7 5 4 4 3
Gene2 3 6 5 NA NA 5 1 3 NA NA
Gene3 2 3 NA NA NA NA 1 6 NA NA
Gene4 3 4 5 6 NA 3 4 5 NA NA")
# row.names = 1 turns GeneID into row names, so C1-C5 are columns 1:5 and T1-T5 are columns 6:10
df <- read.table(text = Input, header = TRUE, row.names = 1)
df$pval <- apply(df, 1, function(x) t.test(x[1:5], x[6:10])$p.value)
then for Gene2 we would have C1,C2,C3 vs T1,T2,T3, because we have only these observations. A gene with fewer than two values left in either group would stop the whole call with an error such as not enough 'x' observations.
Secondly, we can perform a non-parametric test, which has less power but is more 'flexible'. The t-test is nice, but its assumptions must be met. It works best when the number of samples is reasonably large and similar in C and T, and the values in each group should be roughly normally distributed. The classic Student version also assumes similar variances in both groups (t.test() defaults to the Welch variant, which does not). Otherwise your test will be distorted, and with so few replicates these assumptions are hard to check. I'd recommend something like this:
df$pval <- apply(df, 1, function(x) wilcox.test(x[1:5], x[6:10])$p.value)
      C1 C2 C3 C4 C5 T1 T2 T3 T4 T5      pval
Gene1  5  1  7  9  2  7  5  4  4  3 1.0000000
Gene2  3  6  5 NA NA  5  1  3 NA NA 0.3686883
Gene3  2  3 NA NA NA NA  1  6 NA NA 1.0000000
Gene4  3  4  5  6 NA  3  4  5 NA NA 0.7162898
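If you want the row-wise test to survive genes that have too few values, one possible approach is to select the columns by name and return NA for such genes instead of stopping. The following is only a sketch under assumptions: safe_pval, ctrl_cols, treat_cols and the min_n threshold are hypothetical names introduced here for illustration, not part of base R.

```r
# Example data from the answer, read with GeneID as row names
Input <- "GeneID C1 C2 C3 C4 C5 T1 T2 T3 T4 T5
Gene1 5 1 7 9 2 7 5 4 4 3
Gene2 3 6 5 NA NA 5 1 3 NA NA
Gene3 2 3 NA NA NA NA 1 6 NA NA
Gene4 3 4 5 6 NA 3 4 5 NA NA"
df <- read.table(text = Input, header = TRUE, row.names = 1)

ctrl_cols  <- paste0("C", 1:5)  # control columns, selected by name
treat_cols <- paste0("T", 1:5)  # treated columns, selected by name

# Hypothetical helper: returns NA instead of stopping when either group
# has fewer than min_n non-missing values, or when the test itself errors
safe_pval <- function(ctrl, treat, test = t.test, min_n = 2) {
  ctrl  <- ctrl[!is.na(ctrl)]
  treat <- treat[!is.na(treat)]
  if (length(ctrl) < min_n || length(treat) < min_n) return(NA_real_)
  tryCatch(test(ctrl, treat)$p.value, error = function(e) NA_real_)
}

# One p-value per gene, added as the last column
df$pval <- sapply(seq_len(nrow(df)), function(i) {
  safe_pval(unlist(df[i, ctrl_cols]), unlist(df[i, treat_cols]))
})
```

Passing test = wilcox.test to safe_pval would switch to the non-parametric test without changing the rest. Selecting by name also keeps the result independent of how many columns precede the measurements.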
Have a look at the available arguments of wilcox.test() (see ?wilcox.test) and pick those that match the character of your data. Nevertheless, keep in mind that the fewer measurements you have, the worse the accuracy and power of the test will be.
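To make the loss of power concrete, here is a small sketch of the smallest two-sided p-value an exact Wilcoxon rank-sum test can return for given group sizes. It assumes there are no ties and that the two groups are perfectly separated. min_wilcox_p is a hypothetical helper name used only for this illustration.

```r
# Smallest two-sided p-value of an exact Wilcoxon rank-sum test with
# group sizes n and m (no ties): only 2 of the choose(n + m, n) possible
# rank arrangements are as extreme as perfect separation
min_wilcox_p <- function(n, m) min(1, 2 / choose(n + m, n))

sizes <- data.frame(n = c(2, 3, 4, 5), m = c(2, 3, 3, 5))
sizes$min_p <- mapply(min_wilcox_p, sizes$n, sizes$m)
print(sizes)

# Cross-check with wilcox.test() on perfectly separated groups of size 3
wilcox.test(c(1, 2, 3), c(4, 5, 6))$p.value  # 0.1
```

So a gene like Gene3, with two values per group, can never get below 1/3, and even three values per group cannot reach the usual 0.05 threshold.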