
QQ plot in R from the TASSEL pipeline

For my GWAS I am using the TASSEL pipeline, and I am studying two correlated traits. I want to draw a QQ plot of both traits in a single plot, like the one the TASSEL program produces:

[Image: QQ plot from the TASSEL program]

Does anyone know an R package that can do this? With the qq() function from the qqman package I can draw a QQ plot for each trait, but only as separate plots. I want a single plot that includes both traits, as I got in TASSEL.

Any suggestions?

In your case, a QQ plot compares the quantiles of the empirical distribution of your results (the p-values) with the quantiles of the distribution you would expect theoretically if the null hypothesis were true.

If you have n data points, it makes sense to compare the n-quantiles, because then the empirical quantiles are simply your data points in sorted order.

Under the null hypothesis (and for a continuous test statistic), p-values follow the uniform distribution on [0, 1]. Think about it: that is exactly what p-values are designed for. If a measurement is assigned a p-value of 0.05, then, if you repeated the experiment very often, you would expect this result or a more extreme one by pure chance (i.e. under the null hypothesis) in only 5% of the experiments. A measurement with p = 0.5 is expected in 50% of the cases. Generalizing to any value p, the cumulative distribution function is

$$\mathrm{CDF}(p) = P[\text{p-value of a measurement} \le p] = p, \qquad 0 \le p \le 1.$$

That is exactly the CDF of the uniform distribution on [0, 1] (see, for example, Wikipedia).
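A quick way to convince yourself of this is a small simulation. The sketch below (not part of the original answer) runs many one-sample t-tests on standard normal data, so the null hypothesis "mean = 0" is true by construction, and checks that the fraction of p-values at or below each threshold is close to the threshold itself. The sample size, the number of replicates and the thresholds are arbitrary choices.

```r
set.seed(42)  # make the simulation reproducible

# 5000 one-sample t-tests on N(0, 1) data: the null hypothesis (mean = 0) is true
null_p <- replicate(5000, t.test(rnorm(20))$p.value)

# Empirical CDF of the p-values at a few thresholds; under H0 each should be close to the threshold
thresholds <- c(0.01, 0.05, 0.25, 0.5, 0.9)
empirical_cdf <- sapply(thresholds, function(t) mean(null_p <= t))
print(round(rbind(threshold = thresholds, empirical = empirical_cdf), 3))
```

The two rows of the printed matrix should agree up to simulation noise, which is the statement CDF(p) = p in numbers.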

Therefore, the expected n-quantiles for your QQ plot are {1/n, 2/n, ..., n/n}. They represent the case where the null hypothesis is true.
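These expected quantiles are simply seq_len(n) / n in R. A common alternative is base R's ppoints(), which shifts the plotting positions slightly so that none of them is exactly 0 or 1; for n > 10 it returns (i - 0.5)/n. The sketch below just compares the two choices; they differ mainly at the smallest quantile, i.e. at the top-right end of a -log10 QQ plot. The value n = 1000 is an arbitrary example.

```r
n <- 1000  # e.g. the number of markers tested

expected_simple  <- seq_len(n) / n  # 1/n, 2/n, ..., n/n as described above
expected_ppoints <- ppoints(n)      # (i - 0.5)/n when n > 10

# First few plotting positions under each convention
print(head(cbind(expected_simple, expected_ppoints), 3))

# Largest expected -log10(p) under each convention (3 vs. about 3.3 for n = 1000)
print(-log10(c(min(expected_simple), min(expected_ppoints))))
```

For the many thousands of markers in a typical GWAS either convention gives essentially the same picture; what matters most is using the same one for every trait in the plot.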

So now we have the theoretical quantiles (x-axis) and the actual quantiles (y-axis). In R, this is something like

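expected_quantiles <- function(pvalues) {
  pvalues <- pvalues[!is.na(pvalues)]      # drop missing p-values, otherwise sort() and seq_along() lengths differ
  n <- length(pvalues)
  actual_quantiles <- sort(pvalues)        # observed quantiles = ordered p-values
  expected_quantiles <- seq_len(n) / n     # uniform n-quantiles 1/n, 2/n, ..., n/n
  data.frame(expected = expected_quantiles, actual = actual_quantiles)
}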

You can then take -log10 of these values and plot both traits on the same axes, for example like so:

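set.seed(1)  # reproducible simulated data
testdata1 <- c(runif(98, 0, 1), 1e-4, 2e-5)          # simulated p-values, trait 1
testdata2 <- c(runif(96, 0, 1), 1e-3, 2e-3, 2e-4)    # simulated p-values, trait 2

qq <- lapply(list(d1 = testdata1, d2 = testdata2), expected_quantiles)
all_qq <- do.call(rbind, qq)
# common axis limits: from 0 to 10% beyond the largest -log10 value of either trait
xlim <- c(0, max(-log10(all_qq$expected)) * 1.1)
ylim <- c(0, max(-log10(all_qq$actual)) * 1.1)

plot(NULL, xlim = xlim, ylim = ylim,
     xlab = "Expected -log10(p)", ylab = "Observed -log10(p)")
points(x = -log10(qq$d1$expected), y = -log10(qq$d1$actual), col = "red")
points(x = -log10(qq$d2$expected), y = -log10(qq$d2$actual), col = "blue")
abline(a = 0, b = 1)  # y = x: what you expect if the null hypothesis is true
legend("topleft", legend = c("Trait 1", "Trait 2"), col = c("red", "blue"), pch = 1)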
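Since the question is about TASSEL output, here is a hedged sketch that generalizes the code above to any number of traits taken from a single results table. It assumes your association results are in a data frame with one row per marker-trait test and columns named `Trait` and `p`; those column names are an assumption, so check the header of your own TASSEL output and pass the right names via `trait_col` and `p_col`. The helper `qq_plot_traits()` is hypothetical (not part of TASSEL or qqman) and uses only base R graphics; the example data are simulated.

```r
# Hypothetical helper (not part of TASSEL or qqman): one QQ plot of -log10(p)
# for every trait in a long-format table with one row per marker x trait test.
qq_plot_traits <- function(results, trait_col = "Trait", p_col = "p",
                           cols = NULL, ...) {
  # Split p-values by trait; drop NA and p = 0 (-log10(0) is Inf), then sort
  p_by_trait <- split(results[[p_col]], results[[trait_col]])
  p_by_trait <- lapply(p_by_trait, function(p) sort(p[!is.na(p) & p > 0]))
  if (is.null(cols)) cols <- seq_along(p_by_trait) + 1  # palette colours 2, 3, ...

  # Expected uniform quantiles 1/n, ..., n/n, computed separately for each trait
  expected <- lapply(p_by_trait, function(p) seq_along(p) / length(p))

  # Common axis limits so all traits fit in the same plot
  max_x <- max(-log10(unlist(expected)))
  max_y <- max(-log10(unlist(p_by_trait)))
  plot(NULL, xlim = c(0, max_x * 1.05), ylim = c(0, max_y * 1.05),
       xlab = "Expected -log10(p)", ylab = "Observed -log10(p)", ...)
  for (i in seq_along(p_by_trait)) {
    points(-log10(expected[[i]]), -log10(p_by_trait[[i]]), col = cols[i], pch = 20)
  }
  abline(a = 0, b = 1, lty = 2)  # y = x: expected under the null hypothesis
  legend("topleft", legend = names(p_by_trait), col = cols, pch = 20, bty = "n")
  invisible(p_by_trait)  # sorted p-values per trait, returned invisibly
}

# Example with simulated stand-in data: two traits, 500 markers each
set.seed(1)
results <- data.frame(
  Trait = rep(c("trait_A", "trait_B"), each = 500),
  p     = c(runif(495), 10^-(4:8), runif(498), 1e-5, 1e-6)
)
qq_plot_traits(results, main = "QQ plot of two traits")
```

To use it on real data you would first read your results file, e.g. with read.table(..., header = TRUE, sep = "\t") if it is tab-delimited; the file name is yours to supply. Because the expected quantiles are computed per trait, traits with different numbers of tested markers are handled correctly.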

