Overlaying different vlines on each panel with ggplot facet_wrap in R

Question

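I am trying to produce a set of density plots that show how the distributions of expression levels differ between two groups of genes in each of four cell types. On top of the density plots, I would like to overlay the median expression level of each group on every plot. Following answers to some similar questions, I have managed to get either the correct plots or the correct medians, but never both at once. I'm out of ideas and hope someone can point me in the right direction. Thanks!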

Sample data is available here: https://github.com/adadiehl/sample_data/blob/master/sample.data

First attempt. This produces the correct plots, but draws the same medians on all four of them:

dat = read.table("sample.data")

g = ggplot(dat[which(dat$FPKM > 0),], aes(x = FPKM))
g = g + geom_density(aes(y = ..density.., group = class, color = class, fill = class), alpha=0.2)
g = g + geom_vline(data=dat, aes(xintercept = median(dat$FPKM[ which(dat$FPKM > 0 & dat$class == "Other") ]) ), colour="turquoise3", linetype="longdash")
g = g + geom_vline(data=dat, aes(xintercept = median(dat$FPKM[ which(dat$FPKM > 0 & dat$class == "a_MCKG") ]) ), colour="tomato1", linetype="longdash")
g = g + facet_wrap(~source, ncol=2, scales="free")
g = g + ggtitle("Distribution of FPKM, MCKG vs. Other")
g = g + xlab("FPKM > 0")
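A minimal illustration of why the first attempt puts identical lines in every panel: the expression inside aes() refers to dat$FPKM directly, so it is evaluated over the whole dataset and yields a single number that every panel receives. The data below is a hypothetical stand-in (cell types cellA to cellD, random log-normal FPKM values), not the real sample.data.

```r
# Hypothetical stand-in for sample.data (invented values, not the real data)
set.seed(1)
dat <- expand.grid(i = 1:50, class = c("a_MCKG", "Other"),
                   source = paste0("cell", LETTERS[1:4]),
                   stringsAsFactors = FALSE)
dat$FPKM <- rlnorm(nrow(dat), meanlog = ifelse(dat$class == "a_MCKG", 1, 0))

# What attempt 1 passes as xintercept: one median pooled over all sources,
# so every facet gets the same value.
overall <- median(dat$FPKM[dat$FPKM > 0 & dat$class == "a_MCKG"])

# What is actually wanted: one median per source, i.e. per panel.
keep <- dat$FPKM > 0 & dat$class == "a_MCKG"
per_source <- tapply(dat$FPKM[keep], dat$source[keep], median)

length(overall)     # 1
length(per_source)  # 4
```

Any per-panel value therefore has to come from a data frame that carries its own source column.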

Second attempt: the plots are correct, but every median is drawn on every plot:

dat = read.table("sample.data")
vline.dat = data.frame(z=levels(dat$source), vl=tapply(dat$FPKM[which(dat$class != "a_MCKG" & dat$FPKM > 0)], dat$source[which(dat$class != "a_MCKG" & dat$FPKM > 0)], median), vm=tapply(dat$FPKM[which(dat$class == "a_MCKG" & dat$FPKM > 0)], dat$source[which(dat$class == "a_MCKG" & dat$FPKM > 0)], median))

g = ggplot(dat[which(dat$FPKM > 0),], aes(x = FPKM))
g = g + geom_density(aes(y = ..density.., group = class, color = class, fill = class), alpha=0.2)
g = g + facet_wrap(~source, ncol=2, scales="free")
g = g + geom_vline(data=vline.dat, aes(xintercept = vl), colour="turquoise3", linetype="longdash")
g = g + geom_vline(data=vline.dat, aes(xintercept = vm), colour="tomato1", linetype="longdash")
g = g + facet_wrap(~source, ncol=2, scales="free")
g = g + ggtitle("Distribution of FPKM, MCKG vs. Other")
g = g + xlab("FPKM > 0")

Third attempt: the plots are all identical, but the medians are correct:

dat = read.table("sample.data")
vline.dat = data.frame(z=levels(dat$source), vl=tapply(dat$FPKM[which(dat$class != "a_MCKG" & dat$FPKM > 0)], dat$source[which(dat$class != "a_MCKG" & dat$FPKM > 0)], median), vm=tapply(dat$FPKM[which(dat$class == "a_MCKG" & dat$FPKM > 0)], dat$source[which(dat$class == "a_MCKG" & dat$FPKM > 0)], median))

g = ggplot(dat[which(dat$FPKM > 0),], aes(x = FPKM))
g = g + geom_density(aes(y = ..density.., group = class, color = class, fill = class), alpha=0.2)
g = g + facet_wrap(~source, ncol=2, scales="free")
g = g + geom_vline(data=vline.dat, aes(xintercept = vl), colour="turquoise3", linetype="longdash")
g = g + geom_vline(data=vline.dat, aes(xintercept = vm), colour="tomato1", linetype="longdash")
g = g + facet_wrap(~z, ncol=2, scales="free")
g = g + ggtitle("Distribution of FPKM, MCKG vs. Other")
g = g + xlab("FPKM > 0")
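What goes wrong in attempts 2 and 3 is the column name. ggplot2 splits a layer across panels only when that layer's data contains the faceting variable. vline.dat calls it z, so under facet_wrap(~source) its lines are repeated in every panel; under facet_wrap(~z) (a later facet_wrap call replaces the earlier one), the density data has no z column and is repeated in every panel instead. Below is a sketch of one fix that keeps the question's tapply approach, assuming exactly two classes and using hypothetical stand-in data: name the column source and stack both sets of medians in long format with a class column. The helper class_medians() is a hypothetical name introduced here.

```r
library(ggplot2)

# Hypothetical stand-in for sample.data (invented values, not the real data)
set.seed(1)
dat <- expand.grid(i = 1:50, class = c("a_MCKG", "Other"),
                   source = paste0("cell", LETTERS[1:4]),
                   stringsAsFactors = FALSE)
dat$FPKM <- rlnorm(nrow(dat), meanlog = ifelse(dat$class == "a_MCKG", 1, 0))
dat2 <- dat[dat$FPKM > 0, ]

# Hypothetical helper: per-source medians for one class, keyed by a column
# named `source` (the faceting variable); names() of the tapply result
# supply the source labels.
class_medians <- function(cls) {
  keep <- dat2$class == cls
  m <- tapply(dat2$FPKM[keep], dat2$source[keep], median)
  data.frame(source = names(m), class = cls, med = as.vector(m))
}

# Long format: one row per (source, class) pair, 8 rows in total
vline.dat <- rbind(class_medians("Other"), class_medians("a_MCKG"))

p <- ggplot(dat2, aes(x = FPKM)) +
  geom_density(aes(color = class, fill = class), alpha = 0.2) +
  geom_vline(data = vline.dat, aes(xintercept = med, color = class),
             linetype = "longdash") +
  facet_wrap(~ source, ncol = 2, scales = "free")
print(p)
```

Mapping color = class (rather than fixed colours) keeps each dashed line matched to its density curve; scale_color_manual() could bring back turquoise3 and tomato1 if preferred.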

Answer 1

Passing pre-summarised data is one way to do it:

library(plyr)
library(ggplot2)

names(dat) <- c("FPKM", "class", "source")
dat2 <- subset(dat, FPKM > 0)

ggplot(dat2, aes(x = FPKM)) + 
  geom_density(aes(y = ..density.., group = class, color = class, fill = class), alpha=0.2) +
  geom_vline(data = ddply(dat2, .(source, class), summarize, mmed = median(FPKM)),
             aes(xintercept = mmed, color = class)) +
  facet_wrap(~ source, ncol = 2, scales = "free") +
  ggtitle("Distribution of FPKM, MCKG vs. Other") +
  xlab("FPKM > 0")
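plyr is retired, with dplyr as its successor for data-frame work. Here is a sketch of the same pre-summarising step in dplyr (an alternative, not what the answer uses), assuming dplyr 1.0 or later for the .groups argument and ggplot2 3.3 or later for after_stat(), which replaces the ..density.. notation deprecated in ggplot2 3.4. The data is again a hypothetical stand-in.

```r
library(ggplot2)
library(dplyr)

# Hypothetical stand-in for sample.data (invented values, not the real data)
set.seed(1)
dat <- expand.grid(i = 1:50, class = c("a_MCKG", "Other"),
                   source = paste0("cell", LETTERS[1:4]),
                   stringsAsFactors = FALSE)
dat$FPKM <- rlnorm(nrow(dat), meanlog = ifelse(dat$class == "a_MCKG", 1, 0))

dat2 <- filter(dat, FPKM > 0)

# One median per (source, class) pair; `source` is kept so that
# facet_wrap(~ source) can assign each line to its own panel.
meds <- dat2 %>%
  group_by(source, class) %>%
  summarise(mmed = median(FPKM), .groups = "drop")

p <- ggplot(dat2, aes(x = FPKM)) +
  geom_density(aes(y = after_stat(density), color = class, fill = class),
               alpha = 0.2) +
  geom_vline(data = meds, aes(xintercept = mmed, color = class)) +
  facet_wrap(~ source, ncol = 2, scales = "free") +
  ggtitle("Distribution of FPKM, MCKG vs. Other") +
  xlab("FPKM > 0")
print(p)
```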

Alternatively, you can do the same thing in base R:

dat3 <- aggregate(FPKM ~ source + class, data = dat2, FUN = median)

ggplot(dat2, aes(x = FPKM)) + 
  geom_density(aes(y = ..density.., group = class, color = class, fill = class), alpha=0.2) +
  geom_vline(data = dat3,
             aes(xintercept = FPKM, color = class)) +
  facet_wrap(~ source, ncol = 2, scales = "free") +
  ggtitle("Distribution of FPKM, MCKG vs. Other") +
  xlab("FPKM > 0")

Note: you may want to avoid column names such as source and class, since they clash with the built-in functions source() and class().
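A sketch of that renaming, using the hypothetical replacement names gene_class and cell_type and the same hypothetical stand-in data. It is mainly a readability precaution: inside aes() and facet formulas, ggplot2 looks names up in the data first, which is why the code above works with the original names.

```r
library(ggplot2)

# Hypothetical stand-in for sample.data (invented values, not the real data)
set.seed(1)
dat <- expand.grid(i = 1:50, class = c("a_MCKG", "Other"),
                   source = paste0("cell", LETTERS[1:4]),
                   stringsAsFactors = FALSE)
dat$FPKM <- rlnorm(nrow(dat), meanlog = ifelse(dat$class == "a_MCKG", 1, 0))

# Rename the columns that share names with base::class() and base::source()
# (gene_class and cell_type are hypothetical choices).
names(dat)[names(dat) == "class"] <- "gene_class"
names(dat)[names(dat) == "source"] <- "cell_type"

dat2 <- subset(dat, FPKM > 0)
dat3 <- aggregate(FPKM ~ cell_type + gene_class, data = dat2, FUN = median)

p <- ggplot(dat2, aes(x = FPKM)) +
  geom_density(aes(color = gene_class, fill = gene_class), alpha = 0.2) +
  geom_vline(data = dat3, aes(xintercept = FPKM, color = gene_class)) +
  facet_wrap(~ cell_type, ncol = 2, scales = "free")
print(p)
```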

Answer 1: accepted, score 0, 2015-12-07 15:59:19
