I have a problem coloring genes to mark those that are common to two data sets (whole_colon and volcano). The code below works well, but I'd like to add one more detail, which turns out to be tricky.
I would like to draw the common genes in a different color (red would be great), i.e. only the genes for which whole_colon$genes==volcano$genes holds. I tried putting them into separate groups (specified_increased/specified_decreased), but sadly that didn't work out.
Here's my code.
Many thanks in advance.
#volcano plot using ggplot2
library(data.table)
# Adding group to decipher if the gene is significant or not:
whole_colon <- data.frame(whole_colon)
whole_colon["group"] <- "NotSignificant"
whole_colon[which(whole_colon['FDR'] < 0.05 & whole_colon['logFC'] > 1.5),"group"] <- "Increased"
whole_colon[which(volcano['FDR'] < 0.05 & volcano['logFC'] > 1.5),"group"] <- "colon_Increased_specialized"
whole_colon[which(volcano['FDR'] < 0.05 & volcano['logFC'] < -1.5),"group"] <- "colon_Decreased_specialized"
with(subset(whole_colon , FDR<0.05), points(logFC, -log10(FDR), pch=20,col="red"), whole_colon$genes==volcano$genes)
library(ggplot2)
ggplot(whole_colon, aes(x = logFC, y = -log10(FDR), color = group))+
scale_colour_manual(values = cols) +
ggtitle(label = "Volcano Plot", subtitle = "colon specific volcano plot") +
geom_point(size = 2.5, alpha = 1, na.rm = T) +
theme_bw(base_size = 14) +
theme(legend.position = "right") +
xlab(expression(log[2]("logFC"))) +
ylab(expression(-log[10]("FDR"))) +
geom_hline(yintercept = 1.30102, colour="#990000", linetype="dashed") +
geom_vline(xintercept = 1.5849, colour="#990000", linetype="dashed") +
geom_vline(xintercept = -1.5849, colour="#990000", linetype="dashed")+
scale_y_continuous(trans = "log1p")
This gives me a wrong plot. I want every gene in whole_colon to be plotted, with the genes that also appear in volcano colored red.
Here are subsets of the whole_colon and volcano data.
whole_colon:
genes logFC FDR group
1 CST1 9.554742 5.64e-45 Increased
3 OTOP2 -9.408177 5.76e-32 Decreased
4 COL11A1 6.825363 1.00e-31 Increased
5 INHBA 6.271879 2.07e-30 Increased
6 MMP7 7.594926 2.07e-30 Increased
7 BEST4 -7.756451 8.30e-30 Decreased
8 COL10A1 7.634386 1.82e-23 Increased
9 MMP11 4.767644 2.70e-23 Increased
10 GUCA2B -6.346156 2.17e-21 Decreased
11 KRT6B 11.801550 5.37e-20 Increased
12 WNT2 9.485133 6.47e-20 Increased
13 COL8A1 3.974965 6.47e-20 Increased
volcano:
genes logFC FDR group
1 INHBA 6.271879 2.070000e-30 Increased
2 COL10A1 7.634386 1.820000e-23 Increased
3 WNT2 9.485133 6.470000e-20 Increased
4 COL8A1 3.974965 6.470000e-20 Increased
5 THBS2 4.104176 2.510000e-19 Increased
6 BGN 3.524484 5.930000e-18 Increased
7 COMP 11.916956 2.740000e-17 Increased
9 SULF1 3.540374 1.290000e-15 Increased
10 CTHRC1 3.937028 4.620000e-14 Increased
11 TRIM29 3.827088 1.460000e-11 Increased
12 SLC6A20 5.060538 5.820000e-11 Increased
13 SFRP4 5.924330 8.010000e-11 Increased
14 CDH3 5.330732 8.940000e-11 Increased
15 ESM1 6.491496 3.380000e-10 Increased
614 TDP2 -1.801368 0.002722461 NotSignificant
615 EPHX2 -1.721039 0.002722461 NotSignificant
616 RAVER2 -1.581812 0.002749728 NotSignificant
617 BMP6 -2.702780 0.002775460 Increased
619 SCNN1G -4.012111 0.002870500 Increased
620 SLC52A3 -1.868920 0.002931197 NotSignificant
621 VIPR1 -1.556238 0.002945578 NotSignificant
622 SUCLG2 -1.720993 0.003059717 NotSignificant
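To experiment with the posted tables, a printed data frame can be read back into R with `read.table(text = ...)`. A minimal sketch using only the first four rows of the volcano table above (the variable `txt` is just an illustrative name):

```r
# First rows of the printed volcano table, pasted as plain text
txt <- "
genes logFC FDR group
1 INHBA 6.271879 2.070000e-30 Increased
2 COL10A1 7.634386 1.820000e-23 Increased
3 WNT2 9.485133 6.470000e-20 Increased
4 COL8A1 3.974965 6.470000e-20 Increased
"
# The header has one field fewer than the data rows, so read.table()
# uses the first column (the printed row numbers) as row names
volcano <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
str(volcano)
```

Blank lines in `txt` are skipped by default, so the leading newline is harmless.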
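The example dataset provided is incomplete, so I extended whole_colon with two non-significant dummy genes (ABC and DEF); volcano is assumed to be your own data frame. The key is that you cannot use ==, which compares the two gene columns position by position (recycling the shorter one), but rather %in%, which returns a logical vector telling you, for each gene in whole_colon, whether it occurs anywhere in volcano$genes. Try the following: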
whole_colon=structure(list(genes = structure(c(5L, 11L, 3L,
7L, 10L, 1L,
2L, 9L, 6L, 8L, 12L, 4L, 13L, 14L), .Label = c("BEST4", "COL10A1",
"COL11A1", "COL8A1", "CST1", "GUCA2B", "INHBA", "KRT6B", "MMP11",
"MMP7", "OTOP2", "WNT2", "ABC", "DEF"), class = "factor"), logFC = c(9.554742,
-9.408177, 6.825363, 6.271879, 7.594926, -7.756451, 7.634386,
4.767644, -6.346156, 11.80155, 9.485133, 3.974965, 0.5, -0.5),
FDR = c(5.64e-45, 5.76e-32, 1e-31, 2.07e-30, 2.07e-30, 8.3e-30,
1.82e-23, 2.7e-23, 2.17e-21, 5.37e-20, 6.47e-20, 6.47e-20,
1, 1), group = c("Increased", "Decreased", "Increased", "specific_Increased",
"Increased", "Decreased", "specific_Increased", "Increased",
"Decreased", "Increased", "specific_Increased", "specific_Increased",
"NotSignificant", "NotSignificant")), row.names = c("1",
"3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14",
"2"), class = "data.frame")
Set the groups:
# Set Increased/Decreased with the same cutoffs as before:
sig <- whole_colon$FDR < 0.05
whole_colon$group <- "NotSignificant"
whole_colon$group[which(sig & whole_colon$logFC > 1.5)] <- "Increased"
whole_colon$group[which(sig & whole_colon$logFC < -1.5)] <- "Decreased"
# %in% is TRUE for every gene of whole_colon that also occurs in volcano$genes
common <- whole_colon$genes %in% volcano$genes
whole_colon$group[which(sig & whole_colon$logFC > 1.5 & common)] <- "specific_Increased"
whole_colon$group[which(sig & whole_colon$logFC < -1.5 & common)] <- "specific_Decreased"
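To see why `==` cannot replace `%in%` here, a minimal sketch with two short made-up vectors, `a` and `b`, standing in for `whole_colon$genes` and `volcano$genes`:

```r
# Hypothetical toy vectors standing in for whole_colon$genes and volcano$genes
a <- c("CST1", "INHBA", "WNT2", "MMP7")
b <- c("INHBA", "WNT2", "THBS2", "BGN")

# == compares position by position: a[1] vs b[1], a[2] vs b[2], ...
eq <- a == b
print(eq)       # all FALSE: no gene sits at the same position in both vectors

# %in% asks, for every element of a, whether it occurs anywhere in b
common <- a %in% b
print(common)   # FALSE TRUE TRUE FALSE

# Genes of a that also appear in b
print(a[common])
```

With the real data the two columns also have different lengths, so `==` additionally recycles the shorter one (usually with a warning), and the result still says nothing about shared genes.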
and plot:
# Named colour vector: the names must match the group labels exactly
cols <- c(NotSignificant = "grey", Increased = "blue", Decreased = "blue",
          specific_Increased = "red", specific_Decreased = "red")
library(ggplot2)
ggplot(whole_colon, aes(x = logFC, y = -log10(FDR), color = group))+
scale_colour_manual(values = cols) +
ggtitle(label = "Volcano Plot", subtitle = "colon specific volcano plot") +
geom_point(size = 2.5, alpha = 1, na.rm = TRUE) +
theme_bw(base_size = 14) +
theme(legend.position = "right") +
xlab(expression(log[2]~"fold change")) +
ylab(expression(-log[10]("FDR"))) +
geom_hline(yintercept = 1.30102, colour="#990000", linetype="dashed") +
geom_vline(xintercept = 1.5849, colour="#990000", linetype="dashed") +
geom_vline(xintercept = -1.5849, colour="#990000", linetype="dashed")+
scale_y_continuous(trans = "log1p")
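The dashed lines are hard-coded: 1.30102 is -log10(0.05), matching the FDR cutoff, but 1.5849 is log2(3), which does not coincide with the |logFC| > 1.5 cutoff used for the groups. A minimal base-R sketch that checks these numbers and derives the line positions from the grouping cutoffs instead (`fdr_cut`, `lfc_cut`, `hline_y` and `vline_x` are hypothetical names):

```r
# Cutoffs used for the grouping in this answer
fdr_cut <- 0.05
lfc_cut <- 1.5

# Where the hard-coded line positions come from
print(-log10(fdr_cut))  # about 1.30103, the horizontal line at 1.30102
print(log2(3))          # about 1.584963, the vertical lines at +/-1.5849

# Line positions derived from the same cutoffs as the groups
hline_y <- -log10(fdr_cut)
vline_x <- c(-lfc_cut, lfc_cut)
```

These could then be passed as `geom_hline(yintercept = hline_y, ...)` and `geom_vline(xintercept = vline_x, ...)`; `geom_vline()` accepts a vector of intercepts, so one call draws both lines. Which cutoff you want (1.5 or log2(3)) is your call; the point is to keep the lines and the groups in sync.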
I think I've solved it. Adding just one more line did the trick: after applying @StupidWolf's advice and slightly redefining cols, I got the plot I wanted.
cols <- c(red = "red", orange = "orange", NotSignificant = "darkgrey",
          Increased = "#00B2FF", Decreased = "#00B2FF",
          specific_Increased = "#ff4d00", specific_Decreased = "#ff4d00")
head(cols)
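`scale_colour_manual()` matches a named `values` vector to the data by name, and labels without a matching name are drawn in the scale's `na.value`, so every label in `group` needs an entry in `cols`. A quick base-R check, using the cols above and a hypothetical vector of labels that includes one spelled like the question's `colon_Increased_specialized`:

```r
# cols as redefined above
cols <- c(red = "red", orange = "orange", NotSignificant = "darkgrey",
          Increased = "#00B2FF", Decreased = "#00B2FF",
          specific_Increased = "#ff4d00", specific_Decreased = "#ff4d00")

# Hypothetical group labels (in practice: whole_colon$group)
groups <- c("NotSignificant", "Increased", "specific_Increased",
            "colon_Increased_specialized")

# Labels with no colour: these would fall back to na.value
no_colour <- setdiff(unique(groups), names(cols))
print(no_colour)

# Colours whose names match no label in the data
unused <- setdiff(names(cols), unique(groups))
print(unused)
```

An empty `no_colour` means every group gets its intended colour; entries in `unused` (such as red and orange here) should simply go unused by the scale.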