Volcano plot in R: coloring only the genes common to two data sets

I am having trouble coloring the genes that two data sets (whole_colon and volcano) have in common. The code below works, but I would like to add one more detail, and that part is tricky.

I would like to use a different color (ideally red) for the common genes, that is, only for rows where whole_colon$genes == volcano$genes holds. I tried splitting them into separate groups (specified_increased / specified_decreased), but that did not work.

Here's my code attached.

Big thanks in advance.

    #volcano plot using ggplot2
    library(data.table)
    # Adding group to decipher if the gene is significant or not:
    whole_colon <- data.frame(whole_colon)
    whole_colon["group"] <- "NotSignificant"
    whole_colon[which(whole_colon['FDR'] < 0.05 & whole_colon['logFC'] > 1.5),"group"] <- "Increased"
    whole_colon[which(volcano['FDR'] < 0.05 & volcano['logFC'] > 1.5),"group"] <- "colon_Increased_specialized"
    whole_colon[which(volcano['FDR'] < 0.05 & volcano['logFC'] < -1.5),"group"] <- "colon_Decreased_specialized"

    with(subset(whole_colon , FDR<0.05), points(logFC, -log10(FDR), pch=20,col="red"), whole_colon$genes==volcano$genes)

    library(ggplot2)
    ggplot(whole_colon, aes(x = logFC, y = -log10(FDR), color = group))+
      scale_colour_manual(values = cols) +
      ggtitle(label = "Volcano Plot", subtitle = "colon specific volcano plot") +
      geom_point(size = 2.5, alpha = 1, na.rm = T) +
      theme_bw(base_size = 14) + 
      theme(legend.position = "right") + 
      xlab(expression(log[2]("logFC"))) + 
      ylab(expression(-log[10]("FDR"))) +
      geom_hline(yintercept = 1.30102, colour="#990000", linetype="dashed") + 
      geom_vline(xintercept = 1.5849, colour="#990000", linetype="dashed") + 
      geom_vline(xintercept = -1.5849, colour="#990000", linetype="dashed")+ 
      scale_y_continuous(trans = "log1p")

This gives me a broken plot that looks like this. (I want every point from whole_colon to be drawn, with the points colored reddish when the gene also appears in volcano.)

[enter image description here]

Here are subsets of whole_colon and volcano. whole_colon:

        genes    logFC      FDR       group
    1   CST1      9.554742  5.64e-45  Increased
    3   OTOP2    -9.408177  5.76e-32  Decreased
    4   COL11A1   6.825363  1.00e-31  Increased
    5   INHBA     6.271879  2.07e-30  Increased
    6   MMP7      7.594926  2.07e-30  Increased
    7   BEST4    -7.756451  8.30e-30  Decreased
    8   COL10A1   7.634386  1.82e-23  Increased
    9   MMP11     4.767644  2.70e-23  Increased
    10  GUCA2B   -6.346156  2.17e-21  Decreased
    11  KRT6B    11.801550  5.37e-20  Increased
    12  WNT2      9.485133  6.47e-20  Increased
    13  COL8A1    3.974965  6.47e-20  Increased

volcano:

        genes    logFC      FDR           group
    1   INHBA     6.271879  2.070000e-30  Increased
    2   COL10A1   7.634386  1.820000e-23  Increased
    3   WNT2      9.485133  6.470000e-20  Increased
    4   COL8A1    3.974965  6.470000e-20  Increased
    5   THBS2     4.104176  2.510000e-19  Increased
    6   BGN       3.524484  5.930000e-18  Increased
    7   COMP     11.916956  2.740000e-17  Increased
    9   SULF1     3.540374  1.290000e-15  Increased
    10  CTHRC1    3.937028  4.620000e-14  Increased
    11  TRIM29    3.827088  1.460000e-11  Increased
    12  SLC6A20   5.060538  5.820000e-11  Increased
    13  SFRP4     5.924330  8.010000e-11  Increased
    14  CDH3      5.330732  8.940000e-11  Increased
    15  ESM1      6.491496  3.380000e-10  Increased
    614 TDP2     -1.801368  0.002722461   NotSignificant
    615 EPHX2    -1.721039  0.002722461   NotSignificant
    616 RAVER2   -1.581812  0.002749728   NotSignificant
    617 BMP6     -2.702780  0.002775460   Increased
    619 SCNN1G   -4.012111  0.002870500   Increased
    620 SLC52A3  -1.868920  0.002931197   NotSignificant
    621 VIPR1    -1.556238  0.002945578   NotSignificant
    622 SUCLG2   -1.720993  0.003059717   NotSignificant

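The example data provided is incomplete, which makes it hard to color-code directly, so I use the extended data frame below. The key point is that you cannot use == here: it compares the two columns position by position, which only makes sense when both have the same length and the same order. Use %in% instead, which returns a logical vector telling you whether each gene in whole_colon also appears in volcano: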
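
To make the difference between == and %in% concrete, here is a minimal sketch with two made-up gene vectors (the names a and b are illustrative only and are not taken from the question's data):

```r
# Two hypothetical gene vectors with different length and order
a <- c("CST1", "INHBA", "WNT2", "MMP7")
b <- c("INHBA", "WNT2")

# == compares position by position and recycles the shorter vector,
# so it answers "is a[i] equal to b[i]?", not "is a[i] anywhere in b?"
eq_result <- a == b

# %in% tests membership and returns one TRUE/FALSE per element of a
in_result <- a %in% b

print(eq_result)  # FALSE FALSE FALSE FALSE
print(in_result)  # FALSE  TRUE  TRUE FALSE
```

Even though INHBA and WNT2 occur in both vectors, == reports no matches because they sit at different positions.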

whole_colon=structure(list(genes = structure(c(5L, 11L, 3L, 
7L, 10L, 1L, 
2L, 9L, 6L, 8L, 12L, 4L, 13L, 14L), .Label = c("BEST4", "COL10A1", 
"COL11A1", "COL8A1", "CST1", "GUCA2B", "INHBA", "KRT6B", "MMP11", 
"MMP7", "OTOP2", "WNT2", "ABC", "DEF"), class = "factor"), logFC = c(9.554742, 
-9.408177, 6.825363, 6.271879, 7.594926, -7.756451, 7.634386, 
4.767644, -6.346156, 11.80155, 9.485133, 3.974965, 0.5, -0.5), 
    FDR = c(5.64e-45, 5.76e-32, 1e-31, 2.07e-30, 2.07e-30, 8.3e-30, 
    1.82e-23, 2.7e-23, 2.17e-21, 5.37e-20, 6.47e-20, 6.47e-20, 
    1, 1), group = c("Increased", "Decreased", "Increased", "specific_Increased", 
    "Increased", "Decreased", "specific_Increased", "Increased", 
    "Decreased", "Increased", "specific_Increased", "specific_Increased", 
    "NotSignificant", "NotSignificant")), row.names = c("1", 
"3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", 
"2"), class = "data.frame")

Set the groups:

#set the decreased and increased like you did:
whole_colon["group"] <- "NotSignificant"
whole_colon[which(whole_colon['FDR'] < 0.05 & whole_colon['logFC'] > 1.5),"group"] <- "Increased"
whole_colon[which(whole_colon['FDR'] < 0.05 & -whole_colon['logFC'] > 1.5),"group"] <- "Decreased"
whole_colon[which(whole_colon['FDR'] < 0.05 & whole_colon['logFC'] > 1.5 & whole_colon$genes %in% volcano$genes),"group"] <- "specific_Increased"
whole_colon[which(whole_colon['FDR'] < 0.05 & whole_colon['logFC'] < -1.5 & whole_colon$genes %in% volcano$genes),"group"] <- "specific_Decreased"
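
As a quick sanity check of this grouping logic, the sketch below applies the same rules to a tiny toy data set. The gene names G1 to G5 and all values are invented for illustration, and the helper vectors sig and shared are just shorthand, not part of the code above:

```r
# Toy stand-ins for whole_colon and volcano (hypothetical values)
whole_colon <- data.frame(
  genes = c("G1", "G2", "G3", "G4", "G5"),
  logFC = c(3, -3, 3, -3, 0.2),
  FDR   = c(0.001, 0.001, 0.001, 0.001, 0.9),
  stringsAsFactors = FALSE
)
volcano <- data.frame(genes = c("G3", "G4"), stringsAsFactors = FALSE)

sig    <- whole_colon$FDR < 0.05                 # passes the FDR cut-off
shared <- whole_colon$genes %in% volcano$genes   # gene also present in volcano

# Later assignments overwrite earlier ones, so the "specific_" rules go last
whole_colon$group <- "NotSignificant"
whole_colon$group[sig & whole_colon$logFC >  1.5] <- "Increased"
whole_colon$group[sig & whole_colon$logFC < -1.5] <- "Decreased"
whole_colon$group[sig & shared & whole_colon$logFC >  1.5] <- "specific_Increased"
whole_colon$group[sig & shared & whole_colon$logFC < -1.5] <- "specific_Decreased"

print(table(whole_colon$group))
```

Each of the five toy genes should land in a different group, which makes it easy to confirm that the order of the assignments is doing what you expect before running it on the real data.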

and plot:

# Named color vector: the names must match the values in whole_colon$group
cols <- c("grey", "blue", "blue", "red", "red")
names(cols) <- c("NotSignificant", "Increased", "Decreased",
                 "specific_Increased", "specific_Decreased")

library(ggplot2)
ggplot(whole_colon, aes(x = logFC, y = -log10(FDR), color = group)) +
  scale_colour_manual(values = cols) +
  ggtitle(label = "Volcano Plot", subtitle = "colon specific volcano plot") +
  geom_point(size = 2.5, alpha = 1, na.rm = TRUE) +
  theme_bw(base_size = 14) +
  theme(legend.position = "right") +
  xlab(expression(log[2]("logFC"))) +
  ylab(expression(-log[10]("FDR"))) +
  # FDR cut-off: -log10(0.05) is about 1.30103
  geom_hline(yintercept = 1.30102, colour = "#990000", linetype = "dashed") +
  # These lines sit at log2(3), about 1.585; the grouping uses |logFC| > 1.5
  geom_vline(xintercept = 1.5849, colour = "#990000", linetype = "dashed") +
  geom_vline(xintercept = -1.5849, colour = "#990000", linetype = "dashed") +
  scale_y_continuous(trans = "log1p")

[enter image description here]
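
The dashed reference lines in the plot code are hard-coded (1.30102 and ±1.5849). A small sketch of how they could be derived from the cut-offs instead, so that the lines and the grouping stay in step; the variable names fdr_cutoff, logfc_cutoff, y_line and x_lines are my own and not from the original code:

```r
# Cut-offs used when assigning the groups
fdr_cutoff   <- 0.05
logfc_cutoff <- 1.5

# Horizontal line: the FDR cut-off on the -log10 scale (about 1.30103)
y_line <- -log10(fdr_cutoff)

# Vertical lines: the logFC cut-off on both sides of zero
x_lines <- c(-logfc_cutoff, logfc_cutoff)

# For comparison: the hard-coded 1.5849 is log2(3), not 1.5
print(y_line)
print(log2(3))

# These could then be passed to ggplot2, for example:
#   geom_hline(yintercept = y_line, ...) + geom_vline(xintercept = x_lines, ...)
```

Whether the vertical lines should sit at 1.5 or at log2(3) depends on which cut-off you actually intend; the point is only that the lines and the group assignment should use the same number.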

I think I solved this problem; it only took one more line. After applying @StupidWolf's advice and slightly redefining cols, I got the image I wanted.

cols<- c(red="red", orange="orange", NotSignificant="darkgrey", Increased= "#00B2FF" ,Decreased="#00B2FF", specific_Increased="#ff4d00", specific_Decreased="#ff4d00" )
head(cols)
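
Since scale_colour_manual matches a named color vector to the group values by name, it can help to check that every group has an entry in cols. A minimal sketch, assuming the five group labels used in the answer above (the vector groups here is typed out by hand as a stand-in for unique(whole_colon$group)):

```r
cols <- c(red = "red", orange = "orange", NotSignificant = "darkgrey",
          Increased = "#00B2FF", Decreased = "#00B2FF",
          specific_Increased = "#ff4d00", specific_Decreased = "#ff4d00")

# Stand-in for unique(whole_colon$group)
groups <- c("NotSignificant", "Increased", "Decreased",
            "specific_Increased", "specific_Decreased")

# Groups with no color defined (these would not get one of your colors)
missing_cols <- setdiff(groups, names(cols))

# Colors defined but not used by any group (harmless, but can be dropped)
unused_cols <- setdiff(names(cols), groups)

print(missing_cols)
print(unused_cols)
```

With the vector above, no group is missing a color, and the red and orange entries are simply unused.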
