简体   繁体   中英

Partial Row Labels Heatmap - R

I was wondering if anyone knows of a package that allows partial row labeling of heatmaps. I am currently using pheatmap() to construct my heatmaps, but I can use any package that has this functionality.

I have plots with many rows of differentially expressed genes and I would like to label a subset of them. There are two main things to consider (that I can think of):

  • The placement of the text annotation depends on the height of the row. If the rows are too narrow, then the text label will be ambiguous without some sort of pointer.
  • If multiple adjacent rows are significant (ie will be labelled), then these will need to be offset, and again, a pointer will be needed.

Below is an example of a partial solution that really only gets maybe halfway there, but I hope illustrates what I'd like to be able to do.

set.seed(1)
require(pheatmap)
require(RColorBrewer)
require(grid)

### Data to plot
data_mat <- matrix(sample(1:10000, 300), nrow = 50, ncol = 6)
rownames(data_mat) <- paste0("Gene", 1:50)
colnames(data_mat) <- c(paste0("A", 1:3), paste0("B", 1:3))

### Set how many genes to annotate
  ### TRUE - make enough labels that some overlap
  ### FALSE - no overlap
tooMany <- T

### Select a few genes to annotate
if (tooMany) {
  sigGenes_v <- paste0("Gene", c(5,20,26,42,47,16,28))
  newMain_v <- "Too Many Labels"
} else {
  sigGenes_v <- paste0("Gene", c(5,20,26,42))
  newMain_v <- "OK Labels"
}

### Make color list
colors_v <- brewer.pal(8, "Dark2")
colors_v <- colors_v[c(1:length(sigGenes_v), 8)]
names(colors_v) <- c(sigGenes_v, "No")
annColors_lsv <- list("Sig" = colors_v)

### Column Metadata
colMeta_df <- data.frame(Treatment = c(rep("A", 3), rep("B", 3)),
                      Replicate = c(rep(1:3, 2)),
                      stringsAsFactors = F, 
                      row.names = colnames(data_mat))

### Row metadata
rowMeta_df <- data.frame(Sig = rep("No", 50), 
                      stringsAsFactors = F,
                      row.names = rownames(data_mat))
for (gene_v in sigGenes_v) rowMeta_df[rownames(rowMeta_df) == gene_v, "Sig"] <- gene_v

### Heatmap
heat <- pheatmap(data_mat,
                 annotation_row = rowMeta_df,
                 annotation_col = colMeta_df,
                 annotation_colors = annColors_lsv,
                 cellwidth = 10,
                 main = "Original Heat")

### Get order of genes after clustering
genesInHeatOrder_v <- heat$tree_row$labels[heat$tree_row$order]
whichSigInHeatOrder_v <- which(genesInHeatOrder_v %in% sigGenes_v)
whichSigInHeatOrderLabels_v <- genesInHeatOrder_v[whichSigInHeatOrder_v]

sigY <- 1 - (0.02 * whichSigInHeatOrder_v)

### Change title
whichMainGrob_v <- which(heat$gtable$layout$name == "main")
heat$gtable$grobs[[whichMainGrob_v]] <- textGrob(label = newMain_v, 
                                                 gp = gpar(fontsize = 16))

### Remove rows
whichRowGrob_v <- which(heat$gtable$layout$name == "row_names")
heat$gtable$grobs[[whichRowGrob_v]] <- textGrob(label = whichSigInHeatOrderLabels_v,
                                                y = sigY,
                                                vjust = 1)
grid.newpage()
grid.draw(heat)

Here are a few outputs:

original heatmap: 原始热图

ok labels: 好的标签

ok labels, with flags: 好的标签,带标志

too many labels标签太多

too many labels, with flags标签太多,带有标志

The "with flags" outputs are the desired final results.
I just saved these as images from the Rstudio plot viewer. I recognize that I could save them as pdfs and provide a larger file size to get rid of the label overlap, but then the individual cells would be larger than I want.

Based on your code, you seem fairly comfortable with gtables & grobs. A (relatively) straightforward way to achieve the look you want is to zoom in on the row label grob, & make some changes there:

  1. replace unwanted labels with "";
  2. evenly spread out labels within the available space;
  3. add line segments joining the old and new label positions.

I wrote a wrapper function for this, which works as follows:

# heat refers to the original heatmap produced from the pheatmap() function
# kept.labels should be a vector of labels you wish to show
# repel.degree is a number in the range [0, 1], controlling how much the
# labels are spread out from one another

add.flag(heat,
         kept.labels = sigGenes_v,
         repel.degree = 0)

add.flag(heat,
         kept.labels = sigGenes_v,
         repel.degree = 0.5)

add.flag(heat,
         kept.labels = sigGenes_v,
         repel.degree = 1)

阴谋

Function (explanations in annotations):

add.flag <- function(pheatmap,
                     kept.labels,
                     repel.degree) {

  # repel.degree = number within [0, 1], which controls how much 
  #                space to allocate for repelling labels.
  ## repel.degree = 0: spread out labels over existing range of kept labels
  ## repel.degree = 1: spread out labels over the full y-axis

  heatmap <- pheatmap$gtable

  new.label <- heatmap$grobs[[which(heatmap$layout$name == "row_names")]] 

  # keep only labels in kept.labels, replace the rest with ""
  new.label$label <- ifelse(new.label$label %in% kept.labels, 
                            new.label$label, "")

  # calculate evenly spaced out y-axis positions
  repelled.y <- function(d, d.select, k = repel.degree){
    # d = vector of distances for labels
    # d.select = vector of T/F for which labels are significant

    # recursive function to get current label positions
    # (note the unit is "npc" for all components of each distance)
    strip.npc <- function(dd){
      if(!"unit.arithmetic" %in% class(dd)) {
        return(as.numeric(dd))
      }

      d1 <- strip.npc(dd$arg1)
      d2 <- strip.npc(dd$arg2)
      fn <- dd$fname
      return(lazyeval::lazy_eval(paste(d1, fn, d2)))
    }

    full.range <- sapply(seq_along(d), function(i) strip.npc(d[i]))
    selected.range <- sapply(seq_along(d[d.select]), function(i) strip.npc(d[d.select][i]))

    return(unit(seq(from = max(selected.range) + k*(max(full.range) - max(selected.range)),
                    to = min(selected.range) - k*(min(selected.range) - min(full.range)), 
                    length.out = sum(d.select)), 
                "npc"))
  }
  new.y.positions <- repelled.y(new.label$y,
                                d.select = new.label$label != "")
  new.flag <- segmentsGrob(x0 = new.label$x,
                           x1 = new.label$x + unit(0.15, "npc"),
                           y0 = new.label$y[new.label$label != ""],
                           y1 = new.y.positions)

  # shift position for selected labels
  new.label$x <- new.label$x + unit(0.2, "npc")
  new.label$y[new.label$label != ""] <- new.y.positions

  # add flag to heatmap
  heatmap <- gtable::gtable_add_grob(x = heatmap,
                                   grobs = new.flag,
                                   t = 4, 
                                   l = 4
  )

  # replace label positions in heatmap
  heatmap$grobs[[which(heatmap$layout$name == "row_names")]] <- new.label

  # plot result
  grid.newpage()
  grid.draw(heatmap)

  # return a copy of the heatmap invisibly
  invisible(heatmap)
}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM