
How do I plot dendrogram alongside distance matrix in R?

I am looking for an efficient way to plot a dendrogram obtained from some data alongside the corresponding distance matrix instead of the original data. I have been curious how different papers show this, and it seems that all they do is plot the heatmap and the dendrogram separately and combine them in image-editing software. Hopefully the following code makes clear what I want. Say I generate the following data and get a hierarchical clustering using 1 minus Pearson's correlation as the distance measure and complete linkage as the clustering method:

library(gplots)
set.seed(2)
x <- matrix(rnorm(100), nrow = 5)
dist.fn <- function(x) as.dist(1-cor(t(x)))
hclust.com <- function(x) hclust(x, method="complete")
h.ori <- heatmap.2(x, trace="none", distfun=dist.fn, hclustfun=hclust.com,dendrogram = "row",main = "Fig1")
h.ori$rowInd
# 1 3 5 4 2

[Image: Fig1, heatmap of x with the row dendrogram]

Now I can plot the corresponding distance matrix, with its rows and columns ordered by the dendrogram in Fig1, as:

colfunc <- colorRampPalette(c("red", "white", "yellow")) #not really necessary
dmat <- cor(t(x))[h.ori$rowInd,h.ori$rowInd]
heatmap.2(dmat,Rowv = NULL,Colv = "Rowv",scale = 'none', 
          dendrogram='none',trace = 'none',density.info="none",
          labRow = h.ori$rowInd, labCol = h.ori$rowInd,
          col=colfunc(20))

[Image: Fig2, heatmap of the reordered correlation matrix, without dendrograms]
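Note that dmat above holds correlations, whereas the clustering itself runs on 1 minus the correlation. The following base-R sketch shows how the two matrices relate once both are reordered. The names cmat, d, hc, dmat_cor and dmat_dist are introduced here only for illustration. The leaf order of a plain hclust tree (hc$order) also need not match h.ori$rowInd, because heatmap.2 by default reorders the dendrogram by row means.

```r
set.seed(2)
x <- matrix(rnorm(100), nrow = 5)

cmat <- cor(t(x))                    # 5 x 5 Pearson correlations between rows
d    <- as.dist(1 - cmat)            # the dissimilarity that is actually clustered
hc   <- hclust(d, method = "complete")

ord <- hc$order                      # leaf order of the plain hclust tree

dmat_cor  <- cmat[ord, ord]          # reordered correlations (what Fig2 shows)
dmat_dist <- as.matrix(d)[ord, ord]  # reordered distances, equal to 1 - dmat_cor
```

Plotting dmat_dist instead of dmat_cor only flips the colour scale; the block structure is the same.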

Here is my question: how do I add the dendrogram plotted in Fig1 to Fig2 (preferably along both columns and rows)? The purpose is to view the clustering produced by the dendrogram, which would be a nice way to visualize block models. As a side question: I know how to plot heatmaps with the ggplot2 library, i.e. using geom_tile(). Is there a way to do what I want above using ggplot2?

With regard to doing this in ggplot2: I wrote a function at some point that helps with this, though it is not without flaws. It takes an hclust object and uses it to plot a dendrogram as the axis guide. First we'll grab the dendrogram from the heatmap you had before.

library(gplots)
#> Warning: package 'gplots' was built under R version 4.0.2
#> 
#> Attaching package: 'gplots'
#> The following object is masked from 'package:stats':
#> 
#>     lowess
library(ggplot2)
library(ggh4x)

set.seed(2)
x <- matrix(rnorm(100), nrow = 5)
dist.fn <- function(x) as.dist(1-cor(t(x)))
hclust.com <- function(x) hclust(x, method="complete")
h.ori <- heatmap.2(x, trace="none", distfun=dist.fn, hclustfun=hclust.com,dendrogram = "row",main = "Fig1")
h.ori$rowInd
#> [1] 1 3 5 4 2

Then we format it as an hclust object, which we then feed into the scales. The scales should (in theory) automatically sort the variables according to the clustering.
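What that sorting amounts to can be sketched by hand in base R: convert the dendrogram back to an hclust object, read off the leaf labels in tree order, and use them as factor levels. This is only an illustration under a few assumptions: the row names r1 to r5 are hypothetical labels not present in the original data, dend stands in for h.ori$rowDendrogram, and leaf_order is a name made up for this example.

```r
set.seed(2)
x <- matrix(rnorm(100), nrow = 5)
rownames(x) <- paste0("r", 1:5)    # hypothetical labels, not in the original

hc    <- hclust(as.dist(1 - cor(t(x))), method = "complete")
dend  <- as.dendrogram(hc)         # stands in for h.ori$rowDendrogram
clust <- as.hclust(dend)           # back to an hclust object

# Leaf labels in the order in which they appear along the dendrogram
leaf_order <- clust$labels[clust$order]

# Long-format correlation matrix whose factor levels follow the clustering
df <- as.data.frame(as.table(cor(t(x))), responseName = "value")
df$Var1 <- factor(df$Var1, levels = leaf_order)
df$Var2 <- factor(df$Var2, levels = leaf_order)
```

A data frame prepared this way can be passed to ggplot(df, aes(Var1, Var2, fill = value)) + geom_tile(), and the tiles then appear in clustering order even without the dendrogram scales.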

I'm just adding the dendrograms at every side of the plot, so you can choose which one you really want.

# Plot prep: making the hclust object and the long-format correlation matrix
clust <- as.hclust(h.ori$rowDendrogram)
df <- reshape2::melt(cor(t(x)))

ggplot(df, aes(Var1, Var2, fill = value)) +
  geom_raster() +
  scale_fill_gradient2(low = "red", mid = "white", high = "yellow") +
  scale_x_dendrogram(hclust = clust) +
  scale_y_dendrogram(hclust = clust) +
  guides(
    x.sec = guide_dendro(dendro = ggdendro::dendro_data(clust), position = "top"),
    y.sec = guide_dendro(dendro = ggdendro::dendro_data(clust), position = "right")
  ) +
  coord_equal()

One caveat remains: there is no good control over the labels yet. Let me know if you run into any trouble with the function so I can maybe improve it.
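If a ggplot2 solution is not strictly required, base R's heatmap() accepts a ready-made dendrogram for both Rowv and Colv, which puts the same tree along the rows and the columns of the correlation matrix. The sketch below is a minimal version under that assumption. It rebuilds the tree with hclust rather than taking it from h.ori, so the leaf order may differ from Fig1, and the plot title is made up for the example.

```r
set.seed(2)
x <- matrix(rnorm(100), nrow = 5)

cmat <- cor(t(x))                                   # matrix to display
dend <- as.dendrogram(
  hclust(as.dist(1 - cmat), method = "complete")    # same distance and linkage
)

colfunc <- colorRampPalette(c("red", "white", "yellow"))

# Reuse one dendrogram for rows and columns; a supplied dendrogram is
# used as-is, so both axes follow the same clustering order.
res <- heatmap(cmat, Rowv = dend, Colv = dend, symm = TRUE,
               scale = "none", col = colfunc(20),
               main = "Correlation matrix with dendrograms")
```

heatmap.2 documents the same convention for Rowv and Colv, so h.ori$rowDendrogram can presumably be passed to both in the same way, together with dendrogram = "both".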

Good luck!
