Dendextend: Regarding how to color a dendrogram’s labels according to defined groups

I'm trying to use the R package dendextend to plot a dendrogram and color its branches and labels according to a set of previously defined groups. I've read your answers on Stack Overflow and the FAQ in the dendextend vignette, but I'm still not sure how to achieve my goal.

Let's say I have a data frame whose first column holds the names of the individuals to cluster, followed by several columns with the factors to be analyzed, and a last column with the group of each individual (see the following table).

individual  282856  282960  283275  283503  283572  283614  284015  group
pat15612    0   0   0   0   0   0   0   g2
pat38736    0   0   0   0   0   0   0   g2
pat38740    0   0   0   0   0   1   0   g2
pat38742    0   0   0   0   0   1   0   g4
pat38743    0   0   1   0   0   1   0   g3
pat38745    0   0   1   0   1   0   0   g4
pat38750    0   0   0   1   0   1   0   g4
pat38753    0   0   0   1   0   0   0   g3
pat40120    0   0   0   0   1   0   0   g4
pat40124    0   0   0   0   1   0   0   g4
pat40125    0   0   0   0   1   1   0   g4
pat40126    0   0   0   1   0   0   0   g4
pat40137    1   0   0   0   0   0   0   g4
pat40142    0   1   0   0   0   0   0   g5
pat46903    0   0   0   0   0   1   0   g1
pat67612    1   0   0   0   1   0   0   g1
pat67621    0   0   0   0   0   0   0   g2
pat67630    0   0   1   0   0   0   0   g2
pat67634    0   0   0   0   0   0   0   g5
pat67657    0   1   0   1   0   0   0   g5
pat67680    0   0   0   0   0   1   0   g5
pat67683    0   0   1   1   0   0   0   g6

How can I color the branches and labels of each individual according to the group it belongs to, even though members of the same group may end up in different clusters?

If this can be achieved, is there a way to choose the color assigned to each group?

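I'm glad you solved this on your own. A simpler solution is to pass order_value = TRUE to the set function, which tells it that the values are given in the order of the original data rather than in the order of the dendrogram's leaves. For example: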

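library(dendextend)
# drop the Species column and repeat the species name three times in each row name
iris2 <- iris[,-5]
rownames(iris2) <- paste(iris[,5],iris[,5],iris[,5], rownames(iris2))
dend <- iris2 %>% dist %>% hclust %>% as.dendrogram
# colors are given in data order; order_value = TRUE re-orders them to match the leaves
dend <- dend %>% set("labels_colors", as.numeric(iris[,5]), order_value = TRUE) %>%
        set("labels_cex", .5)
# widen the right margin so the long labels fit
par(mar = c(4,1,0,8))
plot(dend, horiz = TRUE)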

This results in the following plot (as you can see, the colors of the labels are based on the other variable, "Species", in the iris dataset):

[Image: horizontal dendrogram of the iris data with labels colored by species]

(PS: I repeated the species name three times in each label to make it easier to see how the color relates to the length of the label.)
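The question also asks whether the color of each group can be chosen. A minimal sketch of one way to do that with the same order_value = TRUE approach is shown below; the color names and the species_colors lookup vector are illustrative assumptions, not part of the original answer.

```r
library(dendextend)

# Hypothetical mapping from group (here: species) to a color of your choice.
species_colors <- c(setosa = "darkgreen", versicolor = "orange", virginica = "purple")

# One color per observation, in the order of the rows of the data.
label_cols <- unname(species_colors[as.character(iris$Species)])

dend <- as.dendrogram(hclust(dist(iris[, -5])))

# order_value = TRUE re-orders the colors from data order to leaf order.
dend <- set(dend, "labels_colors", label_cols, order_value = TRUE)
dend <- set(dend, "labels_cex", 0.5)

plot(dend, horiz = TRUE)
```

Indexing a named vector by group name keeps the group-to-color mapping in one place, so changing a color means editing a single entry.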

I was able to do it using another package called "sparcl", based on a previous post (How to colour the labels of a dendrogram by an additional factor variable in R).

Here is my code:

# load the dataset..... (dataset2 holds only the marker columns)
# calculate distances; stats::dist() has no "Jaccard" method,
# its "binary" method is the Jaccard-type distance for 0/1 data
d <- dist(dataset2, method = "binary")
# hierarchical clustering of the data
hc <- hclust(d)
# create labels
labs <- dataset$individual
# convert to a dendrogram and plot it
hcd <- as.dendrogram(hc)
plot(hcd, cex = 0.6)
# factor variable for colours
Var <- dataset$group
# convert group names to colours
varCol <- gsub("g1.*", "green", Var)
varCol <- gsub("g2.*", "gold", varCol)
varCol <- gsub("g3.*", "pink", varCol)
varCol <- gsub("g4.*", "purple", varCol)
varCol <- gsub("g5.*", "blue", varCol)
varCol <- gsub("g6.*", "red", varCol)
# colour-code dendrogram branches by the factor
library(sparcl)
ColorDendrogram(hc, y = varCol, branchlength = 0.9, labels = labs,
                xlab = "", ylab = "", sub = "")
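The chain of gsub calls above can also be written as a single lookup in a named vector. The sketch below uses base R only; the short Var vector is made-up sample input standing in for dataset$group, and group_colors is a hypothetical name.

```r
# Made-up sample of group labels standing in for dataset$group.
Var <- c("g2", "g4", "g3", "g1", "g5", "g6")

# One entry per group: name = group, value = color.
group_colors <- c(g1 = "green", g2 = "gold", g3 = "pink",
                  g4 = "purple", g5 = "blue", g6 = "red")

# as.character() makes the lookup work for factors as well as character vectors.
varCol <- unname(group_colors[as.character(Var)])

# An unknown group would show up as NA, so fail early if that happens.
stopifnot(!anyNA(varCol))
varCol
```

Unlike pattern matching with gsub, the lookup cannot accidentally match part of a longer group name, and an unmapped group is reported instead of being passed through unchanged.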

Finally, I managed to work out a "dendextend" solution based on the example in this post (How to colour the labels of a dendrogram by an additional factor variable in R):

# install.packages("dendextend")
library(dendextend)

# load the dataset.....
# keep only the marker columns
# (assumes the layout in the question: column 1 is "individual", column 9 is "group")
dataset2 <- dataset[, 2:8]
rownames(dataset2) <- dataset$individual  # so the leaves are labelled by individual

# calculate the dendrogram
dend <- as.dendrogram(hclust(dist(dataset2)))

# capture the colors from the "group" column
# (as.factor() first, so this also works when "group" is stored as character)
colors_to_use <- as.numeric(as.factor(dataset$group))
colors_to_use

# sort the colors based on their order in dend:
colors_to_use <- colors_to_use[order.dendrogram(dend)]
colors_to_use

# apply the colors
labels_colors(dend) <- colors_to_use

# patient labels now have a color based on their group
labels_colors(dend)
plot(dend, main = "Color in labels")
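The code above colors only the labels, while the question also asks about branches. A possible sketch for coloring the terminal branch of each leaf by group is given below. It assumes dendextend's assign_values_to_leaves_edgePar() and get_leaves_branches_col() behave as described in the package documentation; the small markers matrix reuses a few rows of the question's table, and group_colors is a hypothetical mapping.

```r
library(dendextend)

# A few rows from the question's table (seven marker columns each).
markers <- rbind(
  pat38740 = c(0, 0, 0, 0, 0, 1, 0),
  pat38743 = c(0, 0, 1, 0, 0, 1, 0),
  pat38745 = c(0, 0, 1, 0, 1, 0, 0),
  pat40137 = c(1, 0, 0, 0, 0, 0, 0),
  pat40142 = c(0, 1, 0, 0, 0, 0, 0),
  pat67612 = c(1, 0, 0, 0, 1, 0, 0),
  pat67683 = c(0, 0, 1, 1, 0, 0, 0)
)
group <- c("g2", "g3", "g4", "g4", "g5", "g1", "g6")  # same row order as markers

# Hypothetical group-to-color mapping.
group_colors <- c(g1 = "green", g2 = "gold", g3 = "pink",
                  g4 = "purple", g5 = "blue", g6 = "red")

dend <- as.dendrogram(hclust(dist(markers)))

# Colors re-ordered from data order to the left-to-right order of the leaves.
cols_in_dend_order <- unname(group_colors[group])[order.dendrogram(dend)]

# Color the labels and the edge leading to each leaf.
labels_colors(dend) <- cols_in_dend_order
dend <- assign_values_to_leaves_edgePar(dend, value = cols_in_dend_order,
                                        edgePar = "col")

plot(dend, main = "Labels and leaf branches colored by group")
```

Only the last edge above each leaf is colored here. Internal branches join individuals from several groups, so they have no single group color unless the groups happen to coincide with the clusters.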
