How to color branches in R dendrogram as a function of the classes in it?
I wish to visualize how well a clustering algorithm is doing (with a certain distance metric). I have samples and their corresponding classes. To visualize this, I cluster the samples and wish to color the branches of a dendrogram by the items in each cluster. The color of a branch should be the color that most items in that hierarchical cluster correspond to (given by the data\classes).
Example: If my clustering algorithm chose indexes 1, 21, 24 to be a certain cluster (at a certain level), and I have a csv file containing a class number in each row, corresponding to, let's say, 1, 2, 1 for those indexes, then I want this edge to be colored 1.
Example code:
library(cluster)
suppressPackageStartupMessages(library(dendextend))
dir <- 'distance_metrics/'
filename <- 'aligned.csv'
# Symmetric distance matrix, one row/column per sample
my.data <- read.csv(paste(dir, filename, sep=""), header = TRUE, row.names = 1)
my.dist <- as.dist(my.data)
# Known class of each sample
real.clusters <- read.csv("clusters", header = TRUE, row.names = 1)
clustered <- diana(my.dist)
dend <- as.dendrogram(clustered)
# dend <- colour_branches(???dend, max(real.clusters)???)
plot(dend)
EDIT: another partial code example:
library(cluster)
dir <- 'distance_metrics/' # csv in here contains a symmetric matrix
clust.dir <- "clusters/" # csv in here contains a column vector with classes
filename <- 'table.csv' # must be set before it is used in the read.csv() calls
my.data <- read.csv(paste(dir, filename, sep=""), header = TRUE, row.names = 1)
my.dist <- as.dist(my.data)
real.clusters <- read.csv(paste(clust.dir, filename, sep=""), header = TRUE, row.names = 1)
clustered <- diana(my.dist)
dnd <- as.dendrogram(clustered)
Both node and edge color attributes can be set recursively on "dendrogram" objects (which are just deeply nested lists) using dendrapply. The cluster package also features an as.dendrogram method for "diana" class objects, so conversion between the object types is seamless. Using your diana clustering and borrowing some code from @Edvardoss's iris example, you can create the colored dendrogram as follows:
library(cluster)
set.seed(999)
iris2 <- iris[sample(x = 1:150, size = 50, replace = FALSE), ]
clust <- diana(iris2)
dnd <- as.dendrogram(clust)
## Duplicate rownames aren't allowed, so we need to set the "labels"
## attributes recursively. We also label inner nodes here.
rectify_labels <- function(node, df){
  newlab <- df$Species[unlist(node, use.names = FALSE)]
  attr(node, "label") <- (newlab)
  return(node)
}
dnd <- dendrapply(dnd, rectify_labels, df = iris2)
## Create a color palette as a data.frame with one row for each spp
uniqspp <- as.character(unique(iris$Species))
colormap <- data.frame(Species = uniqspp, color = rainbow(n = length(uniqspp)))
colormap[, 2] <- c("red", "blue", "green")
colormap
## Now color the inner dendrogram edges
color_dendro <- function(node, colormap){
  if(is.leaf(node)){
    nodecol <- colormap$color[match(attr(node, "label"), colormap$Species)]
    attr(node, "nodePar") <- list(pch = NA, lab.col = nodecol)
    attr(node, "edgePar") <- list(col = nodecol)
  }else{
    spp <- attr(node, "label")
    dominantspp <- levels(spp)[which.max(tabulate(spp))]
    edgecol <- colormap$color[match(dominantspp, colormap$Species)]
    attr(node, "edgePar") <- list(col = edgecol)
  }
  return(node)
}
dnd <- dendrapply(dnd, color_dendro, colormap = colormap)
## Plot the dendrogram
plot(dnd)
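The majority vote inside color_dendro (levels(spp)[which.max(tabulate(spp))]) can be tried on its own. The following is only a minimal sketch: majority_level is a hypothetical helper name, not something from the answer or from any package, and the input reuses the 1, 2, 1 example from the question.

```r
# Hypothetical helper (illustration only): return the most frequent level
# of a factor as a character string, or NA for an empty input.
majority_level <- function(f) {
  f <- as.factor(f)
  if (length(f) == 0L) return(NA_character_)
  counts <- tabulate(f, nbins = nlevels(f))  # counts per level, in level order
  levels(f)[which.max(counts)]               # ties resolve to the first level
}

# Classes of the three items in the cluster from the question's example
classes <- factor(c(1, 2, 1))
majority_level(classes)  # "1"
```

Note that which.max returns the first maximum, so in a tie the level that comes first in levels(f) wins; whether that is acceptable depends on your data.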
The function you are looking for is color_branches from the dendextend R package, using the arguments clusters and col. Here is an example (based on Shaun Wilkinson's example):
library(cluster)
set.seed(999)
iris2 <- iris[sample(x = 1:150, size = 50, replace = FALSE), ]
clust <- diana(iris2)
dend <- as.dendrogram(clust)
temp_col <- c("red", "blue", "green")[as.numeric(iris2$Species)]
temp_col <- temp_col[order.dendrogram(dend)]
temp_col <- factor(temp_col, unique(temp_col))
library(dendextend)
dend %>% color_branches(clusters = as.numeric(temp_col), col = levels(temp_col)) %>%
  set("labels_colors", as.character(temp_col)) %>%
  plot
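The step temp_col[order.dendrogram(dend)] matters because, as far as I understand it, the clusters and label colors have to be given in the left-to-right order of the leaves rather than in the row order of the data. A small base R sketch of that reordering, using made-up toy values and class labels (both are assumptions for illustration):

```r
# Toy data: five points on a line, with made-up class labels (assumption).
x <- c(a = 1, b = 10, c = 2, d = 11, e = 3)
classes <- c(a = "red", b = "blue", c = "red", d = "blue", e = "red")

dend <- as.dendrogram(hclust(dist(x)))

# order.dendrogram() gives, for each leaf from left to right, the index of
# the corresponding observation in the original data.
leaf_order <- order.dendrogram(dend)

# Reorder the per-observation classes so they line up with the plotted leaves.
classes_in_leaf_order <- classes[leaf_order]

# Sanity check: the reordered names match the dendrogram's leaf labels.
stopifnot(all(names(classes_in_leaf_order) == labels(dend)))
```

If the check at the end fails on your own data, the class vector and the distance matrix are probably not in the same row order.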
I suspect I may have misunderstood the question, but I'll try to answer anyway. This is adapted from one of my earlier tasks, rewritten using the iris dataset as an example:
clrs <- rainbow(n = 3) # create palette
clrs <- clrs[iris$Species] # assign a color to each observation by species
plot(x = iris$Sepal.Length,y = iris$Sepal.Width,col=clrs) # quick check of the colors
# cluster
dt <- cbind(iris,clrs)
dt <- dt[sample(x = 1:150,size = 50,replace = FALSE),] # take a small subset so the plot stays readable
empty.labl <- gsub("."," ",dt$Species) # blank strings as long as the species names, to reserve space for the text labels added below
dst <- dist(x = scale(dt[,1:4]),method = "manhattan")
hcl <- hclust(d = dst,method = "complete")
plot(hcl,hang=-1,cex=1,labels = empty.labl, xlab = NA,sub=NA)
dt <- dt[hcl$order,] # reorder rows to match the leaf order in the dendrogram
text(x = seq(nrow(dt)), y=-.5,labels = dt$Species,srt=90,cex=.8,xpd=NA,adj=c(1,0.7),col=as.character(dt$clrs))