
How to color branches in an R dendrogram as a function of the classes in it?

I want to visualize how well a clustering algorithm performs with a particular distance metric. I have samples and their corresponding classes. To visualize the result, I cluster the samples and want to color the branches of the dendrogram according to the items in each cluster. Each branch should get the color of the class that most items in that hierarchical cluster belong to (as given by the data/classes).

Example: suppose my clustering algorithm puts indexes 1, 21, 24 into one cluster (at a certain level), and I have a CSV file with one class number per row in which those rows hold, say, 1, 2, 1. I want this edge to get the color of class 1.
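The majority rule described in this example can be sketched in a few lines of base R. The vector `classes` and the index vector `cluster_members` are hypothetical stand-ins for the class CSV and for one cluster's members; ties are broken in favor of the first class, which is an assumption the question does not settle.

```r
# Hypothetical class labels, one per sample (stand-in for the classes CSV).
classes <- rep(2L, 24)
classes[c(1, 24)] <- 1L

# Indices that the clustering grouped together at some level.
cluster_members <- c(1, 21, 24)
member_classes <- classes[cluster_members]   # classes 1, 2, 1

# Majority vote: tabulate the classes and take the most frequent one.
# which.max() returns the first maximum, so ties go to the lowest class.
counts <- table(member_classes)
majority_class <- as.integer(names(counts)[which.max(counts)])
majority_class
```

The edge above this cluster would then be drawn in the color assigned to `majority_class`.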

Example code:

library(cluster)
suppressPackageStartupMessages(library(dendextend))
dir <- 'distance_metrics/'
filename <- 'aligned.csv'
my.data <- read.csv(paste0(dir, filename), header = TRUE, row.names = 1)
my.dist <- as.dist(my.data)
real.clusters <- read.csv("clusters", header = TRUE, row.names = 1)
clustered <- diana(my.dist)
dend <- as.dendrogram(clustered)  # needed so that plot(dend) has an object to draw
# dend <- colour_branches(???dend, max(real.clusters)???)
plot(dend)

EDIT: another partial example:

dir <- 'distance_metrics/' # csv in here contains a symmetric matrix
clust.dir <- "clusters/" #csv in here contains a column vector with classes
my.data <- read.csv(paste(dir, filename, sep=""), header = T, row.names = 1)
filename <- 'table.csv'
my.dist <- as.dist(my.data)
real.clusters <-read.csv(paste(clust.dir, filename, sep=""), header = T, row.names = 1)
clustered <- diana(my.dist)
dnd <- as.dendrogram(clustered)

Both node and edge color attributes can be set recursively on "dendrogram" objects (which are just deeply nested lists) using dendrapply. The cluster package also provides an as.dendrogram method for "diana" class objects, so conversion between the object types is seamless. Using your diana clustering and borrowing some code from @Edvardoss's iris example, you can create the colored dendrogram as follows:

library(cluster)
set.seed(999)
iris2 <- iris[sample(x = 1:150,size = 50,replace = F),]
clust <- diana(iris2)
dnd <- as.dendrogram(clust)

## Duplicate rownames aren't allowed, so we need to set the "label"
## attributes recursively. We also label inner nodes here.
rectify_labels <- function(node, df){
  newlab <- df$Species[unlist(node, use.names = FALSE)]
  attr(node, "label") <- (newlab)
  return(node)
}
dnd <- dendrapply(dnd, rectify_labels, df = iris2)

## Create a color palette as a data.frame with one row for each spp
uniqspp <- as.character(unique(iris$Species))
colormap <- data.frame(Species = uniqspp, color = c("red", "blue", "green"),
                       stringsAsFactors = FALSE)
colormap

## Now color the inner dendrogram edges
color_dendro <- function(node, colormap){
  if(is.leaf(node)){
    nodecol <- colormap$color[match(attr(node, "label"), colormap$Species)]
    attr(node, "nodePar") <- list(pch = NA, lab.col = nodecol)
    attr(node, "edgePar") <- list(col = nodecol)
  }else{
    spp <- attr(node, "label")
    dominantspp <- levels(spp)[which.max(tabulate(spp))]
    edgecol <- colormap$color[match(dominantspp, colormap$Species)]
    attr(node, "edgePar") <- list(col = edgecol)
  }
  return(node)
}
dnd <- dendrapply(dnd, color_dendro, colormap = colormap)

## Plot the dendrogram
plot(dnd)
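The same idea can be written against an arbitrary class vector, such as the one the question reads into real.clusters. The following is only a sketch under stated assumptions: it uses synthetic data and hclust from base R instead of diana, the names `classes`, `palette_by_class`, and `color_by_majority` are hypothetical, and the class vector is assumed to be in the same row order as the data that was clustered.

```r
# Synthetic stand-in data: two well-separated groups of 10 points each.
set.seed(1)
x <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
           matrix(rnorm(20, mean = 5), ncol = 2))
classes <- factor(rep(c("a", "b"), each = 10))   # hypothetical class labels
palette_by_class <- c(a = "red", b = "blue")     # hypothetical class colors

dend <- as.dendrogram(hclust(dist(x)))

color_by_majority <- function(node) {
  # unlist() yields the original row indices of all leaves below this node.
  members <- unlist(node, use.names = FALSE)
  counts <- table(classes[members])
  # which.max() returns the first maximum, so ties go to the first level.
  majority <- names(counts)[which.max(counts)]
  attr(node, "edgePar") <- list(col = palette_by_class[[majority]])
  node
}

# dendrapply() visits every node, leaves included.
dend <- dendrapply(dend, color_by_majority)
plot(dend)
```

Because the function never touches the "label" attribute, the leaf labels stay as they were; only the edge colors change.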


The function you are looking for is color_branches from the dendextend R package, using the arguments clusters and col. Here is an example (based on Shaun Wilkinson's example):

library(cluster)
set.seed(999)
iris2 <- iris[sample(x = 1:150,size = 50,replace = F),]
clust <- diana(iris2)
dend <- as.dendrogram(clust)

temp_col <- c("red", "blue", "green")[as.numeric(iris2$Species)]
temp_col <- temp_col[order.dendrogram(dend)]
temp_col <- factor(temp_col, unique(temp_col))

library(dendextend)
dend %>% color_branches(clusters = as.numeric(temp_col), col = levels(temp_col)) %>% 
   set("labels_colors", as.character(temp_col)) %>% 
   plot
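The reordering step `temp_col[order.dendrogram(dend)]` in the code above is easy to overlook: the colors start out in the row order of the data, while the dendrogram draws its leaves in a different, left-to-right order. A minimal base R sketch of that alignment, using synthetic data and the hypothetical names `classes`, `cols_by_row`, and `cols_by_leaf`:

```r
# Synthetic stand-in data with hypothetical class labels.
set.seed(1)
x <- matrix(rnorm(12), ncol = 2)
classes <- factor(c("a", "b", "a", "b", "a", "b"))
dend <- as.dendrogram(hclust(dist(x)))

# order.dendrogram() gives the original row index of each leaf,
# read from left to right in the plot.
leaf_order <- order.dendrogram(dend)

# Colors in data (row) order, then permuted into leaf order.
cols_by_row  <- c(a = "red", b = "blue")[as.character(classes)]
cols_by_leaf <- unname(cols_by_row[leaf_order])

# One line per leaf: its label, its original row, and its color.
data.frame(leaf = labels(dend), row = leaf_order, col = cols_by_leaf)
```

Any per-leaf vector handed to the dendrogram, whether colors or cluster ids, needs this permutation first, as the answer's code does.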


I suspect I may have misunderstood the question, but I'll try to answer anyway. I adapted code from one of my earlier tasks to the iris example:

clrs <- rainbow(n = 3) # create palette
clrs <- clrs[iris$Species] # assign a color to each observation by species
plot(x = iris$Sepal.Length,y = iris$Sepal.Width,col=clrs) # quick visual check of the colors
# cluster
dt <- cbind(iris,clrs)
dt <- dt[sample(x = 1:150,size = 50,replace = F),] # take a small subset so the plot stays readable
empty.labl <- gsub("."," ",dt$Species) # blank labels as wide as the species names, reserving space for the colored text labels
dst <- dist(x = scale(dt[,1:4]),method = "manhattan")
hcl <- hclust(d = dst,method = "complete")
plot(hcl,hang=-1,cex=1,labels = empty.labl, xlab = NA,sub=NA)
dt <- dt[hcl$order,] # reorder rows to match the leaf order in the dendrogram
text(x = seq(nrow(dt)), y=-.5,labels = dt$Species,srt=90,cex=.8,xpd=NA,adj=c(1,0.7),col=as.character(dt$clrs))

