
Label and color leaf dendrogram

I am trying to create a dendrogram where my samples have 5 group codes (these act as sample name/species/etc., but they repeat across samples).

I have two issues that I would appreciate help with:

  • How can I show the group codes in the leaf labels (instead of the sample numbers)?

  • I want to assign a color to each group code and color the leaf labels accordingly. Samples with the same code may not end up in the same clade, and coloring would make that visible, which gives me more information.

Is it possible to do this with my script (using ape or ggdendro)?

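library(ape)       # provides as.phylo()
library(ggdendro)  # provides ggdendrogram()

sample <- read.table("C:/.../DOutput.txt", header=FALSE, sep="")
groupCodes <- sample[,1]
sample2 <- sample[,2:100]
d <- dist(sample2, method = "euclidean")
fit <- hclust(d, method="ward.D")  # "ward" was renamed to "ward.D" in R 3.1.0
plot(as.phylo(fit), type="fan")
ggdendrogram(fit, theme_dendro=FALSE)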

A random data frame to stand in for my read.table:

sample = data.frame(matrix(floor(abs(rnorm(20000)*100)),ncol=200))
groupCodes <- c(rep("A",25), rep("B",25), rep("C",25), rep("D",25)) # fixed error
sample2 <- data.frame(cbind(groupCodes), sample) 

Here is a solution using a package called "dendextend", which was built for exactly this sort of thing.

You can see many examples in the package's presentations and vignettes, listed in the "usage" section at the following URL: https://github.com/talgalili/dendextend

Here is the solution. Note the importance of how the colors are re-ordered: first to match the data (one color per sample), and then to match the new order of the leaves in the dendrogram.

####################
## Getting the data:

sample = data.frame(matrix(floor(abs(rnorm(20000)*100)),ncol=200))
groupCodes <- c(rep("Cont",25), rep("Tre1",25), rep("Tre2",25), rep("Tre3",25))
rownames(sample) <- make.unique(groupCodes)

colorCodes <- c(Cont="red", Tre1="green", Tre2="blue", Tre3="yellow")

distSamples <- dist(sample)
hc <- hclust(distSamples)
dend <- as.dendrogram(hc)

####################
## installing dendextend for the first time:

install.packages('dendextend')

####################
## Solving the question:

# loading the package
library(dendextend)
# Assigning the labels of dendrogram object with new colors:
labels_colors(dend) <- colorCodes[groupCodes][order.dendrogram(dend)]
# Plotting the new dendrogram
plot(dend)


####################
## A sub tree - so we can see better what we got:
par(cex = 1)
plot(dend[[1]], horiz = TRUE)
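
The two-step re-ordering mentioned above can be checked with base R alone. This is a minimal sketch, not part of the original answer: the seed and the variable names colorsByRow, leafOrder, colorsByLeaf and leafCodes are assumptions added for illustration. It confirms that, after indexing by order.dendrogram(), each color lines up with the group code of the leaf it will be drawn on.

```r
set.seed(42)  # assumption: seed added only so the sketch is reproducible
sample <- data.frame(matrix(floor(abs(rnorm(20000) * 100)), ncol = 200))
groupCodes <- rep(c("Cont", "Tre1", "Tre2", "Tre3"), each = 25)
rownames(sample) <- make.unique(groupCodes)
colorCodes <- c(Cont = "red", Tre1 = "green", Tre2 = "blue", Tre3 = "yellow")

dend <- as.dendrogram(hclust(dist(sample)))

# Step 1: one color per sample, in the row order of the data
colorsByRow <- colorCodes[groupCodes]

# Step 2: rearrange those colors into the left-to-right leaf order
leafOrder <- order.dendrogram(dend)
colorsByLeaf <- colorsByRow[leafOrder]

# Recover each leaf's group code by stripping the ".N" suffix from make.unique()
leafCodes <- sub("\\.[0-9]+$", "", labels(dend))

# The color names (group codes) must match the leaf codes position by position
stopifnot(identical(names(colorsByLeaf), leafCodes))

head(data.frame(leaf = labels(dend), color = unname(colorsByLeaf)))
```

Skipping step 2 would still run without error, but the colors would follow the row order of the data rather than the order in which the leaves are drawn.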


You could convert your hclust object into a dendrogram and use ?dendrapply to modify the properties (attributes such as color, label, ...) of each node, e.g.:

## stupid toy example
samples <- matrix(c(1, 1, 1,
                    2, 2, 2,
                    5, 5, 5,
                    6, 6, 6), byrow=TRUE, nrow=4)

## set sample IDs to A-D
rownames(samples) <- LETTERS[1:4]

## perform clustering
distSamples <- dist(samples)
hc <- hclust(distSamples)

## function to set label color
labelCol <- function(x) {
  if (is.leaf(x)) {
    ## fetch label
    label <- attr(x, "label") 
    ## set label color to red for A and B, to blue otherwise
    attr(x, "nodePar") <- list(lab.col=ifelse(label %in% c("A", "B"), "red", "blue"))
  }
  return(x)
}

## apply labelCol on all nodes of the dendrogram
d <- dendrapply(as.dendrogram(hc), labelCol)

plot(d)


EDIT: Code for your minimal example:

sample = data.frame(matrix(floor(abs(rnorm(20000)*100)),ncol=200))
groupCodes <- c(rep("A",25), rep("B",25), rep("C",25), rep("D",25))

## make unique rownames (equal rownames are not allowed)
rownames(sample) <- make.unique(groupCodes)

colorCodes <- c(A="red", B="green", C="blue", D="yellow")


## perform clustering
distSamples <- dist(sample)
hc <- hclust(distSamples)

## function to set label color
labelCol <- function(x) {
  if (is.leaf(x)) {
    ## fetch label
    label <- attr(x, "label")
    code <- substr(label, 1, 1)
    ## use the following line to reset the label to one letter code
    # attr(x, "label") <- code
    attr(x, "nodePar") <- list(lab.col=colorCodes[code])
  }
  return(x)
}

## apply labelCol on all nodes of the dendrogram
d <- dendrapply(as.dendrogram(hc), labelCol)

plot(d)
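
Note that substr(label, 1, 1) only recovers the group code when every code is a single character. For longer codes, one possible variation is sketched below. It is an assumption-laden sketch rather than part of the original answer: the function name labelByGroup, the seed, and the "Cont"/"Tre1" style codes are chosen for illustration. It strips the ".N" suffix that make.unique() appends, writes the bare group code back as the leaf label, and colors it.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
sample <- data.frame(matrix(floor(abs(rnorm(20000) * 100)), ncol = 200))
groupCodes <- rep(c("Cont", "Tre1", "Tre2", "Tre3"), each = 25)
rownames(sample) <- make.unique(groupCodes)
colorCodes <- c(Cont = "red", Tre1 = "green", Tre2 = "blue", Tre3 = "yellow")

hc <- hclust(dist(sample))

## hypothetical helper: relabel each leaf with its group code and color it
labelByGroup <- function(x) {
  if (is.leaf(x)) {
    ## drop the ".N" suffix added by make.unique() to get the group code
    code <- sub("\\.[0-9]+$", "", attr(x, "label"))
    ## show the group code instead of the unique row name
    attr(x, "label") <- code
    ## pch = NA suppresses the point symbol otherwise drawn at the leaf
    attr(x, "nodePar") <- list(lab.col = unname(colorCodes[code]), pch = NA)
  }
  x
}

d <- dendrapply(as.dendrogram(hc), labelByGroup)
plot(d)
```

The suffix pattern assumes that no group code itself ends in a dot followed by digits; if yours do, looking the code up by sample index (for example via order.dendrogram) would be the safer route.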

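
Since the question asks about ape specifically, here is a hedged sketch of the same idea for the fan plot. It assumes the ape package is installed and relies on the tip.color argument of plot.phylo; the seed and the names phy and tipCols are illustrative. To my understanding, as.phylo() on an hclust object keeps the tips in the original row order, so the colors need no re-ordering here, and the code checks that assumption before relabelling.

```r
library(ape)  # assumption: ape is installed

set.seed(1)  # assumption: seed added only so the sketch is reproducible
sample <- data.frame(matrix(floor(abs(rnorm(20000) * 100)), ncol = 200))
groupCodes <- rep(c("A", "B", "C", "D"), each = 25)
colorCodes <- c(A = "red", B = "green", C = "blue", D = "yellow")

fit <- hclust(dist(sample, method = "euclidean"), method = "ward.D")
phy <- as.phylo(fit)

## guard: tips are expected to be in the same order as the input rows
stopifnot(identical(phy$tip.label, fit$labels))

## show group codes instead of sample numbers
phy$tip.label <- groupCodes

## one color per tip, in row order
tipCols <- unname(colorCodes[groupCodes])

plot(phy, type = "fan", tip.color = tipCols, cex = 0.6)
```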
