Label and color leaf dendrogram
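I am trying to create a dendrogram in which my samples have 5 group codes (they act as sample name/species/etc., but they repeat).
I therefore have two questions and would appreciate help with them:
How can I show the group codes in the leaf labels (instead of the sample numbers)?
How can I assign a color to each code group and color the leaf labels accordingly? (The groups may not end up in the same clade, and that in itself would tell me something.)
Is it possible to do this with my script (ape or ggdendro)?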
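library(ape)       # provides as.phylo()
library(ggdendro)  # provides ggdendrogram()
sample <- read.table("C:/.../DOutput.txt", header = FALSE, sep = "")
groupCodes <- sample[, 1]
sample2 <- sample[, 2:100]
d <- dist(sample2, method = "euclidean")
# "ward" was renamed to "ward.D" in R 3.1.0
fit <- hclust(d, method = "ward.D")
plot(as.phylo(fit), type = "fan")
ggdendrogram(fit, theme_dendro = FALSE)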
A random data frame to stand in for my read.table:
sample = data.frame(matrix(floor(abs(rnorm(20000)*100)),ncol=200))
groupCodes <- c(rep("A",25), rep("B",25), rep("C",25), rep("D",25)) # fixed error
sample2 <- data.frame(cbind(groupCodes), sample)
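Below is a solution that uses a new package called "dendextend", which was built for exactly this kind of task.
You can find many examples in the package's presentations and vignettes, listed in the "usage" section at the following URL: https://github.com/talgalili/dendextend
Here is the solution to this question. Note that the colors have to be reordered twice: first to match the data (one color per sample), and then to match the new leaf order of the dendrogram.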
####################
## Getting the data:
sample = data.frame(matrix(floor(abs(rnorm(20000)*100)),ncol=200))
groupCodes <- c(rep("Cont",25), rep("Tre1",25), rep("Tre2",25), rep("Tre3",25))
rownames(sample) <- make.unique(groupCodes)
colorCodes <- c(Cont="red", Tre1="green", Tre2="blue", Tre3="yellow")
distSamples <- dist(sample)
hc <- hclust(distSamples)
dend <- as.dendrogram(hc)
####################
## installing dendextend for the first time:
install.packages('dendextend')
####################
## Solving the question:
# loading the package
library(dendextend)
# Assigning the labels of dendrogram object with new colors:
labels_colors(dend) <- colorCodes[groupCodes][order.dendrogram(dend)]
# Plotting the new dendrogram
plot(dend)
####################
## A sub tree - so we can see better what we got:
par(cex = 1)
plot(dend[[1]], horiz = TRUE)
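The double indexing in `colorCodes[groupCodes][order.dendrogram(dend)]` can be easier to follow when split into two steps. The following is a minimal base R sketch on made-up toy data (four samples, two groups); the intermediate names `colorsByRow`, `leafOrder`, and `colorsByLeaf` are illustrative and do not come from dendextend.

```r
# Toy data: four samples in two groups (hypothetical values)
samples <- matrix(c(1, 1,
                    6, 6,
                    2, 2,
                    5, 5), byrow = TRUE, nrow = 4)
groupCodes <- c("Cont", "Tre1", "Cont", "Tre1")
rownames(samples) <- make.unique(groupCodes)
colorCodes <- c(Cont = "red", Tre1 = "blue")

dend <- as.dendrogram(hclust(dist(samples)))

# Step 1: one color per sample, in the original row order of the data
colorsByRow <- colorCodes[groupCodes]

# Step 2: permute those colors into the left-to-right leaf order of the tree
leafOrder <- order.dendrogram(dend)
colorsByLeaf <- colorsByRow[leafOrder]

# Each leaf label now sits next to the color of its own group
data.frame(label = labels(dend), color = unname(colorsByLeaf))
```

Skipping the second step would assign colors in row order while the leaves are drawn in tree order, so labels and colors would no longer match.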
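You can convert your hclust object into a dendrogram and use ?dendrapply to modify the properties of each node (attributes such as color, label, and so on), for example: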
## stupid toy example
samples <- matrix(c(1, 1, 1,
                    2, 2, 2,
                    5, 5, 5,
                    6, 6, 6), byrow=TRUE, nrow=4)
## set sample IDs to A-D
rownames(samples) <- LETTERS[1:4]
## perform clustering
distSamples <- dist(samples)
hc <- hclust(distSamples)
## function to set label color
labelCol <- function(x) {
  if (is.leaf(x)) {
    ## fetch label
    label <- attr(x, "label")
    ## set label color to red for A and B, to blue otherwise
    attr(x, "nodePar") <- list(lab.col=ifelse(label %in% c("A", "B"), "red", "blue"))
  }
  return(x)
}
## apply labelCol on all nodes of the dendrogram
d <- dendrapply(as.dendrogram(hc), labelCol)
plot(d)
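To check what dendrapply actually stored, you can walk the tree and read the attributes back from the leaves. This is a small sketch that repeats the toy example so it runs on its own; `leafColors` is a hypothetical helper written for this illustration, not part of base R.

```r
# Same toy data and coloring function as in the example above
samples <- matrix(c(1, 1, 1,
                    2, 2, 2,
                    5, 5, 5,
                    6, 6, 6), byrow = TRUE, nrow = 4)
rownames(samples) <- LETTERS[1:4]
hc <- hclust(dist(samples))

labelCol <- function(x) {
  if (is.leaf(x)) {
    label <- attr(x, "label")
    attr(x, "nodePar") <- list(lab.col = ifelse(label %in% c("A", "B"), "red", "blue"))
  }
  x
}
d <- dendrapply(as.dendrogram(hc), labelCol)

# Hypothetical helper: recursively collect label -> color pairs from the leaves
leafColors <- function(node) {
  if (is.leaf(node)) {
    setNames(attr(node, "nodePar")$lab.col, attr(node, "label"))
  } else {
    unlist(lapply(seq_along(node), function(i) leafColors(node[[i]])))
  }
}

leafColors(d)
```

The result is a named character vector with one entry per leaf, which makes it easy to confirm the mapping before plotting.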
EDIT: Added code for your minimal example:
sample = data.frame(matrix(floor(abs(rnorm(20000)*100)),ncol=200))
groupCodes <- c(rep("A",25), rep("B",25), rep("C",25), rep("D",25))
## make unique rownames (equal rownames are not allowed)
rownames(sample) <- make.unique(groupCodes)
colorCodes <- c(A="red", B="green", C="blue", D="yellow")
## perform clustering
distSamples <- dist(sample)
hc <- hclust(distSamples)
## function to set label color
labelCol <- function(x) {
  if (is.leaf(x)) {
    ## fetch label
    label <- attr(x, "label")
    code <- substr(label, 1, 1)
    ## use the following line to reset the label to one letter code
    # attr(x, "label") <- code
    attr(x, "nodePar") <- list(lab.col=colorCodes[code])
  }
  return(x)
}
## apply labelCol on all nodes of the dendrogram
d <- dendrapply(as.dendrogram(hc), labelCol)
plot(d)
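The `substr(label, 1, 1)` step works because the codes here are single letters. For longer codes such as "Cont" or "Tre1", one possible variation is to strip the numeric suffix that `make.unique()` appends. The sketch below assumes that no group code itself ends in a dot followed by digits; the data, the function name `labelByGroup`, and the legend placement are illustrative choices.

```r
set.seed(42)  # reproducible toy data
groupCodes <- c(rep("Cont", 5), rep("Tre1", 5), rep("Tre2", 5))
samples <- matrix(floor(abs(rnorm(15 * 10) * 100)), nrow = 15)
rownames(samples) <- make.unique(groupCodes)
colorCodes <- c(Cont = "red", Tre1 = "green", Tre2 = "blue")

hc <- hclust(dist(samples))

# Relabel each leaf with its group code and color the label accordingly
labelByGroup <- function(x) {
  if (is.leaf(x)) {
    # drop the ".1", ".2", ... suffix added by make.unique()
    code <- sub("\\.[0-9]+$", "", attr(x, "label"))
    attr(x, "label") <- code
    # pch = NA suppresses the point symbol normally drawn at each leaf
    attr(x, "nodePar") <- list(lab.col = unname(colorCodes[code]), pch = NA)
  }
  x
}

d <- dendrapply(as.dendrogram(hc), labelByGroup)
plot(d)
# Optional legend mapping each color back to its group
legend("topright", legend = names(colorCodes), text.col = colorCodes, bty = "n")
```

This covers both parts of the question at once: the leaves show the group code instead of the sample name, and each label is drawn in its group's color.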