
How to use identify on a horizontal dendrogram of class "dendrogram" in R

I am using identify to explore specific features of clusters in a dendrogram in R. identify works perfectly well with an 'hclust' object, but I need it for a horizontal dendrogram of class 'dendrogram' rather than 'hclust'. I have the dendextend package installed, which should extend identify to objects of class dendrogram and to horizontal dendrograms ( http://rpackages.ianhowson.com/cran/dendextend/man/identify.dendrogram.html ). For my specific dataset, identify works for a vertical dendrogram (of class dendrogram) but not for a horizontal one. The error that I always get is:

Error in rect.dendrogram(x, k = k, x = X$x, cluster = cluster[, k - 1],  : 
k must be between 2 and 10

Here is a simplified, reproducible example:

#Install packages
install.packages(c("TraMineR","dendextend"))
#Load packages
library(TraMineR)
library(dendextend)

#Create fake dataset (each row is a sequence of characters)
set.seed(1) # fix the random draws so the example is reproducible
a <- c(rep('A',50), rep('B',50))
seqdf <- rbind(a=a, b=sample(a), c=sample(a), d=sample(a), e=sample(a), f=sample(a),
               g=sample(a), h=sample(a), i=sample(a), j=rep('A',100), k=rep('B',100), l=sample(a))
colnames(seqdf) <- paste0('a', 1:100)

#Turn it into a sequence object 
seq_def <- seqdef(seqdf, 1:100, id = rownames(seqdf), xtstep = 4)

#Calculate the dissimilarity (hamming distance) between sequences 
hd <- seqdist(seq_def, method = "HAM", with.missing = TRUE)
rows<-list(rownames(seqdf),rownames(seqdf))
dimnames(hd) <- rows
#Perform Ward clustering on dissimilarity matrix hd
ward <- hclust(as.dist(hd), method = "ward.D2")     
#Dendrogram object
dend <- as.dendrogram(ward) 

#Horizontal dendrogram 
plot(dend, horiz=TRUE)
identify(dend, horiz=TRUE) # HERE IDENTIFY GIVES AN ERROR

#Vertical dendrogram
plot(dend)
identify(dend) # this works, there is no error

I hope somebody knows how to solve this problem.

Best,

This is general behavior of the identify function (for example, identify.hclust ) when you click "too close" to the edges of the screen. You can see this by running the following and clicking near the leaves:

plot(ward)
identify(ward, MAXCLUSTER = 12) 

I agree with you that this is somewhat annoying behavior (since we don't always manage to click exactly where we wanted to). So I've added a new parameter ( stop_if_out ) to the dendextend package, which is now set to FALSE by default for identify.dendrogram. This means that the function no longer stops when you click too far outside the dendrogram. This applies to both vertical and horizontal plots.
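As a rough sketch of how the new parameter might be passed explicitly: this assumes a dendextend build in which identify.dendrogram accepts stop_if_out, and it uses the built-in USArrests data with the placeholder names hc_demo and dend_demo instead of the sequence data from the question.

```r
# Sketch only: USArrests, hc_demo and dend_demo are stand-ins for the question's data.
hc_demo   <- hclust(dist(USArrests[1:12, ]), method = "ward.D2")
dend_demo <- as.dendrogram(hc_demo)

# identify() needs an interactive graphics device and the dendextend method,
# so the interactive part is skipped when either is missing.
if (interactive() && requireNamespace("dendextend", quietly = TRUE)) {
  library(dendextend)
  plot(dend_demo, horiz = TRUE)
  # stop_if_out = FALSE: a click too far outside the tree should not stop the function.
  # (Assumed to be the default in the patched version; written out here for clarity.)
  picked <- identify(dend_demo, horiz = TRUE, stop_if_out = FALSE)
}
```

With an older dendextend that lacks the parameter, the extra argument would not have this effect, so it is worth checking the installed version first.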

It will probably take some time before I release this version to CRAN, but you can easily get it by using devtools and running:

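# install a package only if it cannot be loaded; character.only = TRUE is needed
# because pkg holds the package name as a string
install.packages.2 <- function (pkg) if (!require(pkg, character.only = TRUE)) install.packages(pkg)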
install.packages.2('devtools')
# make sure you have Rtools installed first! if not, then run:
#install.packages('installr'); install.Rtools()
devtools::install_github('talgalili/dendextend')
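One possible way to confirm that the installed build includes the new parameter is to inspect the arguments of the identify method for dendrograms. This is only a sketch; the variable names has_stop_if_out and id_method are made up for illustration.

```r
# Sketch: check whether the installed dendextend exposes the stop_if_out argument.
has_stop_if_out <- NA  # stays NA when dendextend is not installed
if (requireNamespace("dendextend", quietly = TRUE)) {
  # Look up the registered S3 method; optional = TRUE returns NULL instead of an error.
  id_method <- getS3method("identify", "dendrogram", optional = TRUE)
  has_stop_if_out <- !is.null(id_method) &&
    "stop_if_out" %in% names(formals(id_method))
  message("dendextend ", as.character(utils::packageVersion("dendextend")),
          ": stop_if_out available = ", has_stop_if_out)
}
```

If the result is FALSE, the installed version predates the change, and reinstalling from GitHub as shown above would be the next step.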

I hope this helps.
