
Counting the number of specific elements in a pruned dendrogram leaf

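I am running a cluster analysis and want to count how often a given variable occurs in each leaf of a pruned tree. Below is a simplified example in which the pruned tree has only three branches. I would like to know the number of As and Bs in each of the three branches/leaves. How can I get those counts?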

rm(list = ls(all = TRUE))
# label the 20 observations: ten "A"s followed by ten "B"s
mylabels           <- matrix(nrow = 1, ncol = 20)
mylabels[1, 1:10]  <- "A"
mylabels[1, 11:20] <- "B"
# 20 observations x 100 variables; draw 20 * 100 values so none are recycled
myclusterdata <- matrix(rexp(20 * 100, rate = .1), ncol = 100, nrow = 20)

rownames(myclusterdata) <- mylabels
hc <- hclust(dist(myclusterdata), "ave")
memb <- cutree(hc, k = 3)
# centroid (column means) of each of the 3 clusters
cent <- NULL
for (k in 1:3) {
  cent <- rbind(cent, colMeans(myclusterdata[memb == k, , drop = FALSE]))
}

hc1 <- hclust(dist(cent)^2, method = "cen", members = table(memb))
# whole tree
plot(as.dendrogram(hc), horiz = TRUE)
# pruned tree (only 3 branches)
plot(as.dendrogram(hc1), horiz = TRUE)

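OK, I figured it out. The leaf that each element belongs to is stored in memb: its values are the cluster numbers and its names are the row labels. Counting the labels within each cluster therefore gives the result. Example code is below.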

rm(list = ls(all = TRUE))
# label the 20 observations: ten "A"s followed by ten "B"s
mylabels           <- matrix(nrow = 1, ncol = 20)
mylabels[1, 1:10]  <- "A"
mylabels[1, 11:20] <- "B"
# 20 observations x 100 variables; draw 20 * 100 values so none are recycled
myclusterdata <- matrix(rexp(20 * 100, rate = .1), ncol = 100, nrow = 20)

rownames(myclusterdata) <- mylabels
hc <- hclust(dist(myclusterdata), "ave")
memb <- cutree(hc, k = 3)

# centroid (column means) of each of the 3 clusters
cent <- NULL
for (k in 1:3) {
  cent <- rbind(cent, colMeans(myclusterdata[memb == k, , drop = FALSE]))
}

hc1 <- hclust(dist(cent)^2, method = "cen", members = table(memb))
# whole tree
plot(as.dendrogram(hc), horiz = TRUE)
# pruned tree (only 3 branches)
plot(as.dendrogram(hc1), horiz = TRUE)

# count the As and Bs in each leaf of the pruned tree
var_of_interest <- levels(as.factor(names(memb)))  # "A", "B"
leaf_number <- sort(unique(memb))                   # 1, 2, 3

# rows = leaves, columns = labels
counter <- matrix(nrow = length(leaf_number), ncol = length(var_of_interest),
                  dimnames = list(leaf_number, var_of_interest))
for (i in seq_along(leaf_number)) {
  for (j in seq_along(var_of_interest)) {
    counter[i, j] <- sum(names(memb) == var_of_interest[j] & memb == leaf_number[i])
  }
}
# proportion of Bs in each leaf
counter[, 2] / (counter[, 1] + counter[, 2])
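
As a sketch of an alternative, the nested loops can probably be replaced by a single cross-tabulation, since base R's table() accepts the cluster numbers in memb and the labels in names(memb) directly. The block below rebuilds the same kind of toy data so it runs on its own. The fixed seed and the short names labels and x are assumptions added here for illustration and are not part of the original post.

```r
# Self-contained sketch on the same kind of toy data as above.
# The seed is an assumption, added only to make the run reproducible.
set.seed(1)
labels <- rep(c("A", "B"), each = 10)
x <- matrix(rexp(20 * 100, rate = 0.1), nrow = 20, ncol = 100)
rownames(x) <- labels

hc <- hclust(dist(x), method = "average")
memb <- cutree(hc, k = 3)  # named integer vector: names are the row labels

# Cross-tabulate cluster membership against the row labels:
# one row per leaf of the pruned tree, one column per label.
counts <- table(leaf = memb, label = names(memb))
print(counts)

# Row-wise proportions: the share of A and B within each leaf.
props <- prop.table(counts, margin = 1)
print(props)
```

The counts table should hold the same numbers as the counter matrix built by the loops, and the "B" column of props should correspond to the ratio computed on the last line of the answer.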
