
Hierarchical cluster analysis help - dendrogram

I wrote code that generates a dendrogram with the hclust function, as you can see in the image. I would like help interpreting this dendrogram. Note that the points are located close to one another. What does this dendrogram mean? I would really like a more complete analysis of the generated output.

library(geosphere)

# 29 properties with their coordinates in decimal degrees
Points_properties <- structure(
  list(
    Propertie = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
                  19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29),
    Latitude = c(-24.781624, -24.775017, -24.769196, -24.761741, -24.752019,
                 -24.748008, -24.737312, -24.744718, -24.751996, -24.724589,
                 -24.8004, -24.796899, -24.795041, -24.780501, -24.763376,
                 -24.801715, -24.728005, -24.737845, -24.743485, -24.742601,
                 -24.766422, -24.767525, -24.775631, -24.792703, -24.790994,
                 -24.787275, -24.795902, -24.785587, -24.787558),
    Longitude = c(-49.937369, -49.950576, -49.927608, -49.92762, -49.920608,
                  -49.927707, -49.922095, -49.915438, -49.910843, -49.899478,
                  -49.901775, -49.89364, -49.925657, -49.893193, -49.94081,
                  -49.911967, -49.893358, -49.903904, -49.906435, -49.927951,
                  -49.939603, -49.941541, -49.94455, -49.929797, -49.92141,
                  -49.915141, -49.91042, -49.904772, -49.894034)
  ),
  row.names = c(NA, -29L),
  class = c("tbl_df", "tbl", "data.frame")
)

coordinates <- subset(Points_properties, select = c("Latitude", "Longitude"))

# Columns 2:1 put Longitude on the x-axis and Latitude on the y-axis
plot(coordinates[, 2:1])
text(x = Points_properties$Longitude,
     y = Points_properties$Latitude,
     labels = Points_properties$Propertie, pos = 2)

[Figure: scatter plot of the points (Longitude vs. Latitude), labeled by property number]

d<-distm(coordinates[,2:1])
d<-as.dist(d)
fit.average<-hclust(d,method="average")
plot(fit.average,hang=-1,cex=.8, main = "")

[Figure: dendrogram produced by plot(fit.average)]

You chose to perform hierarchical clustering using the average method.

According to ?hclust:

This function performs a hierarchical cluster analysis using a set of dissimilarities for the n objects being clustered. Initially, each object is assigned to its own cluster and then the algorithm proceeds iteratively, at each stage joining the two most similar clusters, continuing until there is just a single cluster. At each stage distances between clusters are recomputed.
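
To make the quoted description concrete, here is a minimal sketch of the same agglomerative loop written by hand, with the between-cluster distance taken as the mean of all pairwise distances (which is what method = "average" uses). The five 1-D points in x are hypothetical toy data chosen so the result is easy to check by eye; this is an illustration, not how hclust is implemented internally.

```r
# Toy data (hypothetical): five points on a line
x <- c(0, 1, 5, 7, 20)
m <- as.matrix(dist(x))            # full pairwise distance matrix

clusters <- as.list(seq_along(x))  # every point starts in its own cluster
heights <- numeric(0)              # distance at which each merge happens

while (length(clusters) > 1) {
  best <- c(NA, NA)
  best_d <- Inf
  # Find the pair of clusters with the smallest average pairwise distance
  for (i in 1:(length(clusters) - 1)) {
    for (j in (i + 1):length(clusters)) {
      dij <- mean(m[clusters[[i]], clusters[[j]]])
      if (dij < best_d) {
        best_d <- dij
        best <- c(i, j)
      }
    }
  }
  heights <- c(heights, best_d)
  # Merge the two closest clusters into one
  clusters[[best[1]]] <- c(clusters[[best[1]]], clusters[[best[2]]])
  clusters[[best[2]]] <- NULL
}

heights
# Same merge heights as the built-in function on this toy data
hclust(dist(x), method = "average")$height
```

On this toy input both approaches should report the same merge heights, which are the values plotted on the y-axis of a dendrogram.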

You can follow what happens using the merge field:

Row i of merge describes the merging of clusters at step i of the clustering. If an element j in the row is negative, then observation -j was merged at this stage. If j is positive then the merge was with the cluster formed at the (earlier) stage j of the algorithm.

fit.average$merge
      [,1] [,2]
 [1,]  -21  -22
 [2,]  -15    1
 [3,]  -13  -24
 [4,]   -6  -20
 [5,]   -2  -23
 [6,]  -16  -27
...
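
If reading the signed entries is awkward, a small helper can turn each row of merge into a sentence. The sketch below is only an illustration: describe is a hypothetical helper name, and x is hypothetical toy data rather than the coordinates from the question; the same loop should work on fit.average.

```r
# Toy data (hypothetical): five points on a line
x <- c(0, 1, 5, 7, 20)
fit <- hclust(dist(x), method = "average")

# Negative entries are single observations, positive entries are
# clusters formed at an earlier step (hypothetical helper)
describe <- function(j) {
  if (j < 0) paste("point", -j) else paste("cluster", j)
}

for (i in seq_len(nrow(fit$merge))) {
  cat(sprintf("step %d: %s + %s -> cluster %d (height %.2f)\n",
              i,
              describe(fit$merge[i, 1]),
              describe(fit$merge[i, 2]),
              i,
              fit$height[i]))
}
```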

This is what you see in the dendrogram:
[Figure: the dendrogram]

The height on the y-axis of the dendrogram is the dissimilarity at which two clusters are merged. Because you use method average, this is the mean of all pairwise distances between the points of one cluster and the points of the other. Since distm returns distances in meters, the heights are in meters too.

  1. points 21 and 22 (the closest pair) are merged, creating cluster 1
  2. cluster 1 is merged with point 15, creating cluster 2
  3. ...

You could then call rect.hclust, which accepts various arguments, such as the number of groups k you'd like:

rect.hclust(fit.average, k=3)

[Figure: dendrogram with the k = 3 groups outlined by rectangles]
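
Besides k, rect.hclust also takes a cut height h, which can be more natural when the heights have a physical unit (meters in the question). A minimal sketch on hypothetical toy data, where the cut height of 4 is an arbitrary value chosen to fall between two merge heights:

```r
# Toy data (hypothetical): five points on a line
x <- c(0, 1, 5, 7, 20)
fit <- hclust(dist(x), method = "average")

# rect.hclust draws on an existing dendrogram, so plot it first
plot(fit, hang = -1, main = "")

# Cut at height 4 instead of asking for a fixed number of groups
groups_h <- rect.hclust(fit, h = 4)

# Number of members in each resulting group
lengths(groups_h)
```

With the data from the question, h would be a distance in meters, for example the largest average separation you are willing to accept inside one group.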

You can also use the output of rect.hclust to color the original points:

groups <- rect.hclust(fit.average, k=3)
groups

#[[1]]
# [1]  5  6  7  8  9 10 17 18 19 20

#[[2]]
# [1]  1  2  3  4 15 21 22 23

#[[3]]
#  [1] 11 12 13 14 16 24 25 26 27 28 29

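# One color index per group, repeated once for each member of that group
colors <- rep(seq_along(groups), lengths(groups))
# Reorder so that colors[i] is the group of point i (row i of coordinates)
colors <- colors[order(unlist(groups))]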

plot(coordinates[,2:1],col = colors)

[Figure: scatter plot of the points colored by group]
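
If you only need the group of each point and not the rectangles, cutree returns the membership vector directly, already in the order of the original observations. The sketch below uses hypothetical toy points and plain Euclidean dist rather than distm, so it runs without geosphere; with the data from the question you would call cutree(fit.average, k = 3) instead. Note that the group numbers from cutree need not match the order of the list returned by rect.hclust.

```r
# Toy data (hypothetical): three well-separated groups of points
pts <- data.frame(
  Longitude = c(0, 0, 1, 10, 10, 20, 21),
  Latitude  = c(0, 1, 0, 10, 11,  0,  0)
)

fit <- hclust(dist(pts), method = "average")

# One group label per row of pts, in the original row order
membership <- cutree(fit, k = 3)
table(membership)

# The labels can be used directly as colors
plot(pts, col = membership, pch = 19)
```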
