简体   繁体   中英

cluster presentation dendrogram alternative in r

I know dendrograms are quite popular. However if there are quite large number of observations and classes it hard to follow. However sometime I feel that there should be better way to present the same thing. I got an idea but do not know how to implement it.

Consider the following dendrogram.

> data(mtcars)
> plot(hclust(dist(mtcars)))

在此输入图像描述

Can plot it like a scatter plot. In which the distance between two points is plotted with line, while sperate clusters (assumed threshold) are colored and circle size is determined by value of some variable.

在此输入图像描述

You are describing a fairly typical way of going about cluster analysis:

  • Use a clustering algorithm (in this case hierarchical clustering)
  • Decide on the number of clusters
  • Project the data in a two-dimensional plane using some form or principal component analysis

The code:

hc <- hclust(dist(mtcars))
cluster <- cutree(hc, k=3)
xy <- data.frame(cmdscale(dist(mtcars)), factor(cluster))
names(xy) <- c("x", "y", "cluster")
xy$model <- rownames(xy)

library(ggplot2)
ggplot(xy, aes(x, y)) + geom_point(aes(colour=cluster), size=3)

What happens next is that you get a skilled statistician to help explain what the x and y axes mean. This usually involves projecting the data to the axes and extracting the factor loadings.

The plot:

在此输入图像描述

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM