
How to decide number of clusters in Hierarchical clustering

I found the clustering pattern below using hierarchical clustering with Ward's minimum variance method in R. I decided empirically on five clusters, based on whether the characteristics of the individuals in each cluster make sense. If I instead cut at a fixed height (indicated by the 'Cut' line in the diagram), I still get the same four clusters, but the fifth cluster (the blue one) is split into two more clusters.

[Image: enter image description here]

Question: Is it mandatory to cut the fifth cluster at a specific height, even if the split doesn't make sense based on domain knowledge? Or can I decide empirically to keep five clusters? Would that introduce any bias into the analysis?

Clustering is subjective to a certain degree (even more so than supervised learning), since no one knows the true number of clusters, or whether the clusters are really different enough to be treated as separate classes. If your domain knowledge says that splitting the fifth cluster does not make sense, you can choose not to split it into separate clusters. Just make sure you document this clearly, so that people know what you did and why.
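To make the two ways of cutting concrete, here is a minimal sketch. The question uses R (where this would be `hclust` plus `cutree` with `k` or `h`); the sketch uses Python with SciPy instead, and the data, group centers, and variable names are made up for illustration. It cuts the same Ward tree once by number of clusters and once by height, then checks how the two partitions relate.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Synthetic stand-in for the real data (assumption): four tight groups
# and one looser group, 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [10, 0], [0, 10], [10, 10], [5, 25]])
spreads = [0.5, 0.5, 0.5, 0.5, 2.0]
X = np.vstack([c + rng.normal(scale=s, size=(20, 2))
               for c, s in zip(centers, spreads)])

# Ward linkage on Euclidean distances.
Z = linkage(X, method="ward")

# Option 1: ask for a number of clusters directly.
labels_k = fcluster(Z, t=5, criterion="maxclust")

# Option 2: cut at a height. Z[:, 2] holds the merge heights in
# ascending order; a height between the merges Z[-6] and Z[-5]
# leaves six clusters.
cut_height = (Z[-6, 2] + Z[-5, 2]) / 2
labels_h = fcluster(Z, t=cut_height, criterion="distance")

# Each cluster from the lower cut sits inside exactly one cluster
# from the five-cluster solution, because both come from one tree.
parent_of = {}
for fine, coarse in zip(labels_h, labels_k):
    parent_of.setdefault(int(fine), set()).add(int(coarse))

print("clusters with k=5:", len(set(labels_k)))
print("clusters at cut height:", len(set(labels_h)))
print("fine -> coarse mapping:", parent_of)
```

Both partitions come from the same tree, so choosing the coarser one does not change the clustering itself, only where you stop splitting. Reporting the rule you used (a cluster count, a height, or a domain-based decision) is what keeps the choice transparent.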

