I understand that HAC has several options in terms of linkage functions. You have:
What I'm trying to figure out is, how do I know which one of these I want to use? Are there certain datasets where "straggly" clusters are preferable to spherical ones? Or is it more a function of what I intend to do with the clustering data?
It depends on your data.
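Single-linkage works reasonably well on clean data, but it is sensitive to noise: a few stray points lying between two clusters are enough to chain them together.
If you have dirty data, the other linkages may be better.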
Ward is similar to k-means in that both try to minimize within-cluster variance. It may be a good choice if you want to talk about centroids and data partitioned completely into disjoint subsets.
The other problem is that only SLINK (for single-linkage) is fast, at O(n^2). All the others usually work in O(n^3), so they are not usable on large data sets. Compare this to e.g. DBSCAN, which runs in O(n log n) if done well (with a spatial index), or k-means in O(n) per iteration.
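To make the difference concrete, here is a minimal pure-Python sketch of naive agglomerative clustering with pluggable linkage functions. It is not how any particular library does it. The function names (`naive_hac`, `single`, `ward`, ...) and the toy data set (two tight grids, a thin "bridge" of noise points between them, and one outlier) are made up for illustration. It recomputes every pairwise linkage before each merge, which is exactly why the naive approach scales so poorly.

```python
from itertools import combinations
from math import dist  # Python 3.8+


def single(a, b):
    # distance between the two closest members
    return min(dist(p, q) for p in a for q in b)


def complete(a, b):
    # distance between the two farthest members
    return max(dist(p, q) for p in a for q in b)


def average(a, b):
    # mean distance over all cross-cluster pairs
    return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))


def centroid(c):
    return tuple(sum(xs) / len(c) for xs in zip(*c))


def ward(a, b):
    # increase in within-cluster sum of squares if a and b were merged
    return len(a) * len(b) / (len(a) + len(b)) * dist(centroid(a), centroid(b)) ** 2


def naive_hac(points, linkage, k):
    """Merge the cheapest pair of clusters until only k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters


# Hypothetical toy data: two 5x5 grids, a bridge of noise points, one outlier.
offsets = [0.25 * s for s in range(-2, 3)]
grid_a = [(0.0 + dx, dy) for dx in offsets for dy in offsets]
grid_b = [(10.0 + dx, dy) for dx in offsets for dy in offsets]
bridge = [(1.0 + 0.5 * s, 0.0) for s in range(17)]  # x = 1.0 ... 9.0
outlier = [(5.0, 3.0)]
points = grid_a + grid_b + bridge + outlier

for name, fn in [("single", single), ("ward", ward)]:
    sizes = sorted(len(c) for c in naive_hac(points, fn, 2))
    print(name, sizes)
```

On this data, single-linkage chains everything together through the bridge and spends its one cut on the lone outlier, whereas Ward should split the data into a left and a right group. Swapping in `complete` or `average` lets you try the other linkages on the same points.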