R: How to get clusters of roughly the same size from dendrogram

I am trying to group students by their interests. The groups should have roughly the same size, even if this means that students who don't fit into any group end up with group members who don't really share their interests.

I used R's hclust() function and got a really nice dendrogram, so that part works perfectly. But when I try to form clusters using cutree(), I can only adjust h (the height at which the tree is cut) or k (the desired number of groups). The problem is that even if I fix the number of groups, some groups turn out much smaller than others.
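To make the behaviour concrete, here is a minimal sketch using the built-in iris data as a stand-in for the student data (an assumption, since the original data is not shown). In cutree(), k sets the number of clusters, not their size, so the resulting groups can differ a lot in size:

```r
# Illustration only: iris stands in for the student-interest data.
hc <- hclust(dist(iris[, -5]))   # complete linkage is the hclust() default
groups <- cutree(hc, k = 3)      # k = number of clusters, not cluster size
sizes <- table(groups)           # number of observations in each cluster
print(sizes)
```

Nothing in cutree() constrains how many observations land in each cluster; the cut only follows the structure of the tree.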

If you look at the plotted tree, there are some students whose interests are completely different from those of the rest, so I guess that is why this happens.

What I'd like to do to prevent this is to "forbid" groups below a certain minimum size, so that any such groups are merged into another small group or something similar. Is there an easy way to do this, or do I have to write my own function to clean up after the clustering?
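One way such a clean-up step could look is sketched below. The helper merge_small_clusters() is hypothetical (it is not part of base R or any package), and the merge rule, nearest centroid by Euclidean distance, is an assumption chosen for simplicity. It only enforces a minimum size; it does not make the groups equal in size.

```r
# Hypothetical helper: repeatedly merge the smallest cluster into the
# cluster with the nearest centroid until every cluster has at least
# min_size members (or only one cluster is left).
merge_small_clusters <- function(x, groups, min_size) {
  x <- as.matrix(x)
  repeat {
    sizes <- table(groups)
    if (length(sizes) < 2 || min(sizes) >= min_size) break
    smallest <- names(sizes)[which.min(sizes)]
    # One centroid per cluster; row names are the cluster labels.
    centroids <- rowsum(x, groups) / as.vector(sizes)
    # Distances from the smallest cluster's centroid to all centroids.
    d <- as.matrix(dist(centroids))[smallest, ]
    d[smallest] <- Inf                      # never merge a cluster with itself
    nearest <- names(d)[which.min(d)]
    groups[groups == as.integer(smallest)] <- as.integer(nearest)
  }
  groups
}

# Usage on the iris stand-in data: cut into 10 clusters, then require
# at least 10 observations per cluster.
hc <- hclust(dist(iris[, -5]))
raw <- cutree(hc, k = 10)
merged <- merge_small_clusters(iris[, -5], raw, min_size = 10)
print(table(merged))
```

Because each pass removes one cluster, the loop always terminates. Merged clusters can become large, so this addresses the "no tiny groups" requirement but not the "roughly equal size" one.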

I found similar questions on StackOverflow (e.g. this one), but none of them is marked as answered, and in the particular case I mentioned I'm afraid I don't really understand the proposed solution.

Thanks in advance for your input!

Merle

As Merle noted in a comment, the solution does not have to be based on a hierarchical clustering method.

You can use the function balanced_clustering() from the anticlust package to create clusters of equal size. This is an example using the iris data set:

library(anticlust)

data(iris)

iris$group <- balanced_clustering(
  iris[, -5],
  K = nrow(iris) / 5 # 5 plants per group
)

The output is a vector indicating group membership. For example, this is one group of similar plants:

subset(iris, group == 1)
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species group
#> 1           5.1         3.5          1.4         0.2  setosa     1
#> 5           5.0         3.6          1.4         0.2  setosa     1
#> 8           5.0         3.4          1.5         0.2  setosa     1
#> 18          5.1         3.5          1.4         0.3  setosa     1
#> 40          5.1         3.4          1.5         0.2  setosa     1

Note that I used the four numeric variables for clustering, not "Species".

The same can be done using anticlust::matching(), where you specify the size of the groups rather than the number of groups:

matching(iris[, -5], p = 5) 
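To confirm that the groups really are the same size, you can tabulate the returned vector. This is a minimal sketch that assumes the anticlust package is installed; the variable names are illustrative:

```r
library(anticlust)

# p is the number of elements per group; iris has 150 rows, so 30 groups.
groups <- matching(iris[, -5], p = 5)

# Count the members of each group; useNA makes unmatched rows visible.
sizes <- table(groups, useNA = "ifany")
print(sizes)
```

The same check works for the vector returned by balanced_clustering().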
