Cluster classifier and clustering policy

Question

I was going through the K-means algorithm in Mahout, and while debugging I noticed that when it creates the first clusters it runs the following code:

ClusteringPolicy policy = new KMeansClusteringPolicy(convergenceDelta);
ClusterClassifier prior = new ClusterClassifier(clusters, policy);
prior.writeToSeqFiles(priorClustersPath);

I read the descriptions of these classes, but they were not clear to me.

What do this cluster classifier and policy mean? Are they related to hierarchical clustering, centroid-based clustering, distribution-based clustering, etc.?

I ask because I do not know the benefit of, or the reason for, using this cluster classifier and policy in Mahout's K-means implementation.

Answer 1

The implementation shares code with other variants of k-means and with similar algorithms such as Canopy pre-clustering and GMM.

These classes encode only the differences between these algorithms.
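To make the idea concrete, here is a minimal, self-contained sketch of how one shared iteration loop can serve several algorithms when only the assignment rule is swapped out. It is not Mahout's code or API: `PolicySketch`, `Policy`, `HardAssignmentPolicy`, `SoftAssignmentPolicy` and `iterate` are hypothetical names invented for this illustration, and the soft rule is a simplified stand-in rather than a real GMM.

```java
import java.util.Arrays;

public class PolicySketch {

    /** Hypothetical policy: turns point-to-center distances into assignment weights. */
    public interface Policy {
        double[] weights(double[] distances);
    }

    /** k-means style rule: all weight goes to the nearest center. */
    public static class HardAssignmentPolicy implements Policy {
        public double[] weights(double[] distances) {
            int best = 0;
            for (int i = 1; i < distances.length; i++) {
                if (distances[i] < distances[best]) best = i;
            }
            double[] w = new double[distances.length];
            w[best] = 1.0;
            return w;
        }
    }

    /** Soft rule (illustrative only): weights proportional to exp(-distance). */
    public static class SoftAssignmentPolicy implements Policy {
        public double[] weights(double[] distances) {
            double min = Arrays.stream(distances).min().getAsDouble();
            double[] w = new double[distances.length];
            double sum = 0.0;
            for (int i = 0; i < w.length; i++) {
                w[i] = Math.exp(-(distances[i] - min)); // shift by min to avoid underflow
                sum += w[i];
            }
            for (int i = 0; i < w.length; i++) w[i] /= sum;
            return w;
        }
    }

    static double euclidean(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    /** Shared loop: the policy is the only algorithm-specific part. */
    public static double[][] iterate(double[][] points, double[][] initial,
                                     Policy policy, int rounds) {
        int k = initial.length, dim = initial[0].length;
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) centers[c] = initial[c].clone();
        for (int r = 0; r < rounds; r++) {
            double[][] sums = new double[k][dim];
            double[] totals = new double[k];
            for (double[] p : points) {
                double[] dist = new double[k];
                for (int c = 0; c < k; c++) dist[c] = euclidean(p, centers[c]);
                double[] w = policy.weights(dist);
                for (int c = 0; c < k; c++) {
                    totals[c] += w[c];
                    for (int d = 0; d < dim; d++) sums[c][d] += w[c] * p[d];
                }
            }
            // Move each center to the weighted mean of its points; keep it if empty.
            for (int c = 0; c < k; c++) {
                if (totals[c] > 0) {
                    for (int d = 0; d < dim; d++) centers[c][d] = sums[c][d] / totals[c];
                }
            }
        }
        return centers;
    }

    public static void main(String[] args) {
        double[][] points = {{0.0}, {1.0}, {10.0}, {11.0}};
        double[][] start = {{0.0}, {10.0}};
        // Hard assignment prints [[0.5], [10.5]]
        System.out.println(Arrays.deepToString(
                iterate(points, start, new HardAssignmentPolicy(), 5)));
        System.out.println(Arrays.deepToString(
                iterate(points, start, new SoftAssignmentPolicy(), 5)));
    }
}
```

In this sketch, switching from the hard to the soft rule changes the algorithm without touching `iterate`, which is the kind of separation the answer describes.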

Mahout is not a good place to study the k-means algorithm; the implementation is quite a mess. It is also slow, as in really, really slow. Most of the time, a single-CPU implementation will outright beat Mahout on anything that fits into memory, and maybe even on data that fits on the disk of a single machine, because of all the MapReduce overhead.

0 | accepted | 2014-03-26 10:02:23