
Clustering classifier and clustering policy

I was going through the K-means algorithm in Mahout, and while debugging I noticed that when it creates the initial clusters it runs the following code:

ClusteringPolicy policy = new KMeansClusteringPolicy(convergenceDelta);
ClusterClassifier prior = new ClusterClassifier(clusters, policy);
prior.writeToSeqFiles(priorClustersPath); 

I read the descriptions of these classes, but they were not clear to me.

What do the cluster classifier and the clustering policy mean? Are they related to hierarchical clustering, centroid-based clustering, distribution-based clustering, and so on?

I do not understand the benefit of, or the reason for, using a cluster classifier and policy in Mahout's K-means implementation.

The implementation shares code with other variants of k-means and with similar algorithms such as Canopy pre-clustering and GMM.

These classes encode only the differences between those algorithms.
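To make that concrete, here is a minimal sketch of the general idea, not Mahout's actual code. All names in it (`PolicySketch`, `Policy`, `HardPolicy`, `SoftPolicy`, `Classifier`) are hypothetical, and the scoring function is an assumption chosen for brevity. The classifier holds the cluster models and scores a point against each of them; the policy alone decides how those scores become an assignment, which is the part that differs between a hard k-means-style algorithm and a soft GMM-style one.

```java
import java.util.Arrays;
import java.util.List;

class PolicySketch {

    /** Hypothetical policy: turns per-cluster scores into assignment weights. */
    interface Policy {
        double[] select(double[] scores);
    }

    /** k-means-style hard assignment: all weight goes to the best-scoring cluster. */
    static class HardPolicy implements Policy {
        public double[] select(double[] scores) {
            int best = 0;
            for (int i = 1; i < scores.length; i++) {
                if (scores[i] > scores[best]) {
                    best = i;
                }
            }
            double[] weights = new double[scores.length];
            weights[best] = 1.0;
            return weights;
        }
    }

    /** Soft assignment: weights are the scores normalized to sum to 1. */
    static class SoftPolicy implements Policy {
        public double[] select(double[] scores) {
            double sum = 0.0;
            for (double s : scores) {
                sum += s;
            }
            double[] weights = new double[scores.length];
            for (int i = 0; i < scores.length; i++) {
                weights[i] = scores[i] / sum;
            }
            return weights;
        }
    }

    /** Hypothetical classifier: shared code that scores a point against every center. */
    static class Classifier {
        private final List<double[]> centers;
        private final Policy policy;

        Classifier(List<double[]> centers, Policy policy) {
            this.centers = centers;
            this.policy = policy;
        }

        double[] classify(double[] point) {
            double[] scores = new double[centers.size()];
            for (int i = 0; i < scores.length; i++) {
                // Assumed score: closer centers score higher (1 / (1 + Euclidean distance)).
                scores[i] = 1.0 / (1.0 + distance(point, centers.get(i)));
            }
            return policy.select(scores);
        }

        private static double distance(double[] a, double[] b) {
            double sum = 0.0;
            for (int i = 0; i < a.length; i++) {
                double d = a[i] - b[i];
                sum += d * d;
            }
            return Math.sqrt(sum);
        }
    }

    public static void main(String[] args) {
        List<double[]> centers = Arrays.asList(new double[] {0, 0}, new double[] {10, 0});
        double[] point = {1, 0};
        // Same classifier code, two different policies.
        System.out.println(Arrays.toString(new Classifier(centers, new HardPolicy()).classify(point)));
        System.out.println(Arrays.toString(new Classifier(centers, new SoftPolicy()).classify(point)));
    }
}
```

Swapping the policy object changes the algorithm's behaviour while the scoring loop stays untouched, which is presumably why the driver in the question builds the policy first and then hands it to the classifier.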

Mahout is not a good place to study the k-means algorithm; the implementation is quite a mess. It is also slow, as in really, really slow. Most of the time, a single-CPU implementation will outright beat Mahout on anything that fits into memory, and maybe even on data that fits on the disk of a single machine, because of all the map-reduce overhead.
