Choosing the number of clusters in k-means

I want to cluster a large sample of data, and for this I am using the kmeans function in MATLAB. The problem is that it returns a matrix in which all the data is sorted into however many clusters I specify.

How can I know which number of clusters is optimal?

I thought that getting an equal number of elements in each cluster would be optimal, but this never happens. Instead, it will go on clustering the data for any number I put in.
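To illustrate the behaviour described here, below is a minimal sketch in Python (not MATLAB's `kmeans`) of plain Lloyd-style k-means. It shows that the algorithm returns a partition for whatever k it is given, and that the cluster sizes need not be equal. The toy data, the function names `kmeans` and `cluster_sizes`, and the random initialisation are all assumptions made for the example.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd-style k-means on (x, y) tuples; returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: start from k randomly chosen points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k),
                key=lambda j: (p[0] - centers[j][0]) ** 2 + (p[1] - centers[j][1]) ** 2)
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    return labels

def cluster_sizes(labels, k):
    """Number of points assigned to each of the k clusters."""
    return [labels.count(j) for j in range(k)]

# Hypothetical toy data: three blobs of deliberately different sizes.
rng = random.Random(1)
data = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(60)]
        + [(rng.gauss(8, 1), rng.gauss(0, 1)) for _ in range(30)]
        + [(rng.gauss(4, 1), rng.gauss(7, 1)) for _ in range(10)])

# Every k yields a valid partition of all 100 points.
for k in range(1, 6):
    print(k, cluster_sizes(kmeans(data, k), k))
```

Whatever k is passed, every point receives a label, so the fact that the run completes says nothing by itself about whether that k suits the data.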

Please help...

I read up on this, and I think an answer could be: in kmeans we are trying to partition the data according to the means as the data comes in, so theoretically the best partitioning would be one where each partition has an equal number of data points.

I used kmeans++, which is a better algorithm than plain kmeans because it does not pick all of its starting centers purely at random, and then iterated over the number of partitions until the partition sizes were almost equal. This was approximate: for 3 I got 2180, 729, 1219, and for 4 I got 30, 2422, 1556, 120, so I chose 3 as my final answer.
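The comparison above can be written out as a short Python sketch. The post does not define "almost equal", so the largest-to-smallest size ratio used here (and the helper name `imbalance`) is an assumption chosen for illustration; only the cluster sizes come from the post.

```python
def imbalance(sizes):
    """Ratio of the largest cluster to the smallest; 1.0 means perfectly equal sizes."""
    return max(sizes) / min(sizes)

# Cluster sizes reported in the post, keyed by the number of clusters k.
reported = {
    3: [2180, 729, 1219],
    4: [30, 2422, 1556, 120],
}

for k, sizes in reported.items():
    print(f"k={k}: total={sum(sizes)}, largest/smallest={imbalance(sizes):.2f}")
# k=3: total=4128, largest/smallest=2.99
# k=4: total=4128, largest/smallest=80.73

# Pick the k whose clusters are closest to equal under this (assumed) measure.
best_k = min(reported, key=lambda k: imbalance(reported[k]))
print("most balanced k:", best_k)
# most balanced k: 3
```

Under this measure k = 3 is far more balanced than k = 4, which matches the choice made in the post.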
