K-means algorithm with k = 2 giving equal cluster sizes

Question

I am using a modified Lloyd's algorithm to get equal-sized clusters from k-means with k = 2. The pseudocode is as follows:

- Randomly choose 2 points as initialization for the 2 clusters (denoted as c1, c2)
- Repeat below steps until convergence
    - Sort all points xi according to ascending values of ||xi-c1|| - ||xi-c2||, i.e. differences in distances to the first and the second cluster
    - Put the top 50% of points in cluster 1, the others in cluster 2
    - Recalculate centroids as average of the allocated points (as usual in Lloyd's)
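
The steps above can be sketched in Python roughly as follows. This is a minimal sketch rather than the asker's actual code: the function name `balanced_two_means`, the parameters `n_iter` and `seed`, the convergence test (stop when the assignment no longer changes), and the handling of an odd number of points are all assumptions added for illustration.

```python
import numpy as np

def balanced_two_means(X, n_iter=100, seed=0):
    """Balanced 2-means following the pseudocode above (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Randomly choose 2 distinct points as the initial centroids c1, c2.
    c = X[rng.choice(n, size=2, replace=False)].astype(float)
    half = n // 2  # assumption: for odd n, the first cluster gets the smaller half
    labels = None
    for _ in range(n_iter):
        # Unsquared distances of every point to c1 and c2, as in the pseudocode.
        d1 = np.linalg.norm(X - c[0], axis=1)
        d2 = np.linalg.norm(X - c[1], axis=1)
        # Sort by ascending ||x - c1|| - ||x - c2||; a stable sort keeps ties deterministic.
        order = np.argsort(d1 - d2, kind="stable")
        new_labels = np.ones(n, dtype=int)
        new_labels[order[:half]] = 0  # top half goes to the first cluster
        # Assumed convergence criterion: the assignment did not change.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Recompute each centroid as the mean of its allocated points.
        c = np.stack([X[labels == 0].mean(axis=0), X[labels == 1].mean(axis=0)])
    return labels, c
```

Calling `labels, centroids = balanced_two_means(X)` on an array `X` of shape `(n, d)` returns labels in `{0, 1}` with exactly `n // 2` points labelled 0. The `n_iter` cap is there because this sketch does not prove that the loop terminates on its own.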

The algorithm above works well for me empirically:

- It produces balanced clusters
- It always decreases the objective
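
As a hedged aside on the second point, here is a sketch of why such a scheme would be monotone if the points were ranked by squared distances. The symbols $S$, $J$ and $n$ are introduced here for illustration and are not from the original post. For fixed centroids $c_1, c_2$ and $n$ points, let $S$ be the set of indices assigned to cluster 1, with $|S| = n/2$:

$$
J(S) = \sum_{i \in S} \|x_i - c_1\|^2 + \sum_{i \notin S} \|x_i - c_2\|^2
     = \sum_{i=1}^{n} \|x_i - c_2\|^2 + \sum_{i \in S} \left( \|x_i - c_1\|^2 - \|x_i - c_2\|^2 \right)
$$

The first sum does not depend on $S$, so $J$ is minimized by taking the $n/2$ points with the smallest $\|x_i - c_1\|^2 - \|x_i - c_2\|^2$. The mean update then minimizes $J$ for that fixed assignment. The pseudocode ranks by the unsquared difference $\|x_i - c_1\| - \|x_i - c_2\|$, which can order points differently, so this argument alone does not guarantee the decrease observed empirically.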

Has this algorithm been proposed or analyzed in the literature before? Could I get some references?

Answer 1

A more general version that works for more than 2 clusters is described here:

https://elki-project.github.io/tutorial/same-size_k_means

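I have seen k-means with various size constraints in the literature several times, but I don't have any references at hand. I'm not convinced by the idea: forcing clusters to have the same size contradicts, IMHO, the idea of k-means as finding the least-squares best approximation, because it means deliberately choosing a worse approximation.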

Solution 1
2 2017-05-15 06:41:37
