Does scipy's kmeans2 algorithm also weigh the initial centroids when using minit='matrix'?

I was playing around with scipy's kmeans2 algorithm when I noticed a problem. Consider the following code:

import numpy as np
from scipy.cluster.vq import kmeans2

x = np.array([[0.1, 0.0], [0.0, 0.1], [1.1, 1.0], [1.0, 1.1]])
c = np.array([[3, 3], [4, 4]])

kmeans2(x, c, minit='matrix', iter=100)

You'd expect this code (rather deviously) to just converge to a solution with the following centroids: [0.05, 0.05] and [1.05, 1.05] . However, the code returns this:

 (array([[ 0.55,  0.55],
   [ 4.  ,  4.  ]]), array([0, 0, 0, 0], dtype=int32))

It seems like the k-means algorithm takes its initial centroids into account when finding the new centroids. Why is this? How can I prevent this from happening?

I hadn't worked on this for a while, but I had a sudden eureka moment and figured out why my problem was occurring:
Although the results look strange, they are easy to explain once you look at how k-means works. In the first epoch, all four data points are assigned to the [3, 3] centroid, because it is closer to every data point than [4, 4] is. That centroid then moves to the mean of its assigned points, which is [0.55, 0.55]. No matter how many epochs follow, the centroid initialised as [3, 3] stays at [0.55, 0.55] (all four points remain assigned to it, so their mean does not change), and the other centroid (initialised as [4, 4]) stays put because no data point is closer to it than to the first one. So the initial centroids are not weighed into the result: they only determine the first assignment, and a centroid that never receives any points never moves.
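To make the explanation above concrete, here is a minimal NumPy sketch of the assign-then-update loop. It is not scipy's actual implementation: `lloyd` is a hypothetical helper written for illustration, and it assumes that an empty cluster simply keeps its previous centroid, which is consistent with the output shown in the question. It runs the same data once with the far-away initial centroids and once with initial centroids taken from the data itself.

```python
import numpy as np

def lloyd(points, centroids, n_iter=100):
    """Plain k-means iterations (hypothetical helper, not scipy's code).

    Assumption: a cluster with no assigned points keeps its old centroid.
    """
    centroids = np.asarray(centroids, dtype=float).copy()
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every point.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each non-empty cluster to the mean of its points.
        for k in range(len(centroids)):
            members = points[labels == k]
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)
    return centroids, labels

x = np.array([[0.1, 0.0], [0.0, 0.1], [1.1, 1.0], [1.0, 1.1]])

# Initial centroids far outside the data: [3, 3] captures every point.
far_centroids, far_labels = lloyd(x, [[3, 3], [4, 4]])

# Initial centroids taken from the data: one point from each visible group.
near_centroids, near_labels = lloyd(x, x[[0, 2]])

print(far_centroids, far_labels)
print(near_centroids, near_labels)
```

With the far-away start, the second centroid never receives a point and stays at [4, 4]; with a start drawn from the data, the sketch reaches the centroids the question expected. One way to avoid the problem with kmeans2 itself is therefore to pass initial centroids that lie within the data, or to let it pick observations from the data with minit='points'.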
