How does sklearn.cluster.KMeans handle an init ndarray parameter with missing centroids (fewer centroids than n_clusters)?

In Python's sklearn KMeans (see documentation), I was wondering what happens internally when an ndarray of shape (n, n_features) is passed to the init parameter and n < n_clusters:

  1. Does it drop the given centroids and simply start a k-means++ initialization, which is the default choice for the init parameter? (See the k-means++ paper (PDF) and "How does k-means++ work".)
  2. Does it keep the given centroids and fill in the remaining ones using k-means++?
  3. Does it keep the given centroids and fill in the remaining ones with random values?
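If what you actually want is option 2, one approach is to complete the array yourself before fitting: keep the given centroids and draw the missing ones with k-means++-style D² sampling (each new center is a data point picked with probability proportional to its squared distance to the nearest center chosen so far), then pass the full (n_clusters, n_features) array as init. This is a minimal NumPy sketch under that assumption; complete_centroids is a hypothetical helper, not part of scikit-learn, and it uses plain D² sampling rather than the greedy multi-trial variant that scikit-learn's own k-means++ uses.

```python
import numpy as np


def complete_centroids(X, partial, n_clusters, seed=None):
    """Hypothetical helper (not a scikit-learn API): extend `partial` to
    `n_clusters` rows using plain k-means++ D^2 sampling over the rows of X."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    centers = [np.asarray(c, dtype=float) for c in partial]
    if len(centers) > n_clusters:
        raise ValueError("more initial centroids than n_clusters")
    if not centers:
        # No seeds given: start from a uniformly random data point, as k-means++ does.
        centers.append(X[rng.integers(len(X))])
    while len(centers) < n_clusters:
        # Squared distance from each point to its nearest already-chosen center.
        diffs = X[:, None, :] - np.asarray(centers)[None, :, :]
        d2 = (diffs ** 2).sum(axis=2).min(axis=1)
        total = d2.sum()
        if total == 0:
            # Every point coincides with an existing center: fall back to uniform.
            idx = rng.integers(len(X))
        else:
            # D^2 sampling: far-away points are more likely to become centers.
            idx = rng.choice(len(X), p=d2 / total)
        centers.append(X[idx])
    return np.vstack(centers)


X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0],
              [5.1, 5.0], [10.0, 0.0], [10.1, 0.0]])
init = complete_centroids(X, [[0.0, 0.0]], n_clusters=3, seed=0)
print(init.shape)  # (3, 2)
```

The completed array can then be passed as KMeans(n_clusters=3, init=init, n_init=1); n_init=1 because every run would start from the same centers anyway.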

I didn't expect this method to give no warning in this case, which is why I need to know how it handles it.

If you pass an init array that doesn't match n_clusters, KMeans silently adjusts the number of clusters instead, as you can see from the source. This is not documented and I would consider it a bug; I'll propose a fix.
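On a version that behaves this way, you can guard against it yourself by checking the shape of init before fitting. This is a minimal sketch; check_init_shape is a hypothetical helper, not part of scikit-learn. Later scikit-learn releases perform an equivalent check internally and raise a ValueError when the number of rows in init differs from n_clusters, so a guard like this mainly matters on older installs.

```python
import numpy as np


def check_init_shape(init, n_clusters, n_features):
    """Hypothetical pre-flight check (not a scikit-learn API): fail loudly
    instead of letting KMeans silently change the number of clusters."""
    init = np.asarray(init, dtype=float)
    if init.shape != (n_clusters, n_features):
        raise ValueError(
            f"init has shape {init.shape}, expected ({n_clusters}, {n_features})"
        )
    return init


# Two centroids for three clusters: raises instead of silently fitting 2 clusters.
try:
    check_init_shape([[0.0, 0.0], [5.0, 5.0]], n_clusters=3, n_features=2)
except ValueError as err:
    print(err)  # init has shape (2, 2), expected (3, 2)
```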
