
How can I distribute the processing of MiniBatch KMeans (scikit-learn)?

In scikit-learn, KMeans has an n_jobs parameter, but MiniBatchKMeans lacks it. MBK is faster than KMeans, but on large sample sets we would like to distribute the processing across multiprocessing (or other parallel processing libraries).

Is MBK's partial_fit the answer?

I don't think this is possible. You could implement something with OpenMP inside the minibatch processing, but I'm not aware of any parallel minibatch k-means procedures. Minibatch k-means is essentially a stochastic gradient descent procedure, and parallelizing those is somewhat hairy.
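On the partial_fit question, here is a minimal sketch of what partial_fit does, assuming synthetic blob data and a chunk count chosen only for illustration. Each call updates the same set of centroids from one chunk, so it helps with data that does not fit in memory, but the chunks are still consumed one after another rather than in parallel.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)

# Synthetic data (an assumption for this sketch): three well-separated 2-D blobs.
blob_centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(1000, 2)) for c in blob_centers])
rng.shuffle(X)  # shuffle rows so every chunk contains points from all blobs

mbk = MiniBatchKMeans(n_clusters=3, n_init=3, random_state=0)

# Feed the data one chunk at a time. Every call updates the same centroids,
# so the loop is inherently sequential.
for chunk in np.array_split(X, 10):
    mbk.partial_fit(chunk)

labels = mbk.predict(X)
print(mbk.cluster_centers_)
```

Because every partial_fit call reads and writes the shared cluster_centers_, running the calls from several processes would need some extra scheme for merging centroids, which scikit-learn does not provide out of the box as far as I know.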

Btw, as far as I know, the n_jobs parameter in KMeans only distributes the different random initializations across workers; it does not parallelize a single run.
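To make the "parallel over initializations" idea concrete, here is a rough sketch that does the same thing by hand with joblib. The helper name fit_once, the random data, and the seed range are assumptions for illustration, and this is not necessarily how scikit-learn implements it internally. The thread-based backend is used here only to keep the example easy to run anywhere.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.cluster import KMeans


def fit_once(X, n_clusters, seed):
    # Hypothetical helper: one k-means run from a single random initialization.
    return KMeans(n_clusters=n_clusters, n_init=1, random_state=seed).fit(X)


rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))  # placeholder data for the sketch

# Run 8 independent initializations on 2 workers. Each run is still
# sequential internally; only the restarts are spread across workers.
models = Parallel(n_jobs=2, prefer="threads")(
    delayed(fit_once)(X, 4, seed) for seed in range(8)
)

# Keep the run with the lowest inertia (within-cluster sum of squares).
best = min(models, key=lambda m: m.inertia_)
print(best.inertia_)
```

This speeds up the search over starting points but does nothing for a single run on a very large sample set, which is the case the question is about.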


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM