
scikit-learn grid search on multi-processor environment

I am able to run the scikit-learn function GridSearchCV in parallel locally on my quad-core processor. I was wondering if it is straightforward to scale this to multi-processor environments using some module for MPI such as mpi4py.

I'm very new to this, so I would appreciate any extra relevant information too. I'm going through the documentation for mpi4py right now.

Thanks!

You can look at the GridSearchCV implementation as inspiration for implementing your own variant on top of MPI. However, MPI might not offer a natural way to avoid transferring the input training set over the network again and again.
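As a rough illustration of what such a variant could look like, here is a minimal sketch that splits the parameter grid across MPI ranks and gathers the scores on rank 0. It is not how GridSearchCV works internally. The names `expand_grid`, `candidates_for_rank`, `mpi_grid_search` and the `evaluate` callable are hypothetical and introduced only for this example; the mpi4py calls used are `MPI.COMM_WORLD`, `Get_rank`, `Get_size` and `gather`.

```python
from itertools import product


def expand_grid(param_grid):
    """Return every parameter combination as a list of dicts."""
    names = sorted(param_grid)
    return [dict(zip(names, values))
            for values in product(*(param_grid[n] for n in names))]


def candidates_for_rank(candidates, rank, size):
    """Round-robin split: rank r gets candidates r, r + size, r + 2*size, ..."""
    return candidates[rank::size]


def mpi_grid_search(param_grid, evaluate):
    """Score each candidate on exactly one rank; return the best on rank 0.

    `evaluate` is a hypothetical user-supplied callable that maps a params
    dict to a score (higher is better), e.g. a mean cross-validation score.
    """
    from mpi4py import MPI  # imported here so the helpers above work without MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    # Every rank builds the same candidate list and keeps only its own slice.
    mine = candidates_for_rank(expand_grid(param_grid), rank, size)
    local = [(evaluate(params), params) for params in mine]

    # On rank 0 this is a list with one result list per rank; elsewhere None.
    gathered = comm.gather(local, root=0)
    if rank != 0:
        return None
    results = [item for chunk in gathered for item in chunk]
    return max(results, key=lambda item: item[0])
```

A script calling `mpi_grid_search` would typically be launched with something like `mpiexec -n 4 python script.py`. If `evaluate` loads the training set from local or shared storage once per process, only the small parameter dicts and scores travel over the network, which is one possible way around the data-transfer concern above.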

An alternative would be to use IPython.parallel, as explained in this tutorial. The code of the pyrallel helper library used in that tutorial is also available on GitHub.

I extended GridSearchCV to work with MPI; have a look at http://kdw.org/node/95

Currently, it only works with supervised learning algorithms, but modifying it for unsupervised learning should be easy. Hope this helps.
