I can run scikit-learn's GridSearchCV in parallel locally on my quad-core processor. Is it straightforward to scale this to multi-processor environments using an MPI module such as mpi4py?
I'm very new to this, so I would appreciate any extra relevant information too. I'm going through the documentation for mpi4py right now.
Thanks!
You can look at the GridSearchCV implementation for inspiration and build your own variant on top of MPI. However, MPI might not offer a natural way to avoid transferring the input training set over the network over and over.
An alternative would be to use IPython.parallel, as explained in this tutorial. The code of the pyrallel helper library used in the tutorial is also available on GitHub.
I extended GridSearchCV to work with MPI; have a look at http://kdw.org/node/95
Currently it only works with supervised learning algorithms, but adapting it to unsupervised ones should be easy. Hope this helps.
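The following example is not taken from the linked code. It only illustrates why the unsupervised case can be a small change. When `y` is `None` and no `scoring` is given, scikit-learn's `cross_val_score` falls back to the estimator's own `score` method. For `GaussianMixture`, that method returns the mean per-sample log-likelihood, where higher is better. The helper `score_candidates` is a hypothetical name for the per-candidate work each node would do. The blob data set is synthetic and chosen only for illustration.

```python
from sklearn.base import clone
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import ParameterGrid, cross_val_score


def score_candidates(estimator, candidates, X, cv=5):
    """Hypothetical helper: score each parameter dict without labels.

    With y=None and scoring=None, cross_val_score calls estimator.score(X_test),
    which for GaussianMixture is the mean per-sample log-likelihood.
    """
    results = []
    for params in candidates:
        model = clone(estimator).set_params(**params)
        scores = cross_val_score(model, X, y=None, cv=cv)
        results.append((params, scores.mean()))
    return results


X, _ = make_blobs(n_samples=300, centers=3, random_state=0)  # labels ignored
grid = ParameterGrid({"n_components": [1, 2, 3], "covariance_type": ["full", "diag"]})
results = score_candidates(GaussianMixture(random_state=0), list(grid), X)
best_params, best_score = max(results, key=lambda r: r[1])
print(best_params, round(best_score, 3))
```

In an MPI version, only the data that gets broadcast changes (just `X`). The candidate slicing and the gathering of results stay the same as in the supervised case.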