
Python Using Multiple Cores Without Me Asking

I am running a doubly nested loop over i,j and calling sklearn's PCA function inside the inner loop. Although I am not using any parallel processing packages, the task manager shows all my CPUs running between 80%-100%. I am pleasantly surprised by this, and have 2 questions:

1) What is going on here? How did Python decide to use multiple CPUs? How is it breaking up the loop? Printing out the i,j values, they are still being completed in order.

2) Would the code be sped up even more by explicitly parallelizing it with a package, or would the difference be negligible?

"Several scikit-learn tools... rely internally on Python's multiprocessing module to parallelize execution onto several Python processes by passing n_jobs > 1 as argument."

One explanation, therefore, is that somewhere in your code n_jobs is being passed to an sklearn estimator that accepts it. I'm a bit confused, though, because only the specialized PCA tools have that argument in the docs (see the sketch after these links):

https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html (no n_jobs)

https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.KernelPCA.html (has n_jobs)

https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.MiniBatchSparsePCA.html (has n_jobs)
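A minimal sketch of the difference (the random X below is a hypothetical stand-in for your real data): only the estimators whose docs list n_jobs accept it, so plain PCA cannot be the source of that kind of parallelism.

    import numpy as np
    from sklearn.decomposition import PCA, KernelPCA

    X = np.random.rand(200, 50)  # toy stand-in for the real data matrix

    # Plain PCA exposes no n_jobs parameter; at the Python level its SVD
    # runs in a single process.
    PCA(n_components=10).fit(X)

    # KernelPCA does accept n_jobs, which it forwards to its pairwise
    # kernel computation.
    KernelPCA(n_components=10, n_jobs=-1).fit(X)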

NumPy may also be the culprit: its linear-algebra routines are usually linked against a multithreaded BLAS library (such as OpenBLAS or MKL), and the SVD that PCA performs goes through those routines, so all cores can light up without any Python-level parallelism. You would have to dig into the implementation a bit to see exactly where sklearn hands work off to NumPy.
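One way to test the BLAS hypothesis, assuming the threadpoolctl package is available (recent scikit-learn versions depend on it): cap the native thread pools at one thread and watch whether CPU usage collapses to a single core. If it does, the parallelism was coming from the BLAS library underneath NumPy rather than from Python.

    import numpy as np
    from sklearn.decomposition import PCA
    from threadpoolctl import threadpool_limits

    X = np.random.rand(2000, 500)  # large enough for BLAS threading to kick in

    # Limit every BLAS/OpenMP thread pool to one thread inside this block only.
    with threadpool_limits(limits=1):
        PCA(n_components=20).fit(X)  # observe the task manager while this runs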

Sklearn has a landing page specifically for optimizing existing sklearn tools (and writing your own). It offers a variety of suggestions and specifically mentions joblib; check it out.
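As for question 2, here is a hedged sketch of what explicitly parallelizing the loop with joblib might look like; fit_one and the (i, j) grid are hypothetical stand-ins for your actual loop body. One caveat: if multithreaded BLAS is already saturating the cores, stacking process-level parallelism on top can oversubscribe them and yield little or no extra speedup.

    import numpy as np
    from joblib import Parallel, delayed
    from sklearn.decomposition import PCA

    X = np.random.rand(500, 100)  # hypothetical stand-in for the real data

    def fit_one(i, j):
        # Stand-in for whatever the inner loop actually computes per (i, j).
        return PCA(n_components=i + 1).fit(X).explained_variance_ratio_.sum()

    pairs = [(i, j) for i in range(4) for j in range(4)]
    # n_jobs=-1 uses all available cores; results keep the order of `pairs`.
    results = Parallel(n_jobs=-1)(delayed(fit_one)(i, j) for i, j in pairs)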
