
Python Using Multiple Cores Without Me Asking

I am running a double nested loop over i,j and I use sklearn's PCA function inside the inner loop. Though I am not using any parallel-processing packages, the task manager tells me that all my CPUs are running at 80%-100%. I am pleasantly surprised by this, and have two questions:

1) What is going on here? How did Python decide to use multiple CPUs? How is it breaking up the loop? Printing out the i,j values, they are still being completed in order.

2) Would the code be sped up even more by explicitly parallelizing it with a package, or would the difference be negligible?

"Several scikit-learn tools... rely internally on Python's multiprocessing module to parallelize execution onto several Python processes by passing n_jobs > 1 as argument."

One explanation, therefore, is that somewhere in your code n_jobs is a valid argument for an sklearn process. I'm a bit confused, though, because only the specialized PCA tools have that argument in the docs.

https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html (no n_jobs)

https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.KernelPCA.html (has n_jobs)

https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.MiniBatchSparsePCA.html (has n_jobs)
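To illustrate the difference: KernelPCA accepts n_jobs while plain PCA does not. The data shapes and parameter values here are just placeholders:

```python
import numpy as np
from sklearn.decomposition import KernelPCA

X = np.random.rand(200, 10)

# KernelPCA exposes n_jobs (plain PCA does not); n_jobs=-1 asks
# scikit-learn to use all available cores for the kernel computation.
kpca = KernelPCA(n_components=3, kernel="rbf", n_jobs=-1)
X_t = kpca.fit_transform(X)
print(X_t.shape)  # (200, 3)
```

If your loop only ever calls plain PCA, n_jobs cannot be the explanation, which points to the NumPy angle below.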

NumPy may also be the culprit: its linear-algebra routines (such as the SVD that PCA relies on) are typically linked against a multithreaded BLAS library like MKL or OpenBLAS, which will use all available cores without being asked. You would have to dig into the implementation a bit to see where sklearn is making use of NumPy's parallel tools.
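If a multithreaded BLAS is the cause, one way to confirm it is to cap the thread pools before NumPy is first imported and watch whether CPU usage drops. This is a diagnostic sketch, not anything sklearn requires; the environment variable names cover the common OpenBLAS and MKL builds:

```python
import os

# These must be set BEFORE numpy is imported, because the BLAS
# thread pools are sized at library load time.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

import numpy as np

# A dense SVD like the one inside sklearn's PCA; with the caps
# above it should stay on a single core.
a = np.random.rand(500, 50)
u, s, vt = np.linalg.svd(a, full_matrices=False)
print(s.shape)  # (50,)
```

If the task manager shows a single busy core after this change, the extra cores were coming from BLAS, not from sklearn's own parallelism.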

Sklearn has a landing page specifically for optimizing existing sklearn tools (and writing your own). It offers a variety of suggestions and specifically mentions joblib. Check it out.
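As for question 2, a minimal sketch of what explicitly parallelizing the double loop with joblib could look like. The fit_one function and its random-data workload are hypothetical stand-ins for whatever your real inner-loop body does; whether this beats the implicit BLAS parallelism depends on how expensive each iteration is:

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.decomposition import PCA

def fit_one(i, j):
    # Hypothetical stand-in for the real inner-loop work: fit a small
    # PCA on random data and return the total explained variance.
    rng = np.random.RandomState(10 * i + j)
    X = rng.rand(100, 5)
    pca = PCA(n_components=2).fit(X)
    return i, j, pca.explained_variance_ratio_.sum()

# Flatten the i,j double loop into independent tasks; n_jobs=-1
# would use all cores, n_jobs=2 keeps this example modest.
results = Parallel(n_jobs=2)(
    delayed(fit_one)(i, j) for i in range(3) for j in range(2)
)
print(len(results))  # 6
```

Note that results arrive in submission order, but the tasks themselves no longer execute strictly in sequence, unlike what you observed with your current loop.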

