
Dramatic slow down using multiprocess and numpy in python

I wrote Python code for a Q-learning algorithm, and since the algorithm has random output, I have to run it multiple times. To do so I use the multiprocessing module. The structure of the code is as follows:

import numpy as np
import scipy as sp
import multiprocessing as mp
# ...import other modules...

# ...define some parameters here...

# run several independent Q-learning jobs in a process pool
result = []
num_threads = 3
pool = mp.Pool(num_threads)
for cnt in range(num_threads):
    args = RL_params + phys_params  # tuple of arguments for Q_learning
    result.append(pool.apply_async(Q_learning, args))

pool.close()
pool.join()
# each entry in result is an AsyncResult; call .get() to obtain the return value

There is no I/O in my code, and my workstation has 6 cores (12 threads) and enough memory for this job. When I run the code with num_threads=1, it takes only 13 seconds, and the job occupies one thread with 100% CPU usage (observed with the top command).

[screenshot of CPU status with num_threads=1]

However, if I run it with num_threads=3 (or more), it takes more than 40 seconds, and the job occupies three threads, each using 100% of a CPU core.

[screenshot of CPU status with num_threads=3]

I can't understand this slowdown, because there is no parallelization in any of my self-defined functions and no I/O. It is also interesting that when num_threads=1, CPU usage is always below 100%, but when num_threads is larger than 1, CPU usage may sometimes reach 101% or 102%.

On the other hand, I wrote another simple test file which does not import numpy and scipy, and the problem never shows up there. I have noticed the question why isn't numpy.mean multithreaded?, and it seems my problem is due to the automatic parallelization of some methods in numpy (such as dot). But as the pictures show, I can't see any parallelization when I run a single job.
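One common way to test this hypothesis (not part of the original post, just a sketch) is to force numpy's BLAS backend to use a single thread per worker process, so the pool's processes do not oversubscribe the CPU cores. The exact environment variable depends on which BLAS library numpy was built against, so the sketch sets the usual candidates; the function work is a hypothetical stand-in for Q_learning:

# a minimal sketch: pin each process to one BLAS thread.
# These variables must be set before numpy is first imported.
import os
os.environ["OMP_NUM_THREADS"] = "1"        # OpenMP threads (OpenBLAS/MKL)
os.environ["OPENBLAS_NUM_THREADS"] = "1"   # OpenBLAS
os.environ["MKL_NUM_THREADS"] = "1"        # Intel MKL

import numpy as np
import multiprocessing as mp

def work(seed):
    # stand-in for Q_learning: a numpy-heavy job with random output
    rng = np.random.RandomState(seed)
    m = rng.rand(2000, 2000)
    return float((m @ m.T).trace())  # dot is the BLAS call that may parallelize

if __name__ == "__main__":
    with mp.Pool(3) as pool:
        print(pool.map(work, range(3)))

If the slowdown disappears with BLAS pinned to one thread, the original problem was the BLAS thread pool in each process fighting with the other processes for cores.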

When you use a multiprocessing pool, all the arguments and results get sent through pickle. This can be very processor-intensive and time-consuming, and it could be the source of your problem, especially if your arguments and/or results are large. In those cases, Python may spend more time pickling and unpickling the data than it does running computations.
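A quick way to check whether serialization dominates (a hedged sketch, not from the answer itself) is to time a pickle round trip on a payload roughly the size of your arguments or results:

import pickle
import time
import numpy as np

payload = np.random.rand(2000, 2000)  # stand-in for a large argument/result

t0 = time.perf_counter()
blob = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(blob)
t1 = time.perf_counter()

print(f"round trip: {t1 - t0:.3f} s for {len(blob) / 1e6:.0f} MB")

If this round trip takes a noticeable fraction of the 13-second single-job run time, serialization is a plausible culprit.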

However, numpy releases the global interpreter lock during computations, so if your work is numpy-intensive, you may be able to speed it up by using threading instead of multiprocessing, which avoids the pickling step. See here for more details: https://stackoverflow.com/a/38775513/3830997
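As a sketch of that idea (assuming Q_learning spends most of its time inside numpy calls that release the GIL), the same fan-out can be written with a thread pool, which passes arguments by reference instead of pickling them; Q_learning, RL_params, and phys_params are the placeholders from the question:

from concurrent.futures import ThreadPoolExecutor

num_threads = 3
with ThreadPoolExecutor(max_workers=num_threads) as ex:
    # no pickling: threads share the parent's memory
    futures = [ex.submit(Q_learning, *(RL_params + phys_params))
               for _ in range(num_threads)]
    results = [f.result() for f in futures]

This only helps if the GIL is actually released for most of the run; pure-Python loops inside Q_learning would serialize on the GIL and see no speedup from threads.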
