
Dramatic slowdown using multiprocessing and numpy in Python

I wrote a Python implementation of the Q-learning algorithm, and since the algorithm has random output I have to run it multiple times. For this I use the multiprocessing module. The structure of the code is as follows:

import numpy as np
import scipy as sp
import multiprocessing as mp
# ...import other modules...

# ...define some parameters here...

# using multiprocessing
result = []
num_threads = 3
pool = mp.Pool(num_threads)  # one worker process per independent run
for cnt in range(num_threads):
    args = (RL_params + phys_params)  # tuple of arguments for Q_learning
    result.append(pool.apply_async(Q_learning, args))

pool.close()
pool.join()
result = [r.get() for r in result]  # collect the return value of each run

There is no I/O in my code, and my workstation has 6 cores (12 threads) and enough memory for this job. When I run the code with num_threads=1, it takes only 13 seconds, and the job occupies only 1 thread at 100% CPU usage (checked with the top command).

(screenshot of CPU status with num_threads=1)

However, if I run it with num_threads=3 (or more), it takes more than 40 seconds, and the job occupies 3 threads, each using 100% of a CPU core.

(screenshot of CPU status with num_threads=3)

I can't understand this slowdown, since none of my self-defined functions are parallelized and there is no I/O. It is also interesting that when num_threads=1 the CPU usage never exceeds 100%, but when num_threads is larger than 1 the CPU usage of each process sometimes reaches 101% or 102%.

On the other hand, I wrote a simple test file which does not import numpy and scipy, and the problem never shows up there. I have noticed this question: why isn't numpy.mean multithreaded? and it seems my problem is due to the automatic parallelization of some numpy methods (such as dot). But as shown in the pictures, I can't see any parallelization when I run a single job.
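One way to check whether a BLAS backend is silently spawning threads is to pin its thread pools to a single thread before importing numpy. A minimal sketch, assuming the backend honours the usual environment variables:

import os

# Pin BLAS/OpenMP thread pools to one thread per process, before numpy is imported,
# so each worker really uses a single core.
os.environ["OMP_NUM_THREADS"] = "1"       # OpenMP-based backends
os.environ["OPENBLAS_NUM_THREADS"] = "1"  # OpenBLAS
os.environ["MKL_NUM_THREADS"] = "1"       # Intel MKL

import numpy as np  # import numpy only after the variables are set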

When you use a multiprocessing pool, all the arguments and results get sent through pickle . This can be very processor-intensive and time-consuming. That could be the source of your problem, especially if your arguments and/or results are large. In those cases, Python may spend more time pickling and unpickling the data than it does running computations.
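A quick way to gauge that overhead is to pickle the arguments (or results) yourself and see how large and slow that is. A rough sketch, where args stands for whatever tuple you pass to apply_async:

import pickle
import time

# args is a placeholder for whatever tuple you pass to apply_async
start = time.perf_counter()
payload = pickle.dumps(args)
elapsed = time.perf_counter() - start
print(f"pickled size: {len(payload)} bytes, pickling time: {elapsed:.3f} s")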

However, numpy releases the global interpreter lock during computations, so if your work is numpy-intensive, you may be able to speed it up by using threading instead of multiprocessing. That would avoid the pickling step. See here for more details: https://stackoverflow.com/a/38775513/3830997
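If you want to try threads, multiprocessing's thread-based pool has the same API as the process pool used in the question, so it is almost a drop-in replacement. A rough sketch, assuming Q_learning spends most of its time in numpy calls that release the GIL:

from multiprocessing.pool import ThreadPool

num_threads = 3
pool = ThreadPool(num_threads)  # threads instead of processes: no pickling of arguments/results
result = []
for cnt in range(num_threads):
    result.append(pool.apply_async(Q_learning, RL_params + phys_params))

pool.close()
pool.join()
result = [r.get() for r in result]  # collect the return value of each run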
