
multiprocessing.Pool spawns too many threads

If I run the following Python code

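import multiprocessing

import numpy as np


def dummy(t):
    # Invert a large random matrix; the argument t is unused.
    A = np.random.rand(10000, 10000)
    inv = np.linalg.inv(A)
    return np.linalg.norm(inv)


if __name__ == "__main__":
    # Pool with 2 worker processes.
    with multiprocessing.Pool(2) as pool:
        print(pool.map(dummy, range(20)))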

more than the specified 2 processes are spawned, or at least it seems that way. More specifically, when I monitor the system with htop, it shows all hardware threads as busy, i.e. at 100% CPU usage. I would expect only 2 threads to show 100% usage, but perhaps that assumption is wrong.

Curiously enough, if the matrix size is increased (by a factor of 10), only the 2 specified threads are busy.

Python versions used: 3.6.9 / 3.8.5. Machine: Skylake server with 40 cores.

As the comment from @Booboo suggests, the example contains additional parallelism that is not accounted for. Most likely the numpy.linalg.inv call uses some form of multithreading under the hood (NumPy typically delegates linear algebra to a BLAS/LAPACK library, which may itself be multithreaded). Therefore the assumption that only as many hardware threads are busy as the number of processes specified in the Pool constructor is invalid. If the source of the additional parallelism is known and can be disabled, the expected behavior can be achieved.
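One possible way to disable the extra parallelism is sketched below. It rests on an assumption not confirmed in the question: that NumPy is linked against OpenBLAS, MKL, or another OpenMP-based BLAS, each of which reads a thread-count environment variable when the library is loaded. The helper name limit_blas_threads is hypothetical, and the matrix size is reduced so the sketch runs quickly.

```python
import os

# Assumption: the BLAS backend is OpenBLAS, MKL, or OpenMP-based.
# These variables are read when the library is loaded, so they must be
# set before numpy is imported.
BLAS_THREAD_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def limit_blas_threads(n=1):
    """Hypothetical helper: ask common BLAS backends for at most n threads per process."""
    for var in BLAS_THREAD_VARS:
        os.environ[var] = str(n)


limit_blas_threads(1)

import multiprocessing  # noqa: E402

import numpy as np  # noqa: E402  (imported only after the variables are set)


def dummy(t):
    # Smaller matrix than in the question, to keep the run short.
    A = np.random.rand(500, 500)
    inv = np.linalg.inv(A)
    return np.linalg.norm(inv)


if __name__ == "__main__":
    # Worker processes inherit the environment, so each one should run
    # its linear algebra single-threaded.
    with multiprocessing.Pool(2) as pool:
        print(pool.map(dummy, range(20)))
```

If the assumption holds, htop should show roughly 2 busy hardware threads, one per worker process. Which variable actually takes effect depends on how NumPy was built, so setting all three is a precaution rather than a requirement.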
