
multiprocessing.Pool spawns too many threads

If I run the following Python code

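import multiprocessing

import numpy as np


def dummy(t):
    # Invert a large random matrix and return the norm of the inverse.
    A = np.random.rand(10000, 10000)
    inv = np.linalg.inv(A)
    return np.linalg.norm(inv)


if __name__ == "__main__":
    # Pool with 2 worker processes.
    with multiprocessing.Pool(2) as pool:
        print(pool.map(dummy, range(20)))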

more than the specified 2 processes are spawned, or at least it seems that way. More specifically, when I use htop to monitor the system, it shows all threads as busy, i.e. 100% CPU usage. I would expect only 2 threads to show 100% usage, but perhaps that assumption is wrong.

Curiously enough, if the matrix size is increased (by a factor of 10), only the 2 specified threads are busy.

Python versions used: 3.6.9 / 3.8.5. Machine: Skylake server with 40 cores.

As the comment from @Booboo suggests, the example contains additional parallelism that is not accounted for. Most likely the numpy.linalg.inv call uses some form of multithreading under the hood. Therefore the assumption that only as many hardware threads are busy as the number of processes specified in the Pool constructor is invalid. If the source of the additional parallelism is known and can be disabled, the expected behavior can be achieved.
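One way to disable the extra parallelism, sketched here under an assumption the answer does not confirm: that NumPy is linked against a threaded BLAS/LAPACK backend such as OpenBLAS or Intel MKL (or another OpenMP-based library). These backends read the environment variables below when they are first loaded, so the variables have to be set before NumPy is imported. The names `BLAS_THREAD_VARS` and `main` are illustrative, and the matrix size is reduced from the question's 10000 so the sketch runs quickly.

```python
import multiprocessing
import os

# Assumption: the backend behind numpy.linalg honours at least one of these
# variables. They must be set before NumPy (and its BLAS library) is loaded.
BLAS_THREAD_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")
for var in BLAS_THREAD_VARS:
    os.environ[var] = "1"  # one BLAS thread per worker process


def dummy(t, size=1000):
    # Imported inside the worker so the limits above are already in place.
    # A top-level import also works if it comes after the os.environ lines.
    import numpy as np

    A = np.random.rand(size, size)
    inv = np.linalg.inv(A)
    return np.linalg.norm(inv)


def main(n_workers=2, n_tasks=20):
    # Worker processes inherit (fork) or re-apply (spawn, by re-importing this
    # module) the environment variables set above.
    with multiprocessing.Pool(n_workers) as pool:
        return pool.map(dummy, range(n_tasks))
```

Calling `print(main())` under the `if __name__ == "__main__":` guard, as in the question, should then keep roughly 2 hardware threads busy in htop, provided the assumption about the backend holds.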
