
multiprocessing.Pool not using all the cores in M1 Mac

Here is my code:

from multiprocessing.dummy import Pool
from multiprocessing import cpu_count

def process_board(elems):
    # do something with one (index, element) pair
    pass

for _ in range(1000):
    with Pool(cpu_count()) as p:
        _ = p.map(process_board, enumerate(some_array))

and this is the Activity Monitor of my Mac while the code is running: [screenshot: Activity Monitor]

I can ensure that len(some_array) > 1000, so there is certainly more work that could be distributed, but that does not seem to happen... what am I missing?

Update:
I tried chunking the elements, to see if it makes any difference:

# elements per chunk -> time taken
# 100 -> 31.9 sec
# 50 -> 31.8 sec
# 20 -> 31.6 sec
# 10 -> 32 sec
# 5  -> 32 sec

Consider that I have around 1000 elements, so 100 elements per chunk means 10 chunks. This is my CPU load during the tests: [screenshot: CPU load]

As you can see, changing the number of chunks does not help to make use of the last 4 CPUs...
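For reference, a minimal sketch of how this chunking experiment could look (process_chunk, chunk_size, and the placeholder data are illustrative, not the exact code used):

from multiprocessing.dummy import Pool
from multiprocessing import cpu_count

def process_chunk(chunk):
    # placeholder: do the per-element work for one chunk
    for index, elem in chunk:
        pass

some_array = list(range(1000))   # placeholder for the real data
chunk_size = 100                 # varied between 5 and 100 in the timings above

items = list(enumerate(some_array))
chunks = [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

with Pool(cpu_count()) as p:
    p.map(process_chunk, chunks)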

You were using multiprocessing.dummy.Pool, which is a thread pool that looks like a multiprocessing pool. This is good for I/O-bound tasks that release the GIL, but it has no advantage for CPU-bound tasks. Note that Python's Global Interpreter Lock (GIL) ensures that only a single thread can execute bytecode at a time.
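A minimal sketch of the fix, assuming process_board is CPU-bound and its arguments and results can be pickled (some_array is a placeholder here): replace the thread pool with a real process pool.

from multiprocessing import Pool, cpu_count

def process_board(elems):
    index, board = elems
    # CPU-bound work on one board goes here
    pass

if __name__ == "__main__":           # required on macOS, where workers are spawned
    some_array = list(range(1000))   # placeholder for the real data
    with Pool(cpu_count()) as p:
        results = p.map(process_board, enumerate(some_array))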

Whether multiprocessing speeds things up depends on the cost of sending data to and from the worker subprocesses versus the amount of work done on that data.
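As an illustration of keeping that overhead down (a sketch under the same placeholder assumptions, not the answerer's code): create the pool once instead of once per loop iteration, and let map batch tasks with chunksize so each worker receives fewer, larger messages.

from multiprocessing import Pool, cpu_count

def process_board(elems):
    index, board = elems
    pass  # CPU-bound work

if __name__ == "__main__":
    some_array = list(range(1000))   # placeholder for the real data
    with Pool(cpu_count()) as p:     # one pool, reused for every iteration
        for _ in range(1000):
            # chunksize batches elements per task, reducing IPC round trips
            p.map(process_board, enumerate(some_array), chunksize=50)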
