Multiprocessing pool size - cpu_count or cpu_count/2?

I'm running Python scripts that do batch data processing on fairly large AWS instances (48 or 96 vCPUs). multiprocessing.Pool() works nicely: the workers have minimal communication with the main process (take a file path, return True/False). I/O and memory don't seem to be limiting.

I've had variable performance: sometimes the best speed comes from pool size = number of vCPUs, sometimes from number of vCPUs/2, and sometimes from the number of vCPUs times some multiple around 2-4. These are different kinds of jobs on different instances, so it would be hard to benchmark all of them.

Is there a rule of thumb for what size pool to use?

PS: multiprocessing.cpu_count() returns a number that seems to be equal to the number of vCPUs. If that is consistent, I'd like to pick some reasonable multiple of cpu_count and just leave it at that.

The reasons for those numbers:

  1. Number of vCPUs: this is reasonable, since we use all the cores.
  2. Number of vCPUs/2: this is also reasonable, since there are sometimes twice as many logical cores as physical cores. The extra logical cores often won't speed your program up much, so we just use vCPU/2.
  3. vCPU times some multiple around 2-4: this is reasonable for some IO-intensive tasks. For these kinds of tasks, a process does not occupy its core all the time, so other tasks can be scheduled while it waits on IO.
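
The three cases above can be sketched as a small helper that returns a starting pool size. This is only an illustration: the function name suggest_pool_size, the workload labels, and the default io_multiplier are assumptions made for the example and are not part of multiprocessing.

```python
import os


def suggest_pool_size(workload, logical_cpus=None, io_multiplier=2):
    """Return a starting pool size for a workload label.

    The labels are hypothetical names for the three cases in the list:
      "cpu_bound"      -> one worker per vCPU
      "cpu_bound_smt"  -> one worker per physical core, assuming 2 logical per physical
      "io_bound"       -> several workers per vCPU (io_multiplier, roughly 2-4)
    """
    if logical_cpus is None:
        # os.cpu_count() can return None, so fall back to 1.
        logical_cpus = os.cpu_count() or 1

    if workload == "cpu_bound":
        return logical_cpus
    if workload == "cpu_bound_smt":
        return max(1, logical_cpus // 2)
    if workload == "io_bound":
        return logical_cpus * io_multiplier
    raise ValueError(f"unknown workload label: {workload!r}")
```

The result is meant as a first guess to pass to multiprocessing.Pool(processes=...), not as a measured optimum.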

So now let's analyze the situation. I guess you are running on a server which might be a VPS. In this case, there is no difference between logical cores and physical cores, because a vCPU is just an abstract computation resource provided by the VPS provider. You cannot really touch the underlying physical cores.

If your main process is not computation-intensive, or let's say it is just a simple controller, then you don't need to allocate a whole core for it, which means you don't need to subtract one.

Based on your situation, I would suggest the number of vCPUs. But you still need to decide based on the actual situation you encounter. The critical rule is:

Maximize resource usage (use as many cores as you can) and minimize resource competition (too many processes will compete for resources, which slows the whole program down).
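
Since the right balance depends on the job, one way to apply this rule is to time a small sample of the real workload at a few candidate pool sizes. The following is a minimal sketch under stated assumptions: the helper names time_pool_sizes and best_pool_size are hypothetical, and worker stands for your own top-level function that takes one file path and returns True/False.

```python
import time
from multiprocessing import Pool


def time_pool_sizes(worker, items, sizes, pool_factory=Pool):
    """Run worker over items once per pool size; return {size: seconds}.

    pool_factory defaults to multiprocessing.Pool; any class with the same
    interface (for example multiprocessing.pool.ThreadPool) can be passed.
    """
    timings = {}
    for size in sizes:
        start = time.perf_counter()
        with pool_factory(size) as pool:
            # Consume every result so the timing covers all of the work.
            for _ in pool.imap_unordered(worker, items):
                pass
        timings[size] = time.perf_counter() - start
    return timings


def best_pool_size(timings):
    """Return the pool size with the smallest measured time."""
    return min(timings, key=timings.get)
```

With multiprocessing.Pool, worker must be defined at module level so it can be pickled, and the call should sit under an if __name__ == "__main__": guard. Timings from a small sample are noisy, so treat the winner as a starting point.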

There are many rules of thumb that you may follow, depending on the task, as you already figured out:

  • Number of physical cores
  • Number of logical cores
  • Number of physical or logical cores minus one (supposedly reserving one core for the logic and control)

To avoid counting logical cores instead of physical ones, I suggest using the psutil library:

import psutil
# Count physical cores only; returns None if the count cannot be determined.
psutil.cpu_count(logical=False)
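
A slightly more defensive variant of the snippet above, as a sketch: psutil.cpu_count(logical=False) can return None when the count cannot be determined, and psutil may not be installed. The helper name physical_core_count is hypothetical, and falling back to os.cpu_count() (which counts logical CPUs) is an assumption of this example.

```python
import os


def physical_core_count():
    """Return the number of physical cores, or a logical count as a fallback."""
    try:
        import psutil
    except ImportError:
        # Assumption: without psutil, settle for the logical CPU count.
        return os.cpu_count() or 1

    # psutil returns None when the physical count is undetermined.
    return psutil.cpu_count(logical=False) or os.cpu_count() or 1
```

The return value can be passed directly as the processes argument of multiprocessing.Pool.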

As for what to use in the end: for numerically intensive applications I tend to go with the number of physical cores. Bear in mind that some BLAS implementations use multithreading by default, which can badly hurt the scalability of data-parallel pipelines. Set MKL_NUM_THREADS=1 or OPENBLAS_NUM_THREADS=1 (depending on your BLAS backend) as environment variables whenever doing batch processing, and you should see quasi-linear speedups with respect to the number of physical cores.
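
If exporting the variables in the shell is inconvenient, they can also be set from Python at the top of the script. This is a minimal sketch: the helper name limit_blas_threads is hypothetical, and it assumes the call happens before any BLAS-backed library such as NumPy is imported, since the backend may read these variables when it is first loaded.

```python
import os


def limit_blas_threads(n=1):
    """Set the two BLAS thread-count variables for this process.

    Worker processes started afterwards inherit the environment.
    """
    for var in ("MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
        os.environ[var] = str(n)


# Call this first, then import BLAS-backed libraries and create the pool.
limit_blas_threads(1)
```

Setting both variables is harmless: a backend ignores the variable that belongs to the other implementation.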

