Why is multiprocessing slower here?

I am trying to speed up some code with multiprocessing in Python, but I cannot understand one point. Assume I have the following dumb function:

import time
from multiprocessing.pool import Pool

def foo(_):
    for _ in range(100000000):
        a = 3 

When I run this code without using multiprocessing (see the code below) on my laptop (Intel, 8-core CPU), the time taken is ~2.31 seconds.

t1 = time.time()
foo(1)
print(f"Without multiprocessing {time.time() - t1}")

Instead, when I run this code using the Python multiprocessing library (see the code below), the time taken is ~6.0 seconds.

pool = Pool(8)
t1 = time.time()
pool.map(foo, range(8))
print(f"Sample multiprocessing {time.time() - t1}")

To the best of my knowledge, when using multiprocessing there is some time overhead, mainly caused by the need to spawn the new processes and to copy the memory state. However, this operation should be performed just once, when the processes are initially spawned at the very beginning, and should not be that large.
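That spawn-and-copy overhead can actually be measured in isolation. A rough sketch (the `noop` task and `measure_pool_overhead` helper are made up for illustration, and the numbers are highly machine- and platform-dependent):

```python
import time
from multiprocessing.pool import Pool

def noop(_):
    # a trivial task, so the dispatch timing below is almost pure overhead
    return None

def measure_pool_overhead(workers=8):
    """Return (startup, dispatch) in seconds for a Pool of `workers` processes."""
    t1 = time.time()
    with Pool(workers) as pool:
        startup = time.time() - t1          # cost of spawning the workers
        t2 = time.time()
        pool.map(noop, range(workers))      # near-empty tasks: mostly IPC cost
        dispatch = time.time() - t2
    return startup, dispatch

if __name__ == "__main__":
    startup, dispatch = measure_pool_overhead()
    print(f"startup {startup:.3f}s, dispatch {dispatch:.3f}s")
```

On most machines both numbers come out as small fractions of a second, which supports the point: the fixed overhead alone cannot explain a multi-second slowdown.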

So what am I missing here? Is there something wrong in my reasoning?

Edit: I think it is better to be more explicit about my question. What I expected here was for the multiprocessed code to be slightly slower than the sequential one. It is true that I don't split the whole work across the 8 cores, but I am using 8 cores in parallel to do the same job (hence, in an ideal world, the processing time should more or less stay the same). Considering the overhead of spawning new processes, I expected a total increase in time of some (not too big) percentage, but not the ~2.60x increase that I got.

Well, multiprocessing can't possibly make this faster: you're not dividing the work across 8 processes, you're asking each of 8 processes to do the entire thing. Each process will take at least as long as your code doing it just once without using multiprocessing.

So if multiprocessing weren't helping at all, you'd expect it to take about 8 times as long (it's doing 8x the work) as your single-processor run. But you said it's not taking 2.31 * 8 ~= 18.5 seconds, but "only" about 6. So you're getting better than a factor-of-3 speedup.
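The arithmetic behind that estimate, spelled out:

```python
single_run = 2.31                # seconds for one sequential call to foo
serial_estimate = single_run * 8 # ~18.5 s if the 8 copies ran one after another
observed = 6.0                   # seconds the Pool run actually took
speedup = serial_estimate / observed
print(f"serial estimate {serial_estimate:.1f}s, effective speedup {speedup:.2f}x")
# → serial estimate 18.5s, effective speedup 3.08x
```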

Why not more than that? Can't guess from here. That will depend on how many physical cores your machine has, and how much other stuff you're running at the same time. Each process will be 100% CPU-bound for this specific function, so the number of "logical" cores is pretty much irrelevant - there's scant opportunity for processor hyper-threading to help. So I'm guessing you have 4 physical cores.
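If you want to check this on your own machine: the stdlib only reports logical cores, and getting the physical count needs the third-party psutil package (shown here as an optional extra, not something this answer relies on):

```python
import os

# logical cores, i.e. including hyper-threads
logical = os.cpu_count()
print(f"logical cores: {logical}")

try:
    import psutil  # third-party; only needed for the physical count
    print(f"physical cores: {psutil.cpu_count(logical=False)}")
except ImportError:
    print("install psutil to see the physical core count")
```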

On my box

Sample timing on my box, which has 8 logical cores but only 4 physical cores, with the box otherwise left pretty quiet:

Without multiprocessing 2.468580484390259
Sample multiprocessing 4.78624415397644

As above, none of that surprises me. In fact, I was a little (but pleasantly) surprised at how effectively the program used up the machine's true capacity.

@TimPeters already answered that you are actually just running the job 8 times across the 8 Pool subprocesses, so it is slower, not faster.

That answers the issue, but does not really answer your real underlying question. It is clear from your surprise at this result that you were expecting the single job to somehow be automatically split up and run in parts across the 8 Pool processes. That is not the way it works. You have to build in/tell it how to split up the work.

Different kinds of jobs need to be subdivided in different ways, but to continue with your example you might do something like this:

import time
from multiprocessing.pool import Pool

def foo(_):
    for _ in range(100000000):
        a = 3 

def foo2(job_desc):
    start, stop = job_desc
    print(f"{start}, {stop}")

    for _ in range(start, stop):    
        a = 3 

def main():
    t1 = time.time()
    foo(1)
    print(f"Without multiprocessing {time.time() - t1}")

    pool_size = 8
    pool = Pool(pool_size)

    t1 = time.time()

    top_num = 100000000
    size = top_num // pool_size
    job_desc_list = [[size * j, size * (j+1)] for j in range(pool_size)]
    # this is in case the upper bound is not a multiple of pool_size
    job_desc_list[-1][-1] = top_num

    pool.map(foo2, job_desc_list)
    print(f"Sample multiprocessing {time.time() - t1}")


if __name__ == "__main__":
    main()

Which results in:

Without multiprocessing 3.080709171295166
0, 12500000
12500000, 25000000
25000000, 37500000
37500000, 50000000
50000000, 62500000
62500000, 75000000
75000000, 87500000
87500000, 100000000
Sample multiprocessing 1.5312283039093018

As this shows, splitting the job up does allow it to take less time. The speedup will depend on the number of CPUs. In a CPU-bound job you should try to limit the pool size to the number of CPUs. My laptop has plenty more CPUs, but some of the benefit is lost to the overhead. If the jobs were longer this would look more useful.
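For what it's worth, the same splitting scheme carries over to the stdlib's concurrent.futures API as well. A sketch (the `split_range` and `busy` helper names are my own, not part of the original code):

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor

def split_range(top_num, workers):
    """Split [0, top_num) into `workers` contiguous (start, stop) chunks."""
    size = top_num // workers
    chunks = [(i * size, (i + 1) * size) for i in range(workers)]
    chunks[-1] = (chunks[-1][0], top_num)  # last chunk absorbs any remainder
    return chunks

def busy(bounds):
    # same CPU-bound busy loop as foo2, minus the print
    start, stop = bounds
    for _ in range(start, stop):
        a = 3

if __name__ == "__main__":
    workers = os.cpu_count() or 1
    t1 = time.time()
    with ProcessPoolExecutor(max_workers=workers) as ex:
        list(ex.map(busy, split_range(100_000_000, workers)))
    print(f"ProcessPoolExecutor split: {time.time() - t1:.2f}s")
```

Sizing the pool with `os.cpu_count()` instead of a hard-coded 8 keeps the pool matched to the machine, which is what you want for a purely CPU-bound job like this one.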
