Python multiprocessing behavior with more processes than cores

Question

I am trying to use Python's multiprocessing library and would like to understand how it behaves with different numbers of processes.

I hypothesized that there is no benefit in setting the number of processes higher than the number of cores. Contrary to that hypothesis, the experiment below shows that the computation time keeps decreasing even when the number of processes exceeds the number of cores (in my case, 4).

Can someone explain what is going on behind the scenes and give some practical guidance on how to set the number of processes?

from multiprocessing import Pool, cpu_count
import time
from datetime import datetime

My CPU count:

cpu_count()
# 4

An experimental task that takes approximately 0.5 seconds:

def f(x):
    time.sleep(0.5)
    return x*x

def execute_time(processes):
    t1 = datetime.now()
    with Pool(processes) as p:
        p.map(f, list(range(36)))
    t2 = datetime.now()
    return t2 - t1

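# The guard is needed when this runs as a script on platforms that
# start worker processes with "spawn" (for example Windows).
if __name__ == "__main__":
    for p in range(1, 25):
        t = execute_time(p)
        print(p, ":", t)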

Output:

# 1 : 0:00:18.065411
# 2 : 0:00:10.051516
# 3 : 0:00:06.057016
# 4 : 0:00:04.562439
# 5 : 0:00:04.069810
# 6 : 0:00:03.173502
# 7 : 0:00:03.065977
# 8 : 0:00:03.082625
# 9 : 0:00:02.092880
# 10 : 0:00:02.090963
# 11 : 0:00:02.061613
# 12 : 0:00:01.704716
# 13 : 0:00:01.704880
# 14 : 0:00:01.615440
# 15 : 0:00:01.625117
# 16 : 0:00:01.621259
# 17 : 0:00:01.639741
# 18 : 0:00:01.236108
# 19 : 0:00:01.250113
# 20 : 0:00:01.255697
# 21 : 0:00:01.253459
# 22 : 0:00:01.260632
# 23 : 0:00:01.262124
# 24 : 0:00:01.247772

It makes sense that the run takes 18 seconds with one process (36 * 0.5 sec = 18 sec), and likewise about 4.5 seconds with four processes (18 sec / 4 = 4.5 sec). But I am surprised that the computation time keeps decreasing with a larger number of processes.
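The arithmetic above can be extended to every process count with a small model. This is only a sketch under stated assumptions: each call to f costs exactly 0.5 seconds of waiting and no CPU, process startup is free, and Pool.map splits the 36 inputs into chunks of size ceil(36 / (4 * processes)), which is my understanding of CPython's default chunk size heuristic (an implementation detail, not a documented guarantee). The name predicted_seconds is hypothetical.

```python
import heapq
from math import ceil


def predicted_seconds(processes, n_tasks=36, task_seconds=0.5):
    """Idealized wall time for Pool(processes).map over sleep-only tasks."""
    # Assumed default chunk size of Pool.map (CPython implementation detail).
    chunksize = ceil(n_tasks / (processes * 4))
    # Number of tasks in each chunk; the last chunk may be shorter.
    chunks = [min(chunksize, n_tasks - start)
              for start in range(0, n_tasks, chunksize)]

    # Each heap entry is the time at which one worker becomes free.
    free_at = [0.0] * processes
    heapq.heapify(free_at)
    finish = 0.0
    for size in chunks:
        start = heapq.heappop(free_at)      # earliest available worker
        end = start + size * task_seconds   # the worker sleeps through the chunk
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish


if __name__ == "__main__":
    for p in (1, 2, 4, 9, 18):
        print(p, ":", predicted_seconds(p))
```

Under these assumptions the model gives 18.0, 10.0, 4.5, 2.0 and 1.0 seconds for 1, 2, 4, 9 and 18 processes, close to the measured 18.07, 10.05, 4.56, 2.09 and 1.24 seconds. The number of cores never appears in the model, because a sleeping worker does not need one.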

Answer 1

As Michael Butscher says in the comments, this behavior is specific to sleep. A sleeping process does not occupy a CPU core, so more processes than cores can sleep at the same time. With a CPU-intensive task, I saw that the benefit is capped once the number of processes equals the number of cores.

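def f(x):
    # Busy loop that keeps one core fully occupied.
    # The sum is discarded; it only exists to burn CPU time.
    out = 0
    for i in range(5000000):
        out += i
    return x*x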

def execute_time(processes):
    t1 = datetime.now()
    with Pool(processes) as p:
        p.map(f, list(range(36)))
    t2 = datetime.now()
    return t2 - t1

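# The guard is needed when this runs as a script on platforms that
# start worker processes with "spawn" (for example Windows).
if __name__ == "__main__":
    for p in range(1, 25):
        t = execute_time(p)
        print(p, ":", t)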

Output:

# 1 : 0:00:13.329320
# 2 : 0:00:07.528552
# 3 : 0:00:09.943043
# 4 : 0:00:07.756005
# 5 : 0:00:08.262304
# 6 : 0:00:07.653659
# 7 : 0:00:07.677038
# 8 : 0:00:07.591766
# 9 : 0:00:07.502283
# 10 : 0:00:07.710826
# 11 : 0:00:06.006874
# 12 : 0:00:09.720279
# 13 : 0:00:07.912836
# 14 : 0:00:07.616807
# 15 : 0:00:07.740225
# 16 : 0:00:07.721783
# 17 : 0:00:07.836259
# 18 : 0:00:07.665993
# 19 : 0:00:07.564645
# 20 : 0:00:07.653607
# 21 : 0:00:07.754377
# 22 : 0:00:07.886036
# 23 : 0:00:11.696323
# 24 : 0:00:07.674243
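The difference between the two versions of f can be checked directly by comparing wall-clock time with CPU time for a single call. The following is a minimal sketch, not part of the original experiment: measure, sleep_task and busy_task are hypothetical names, and the durations are shortened so that it runs quickly.

```python
import time


def measure(task):
    """Return (wall_seconds, cpu_seconds) spent in one call to task."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()   # CPU time used by this process only
    task()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def sleep_task():
    # Waits without computing, like f in the question.
    time.sleep(0.2)


def busy_task():
    # Computes the whole time, like f in this answer.
    out = 0
    for i in range(2_000_000):
        out += i
    return out


sleep_wall, sleep_cpu = measure(sleep_task)
busy_wall, busy_cpu = measure(busy_task)
print("sleep: wall", round(sleep_wall, 3), "cpu", round(sleep_cpu, 3))
print("busy:  wall", round(busy_wall, 3), "cpu", round(busy_cpu, 3))
```

The sleeping task should show a wall time of about 0.2 seconds with almost no CPU time, while the busy task's CPU time should be close to its wall time. Only the second kind of task competes for cores, which is why adding processes beyond the core count helps in the question but not here.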

Answer 1: accepted, 2018-09-22 00:41:03
