python multiprocessing: no diminishing returns?

Let's say I want to parallelize some intensive computation (not I/O bound).

Naturally, I do not want to run more processes than available processors, or I would start paying for context switching (and cache misses).
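
A minimal sketch of that sizing rule, assuming Python 3. The helper name default_pool_size and its reserve parameter are hypothetical, introduced only for illustration. Note that multiprocessing.cpu_count() reports logical CPUs, so hyperthreaded "soft" cores are counted too.

```python
import multiprocessing


def default_pool_size(reserve=0):
    """Return a worker count that never exceeds the number of logical CPUs.

    `reserve` (hypothetical parameter) is how many CPUs to leave free
    for the rest of the system.
    """
    logical_cpus = multiprocessing.cpu_count()
    # Always keep at least one worker, even if reserve is too large.
    return max(1, logical_cpus - reserve)
```

The result could then be passed as n to multiprocessing.Pool(n).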

Mentally, I would expect that as I increase n in multiprocessing.Pool(n), total time would behave like this:

[Image: mock-up of the expected curve]

  1. negative slope as tasks take advantage of parallelization
  2. positive slope as context switching starts costing me
  3. plateau

But in actuality, I am getting this:

[Image: plot of the measured times]

#!/usr/bin/env python

from math import factorial


def pi(n):
    t = 0
    pi = 0
    deno = 0
    k = 0
    for k in range(n):
        t = ((-1)**k)*(factorial(6*k))*(13591409+545140134*k)
        deno = factorial(3*k)*(factorial(k)**3)*(640320**(3*k))
        pi += t/deno
    pi = pi * 12/(640320**(1.5))
    pi = 1/pi
    return pi

import multiprocessing
import time
maxx = 20
tasks = 60
task_complexity = 500
x = range(1, maxx+1)
y = [0]*maxx

for i in x:
    p = multiprocessing.Pool(i)
    tic = time.time()
    p.map(pi, [task_complexity]*tasks)
    toc = time.time()
    y[i-1] = toc-tic
    print('%2d %ds' % (i, y[i-1]))

import matplotlib.pyplot as plot
plot.plot(x, y)
plot.xlabel('Number of processes')
plot.xlim(1, maxx)
plot.xticks(x)
plot.ylabel('Time in seconds')
plot.show()
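
One detail in the loop above is that each Pool is left open, so idle workers from earlier iterations are still alive while later ones are timed. A hedged variation, assuming Python 3, shuts each pool down before the next measurement. The names time_pool and busy are hypothetical, and busy is only a cheap stand-in for a CPU-bound task such as pi(n).

```python
import multiprocessing
import time


def busy(n):
    """Hypothetical CPU-bound stand-in task: sum of squares below n."""
    total = 0
    for k in range(n):
        total += k * k
    return total


def time_pool(workers, func, args):
    """Time a single Pool.map call and shut the pool down afterwards."""
    pool = multiprocessing.Pool(workers)
    try:
        tic = time.time()
        results = pool.map(func, args)
        elapsed = time.time() - tic
    finally:
        # Stop accepting work and wait for the workers to exit, so they
        # do not linger while the next pool size is being measured.
        pool.close()
        pool.join()
    return elapsed, results
```

For example, time_pool(4, busy, [200000] * 8) returns the elapsed seconds together with the results. On platforms that start workers with spawn rather than fork, the call should sit under an `if __name__ == "__main__":` guard.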

My machine: i3-3217U CPU @ 1.80GHz × 4

Operating system: Ubuntu 14.04

After n>4, I see the task manager rotating through the various processes, as expected since there are more processes than processors. Yet, there is no penalty relative to n=4 (my number of processors).

In fact, even when n<4, I see the scheduler frenetically rotating the processes through my processors, instead of assigning each process to its own processor and avoiding context switching.

I am seeing this behavior using gnome-system-monitor. (Please let me know if someone has a different experience.)

[Image: GNOME System Monitor screenshot]

Is there any explanation for why it does not seem to matter how many processes I fire? Or is something wrong with my code?

My guess: it seems that processes are not pinned to a particular processor (even when only two processes are active, they keep switching CPU), and so I am paying for context switching anyway.

EDIT: updated graphic and code with higher constants.

Answering my own question:

Firstly, I seem to have made an error in my post. It does not seem true that the CPU being used gets changed frantically. If I fire two CPU-intensive processes, they keep changing cores, but only between two cores. My computer has 2 physical cores, each of which has 2 "soft" cores (for hyperthreading), which gives the 4 processors listed above. I guess what is going on is that each process is moving between these 2 "soft" cores. It isn't Linux doing this, it is the CPU-board.

That being said, I am still surprised that context switching is not more of a pain than it is.

EDIT: There is a nice discussion, with better empirical work than mine, on this blog.

"In fact, even when n<4, I see the scheduler frenetically rotating the processes through my processors, instead of assigning each process to its own processor and avoid context switching."

Processes are not bound to a specific processor by default. One of the main reasons is to avoid uneven heating of the processor, which can cause mechanical stress and reduce its lifetime.

There are ways to force a process to run on a single core (look at the psutil module). This has advantages such as better use of cache memory and less context switching, but in most cases (if not all) it does not make a big difference in terms of performance.
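
As a rough sketch of such pinning, the example below uses the standard library's os.sched_setaffinity rather than psutil. That call is available only on some Unix platforms (notably Linux) and requires Python 3.3 or later. The helper name pin_to_cpu is hypothetical.

```python
import os


def pin_to_cpu(cpu_index):
    """Pin the calling process to one logical CPU and return its new affinity.

    Sketch only: relies on os.sched_setaffinity, which is not available
    on every platform (it exists on Linux).
    """
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("os.sched_setaffinity is not available here")
    allowed = os.sched_getaffinity(0)  # pid 0 means the calling process
    if cpu_index not in allowed:
        raise ValueError("CPU %r is not in the allowed set %r" % (cpu_index, allowed))
    os.sched_setaffinity(0, {cpu_index})
    return os.sched_getaffinity(0)
```

Calling pin_to_cpu(0) inside a worker restricts that worker to logical CPU 0, provided CPU 0 is in its allowed set. Whether this changes the timings at all is exactly what the paragraph above doubts, so it is worth measuring rather than assuming.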

So if you spawn more processes than you have cores, they will just behave like threads, and the scheduler will switch between them to optimize execution. Processor performance will only be (very) slightly lowered, as you were already switching contexts with fewer than 4 processes.
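
One way to check how much switching actually happens is to count context switches around a piece of work. The sketch below assumes a Unix system, where the standard resource module reports voluntary (ru_nvcsw) and involuntary (ru_nivcsw) switches for the calling process. The helper names are hypothetical.

```python
import resource


def context_switches():
    """Return (voluntary, involuntary) context switches of this process so far."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_nvcsw, usage.ru_nivcsw


def count_switches_during(func, *args):
    """Run func(*args) and report how many context switches occurred meanwhile."""
    before = context_switches()
    result = func(*args)
    after = context_switches()
    return result, after[0] - before[0], after[1] - before[1]
```

Because RUSAGE_SELF covers only the calling process, the counters would have to be read inside each pool worker (for example, by wrapping the task function) to compare different pool sizes.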
