
Python multiprocessing - how to make it more efficient

Consider the following two short programs.

normal_test.py:

import time

if __name__ == '__main__':
    t_end = time.time() + 1
    loop_iterations = 0
    while time.time() < t_end:
        loop_iterations += 1

    print(loop_iterations)

Output (on my machine):

4900677

mp_test.py:

from multiprocessing import Process
from multiprocessing import Manager
import time


def loop1(ns):
    t_end = time.time() + 1
    while time.time() < t_end:
        ns.loop_iterations1 += 1


def loop2(ns):
    t_end = time.time() + 1
    while time.time() < t_end:
        ns.loop_iterations2 += 1


if __name__ == '__main__':
    manager = Manager()
    ns = manager.Namespace()
    ns.loop_iterations1 = 0
    ns.loop_iterations2 = 0

    p1 = Process(target=loop1, args=(ns,))
    p2 = Process(target=loop2, args=(ns,))
    p1.start()
    p2.start()

    p1.join()
    p2.join()

    print(ns.loop_iterations1)
    print(ns.loop_iterations2)

Output (on my machine):

5533
5527

I am hoping to use Python multiprocessing on a Raspberry Pi to read values from multiple ADCs in parallel. As such, speed is important. The laptop I ran these two programs on has four cores, so I can't understand why the processes created in the second program run nearly 900 times fewer iterations than the single process in the first program. Am I using the Python multiprocessing library incorrectly? How can I make the processes faster?

Am I using the Python multiprocessing library incorrectly?

Incorrectly? No. Inefficiently? Yes.

Remember that multiprocessing creates cooperative, but otherwise independent, instances of Python. Think of them as workers in a factory, or friends working on a big job.

If only one person is working on a project, that one person is free to move about the factory floor, pick up a tool, use it, put it down, move somewhere else, pick up the next tool, and so on. Add a second person (or worse, more people, perhaps even hundreds of people) and everyone must now coordinate: if some area or some tool is shared, Bob can't just go grab something; he has to ask Alice first whether she's done with it.

A Manager object is Python multiprocessing's general wrapper for sharing. Putting variables in a Manager Namespace means they are shared, so every access automatically checks with everyone else. (More precisely, the values are held in one location, one process, and are accessed or changed from other processes via proxies.)

Here, you have done the metaphorical equivalent of replacing "Bob: count as fast as you can" with "Bob: constantly interrupt Alice to ask if she's counting, then count; Alice: count, but be constantly interrupted by Bob." Bob and Alice now spend the vast majority of their time talking to each other rather than counting.
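To see the cost concretely, here is a minimal sketch (exact numbers will vary by machine) that times increments of a proxied Namespace attribute against increments of an ordinary local variable:

import time
from multiprocessing import Manager

if __name__ == '__main__':
    manager = Manager()
    ns = manager.Namespace()
    ns.counter = 0

    # Every attribute access on the Namespace proxy is a round trip to the
    # manager process; `ns.counter += 1` costs one proxied read plus one
    # proxied write.
    start = time.time()
    for _ in range(10_000):
        ns.counter += 1
    proxied = time.time() - start

    # The same work on an ordinary local variable.
    counter = 0
    start = time.time()
    for _ in range(10_000):
        counter += 1
    local = time.time() - start

    print(f'proxied: {proxied:.3f} s  local: {local:.6f} s')

On typical hardware the proxied loop is orders of magnitude slower, which is the gap the two programs above demonstrate.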

As the documentation says:

... when doing concurrent programming it is usually best to avoid using shared state as far as possible. This is particularly true when using multiple processes.

(The quote starts with the phrase "as mentioned above", but it's not actually mentioned above!)

There are a number of standard tricks, such as batching (doing a lot of work locally between sharing events) or using shared memory to speed up the sharing itself, though shared memory introduces the need to lock items.
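As a minimal sketch of the batching idea (combined here with a shared-memory multiprocessing.Value rather than a Manager; the variable names are illustrative), each worker counts in an ordinary local variable and touches the shared counter exactly once, under its lock:

import time
from multiprocessing import Process, Value


def loop(counter):
    # Batch: count locally, then publish the total with a single
    # locked update to the shared-memory value.
    t_end = time.time() + 1
    n = 0
    while time.time() < t_end:
        n += 1
    with counter.get_lock():
        counter.value += n


if __name__ == '__main__':
    counter = Value('q', 0)  # 'q' = signed 64-bit integer in shared memory
    procs = [Process(target=loop, args=(counter,)) for _ in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value)

Because sharing happens once per worker instead of once per iteration, each process counts at essentially single-process speed.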

It looks like a better way to implement parallel processing (when shared state is not needed) is with a multiprocessing Queue. The OP's two loops don't need shared state.

Here are the tests.

Specs:

  • Python version: 3.7.6.
  • Machine has two Intel Core i9-9880H CPUs at 2.3 GHz.

When I executed normal_test.py in the question, I got:

$ python normal_test.py
7601322

Then I tested a multiprocessing Queue as follows (two parallel processes):

import time
from multiprocessing import Process, Queue


def loop(n, q):
    # Count locally for one second, then report the total once via the queue.
    n_iter = 0
    t_end = time.time() + 1
    while time.time() < t_end:
        n_iter += 1
    q.put((n, n_iter))


if __name__ == '__main__':
    results = []

    q = Queue()
    procs = []
    for i in range(2):
        procs.append(Process(target=loop, args=(i, q)))

    for proc in procs:
        proc.start()

    # Drain the queue before joining: joining a process that still has
    # items buffered in a queue can deadlock.
    for proc in procs:
        n, loop_count = q.get()
        results.append((n, loop_count))

    for proc in procs:
        proc.join()

    del procs, q

    for r in results:
        print(r)

When I executed this, I got:

$ python multiproc2.py
(1, 10570043)
(0, 10580648)

It looks like running two processes in parallel can do more work than running just one.
