
How to retrieve values from a function run in parallel processes?

The multiprocessing module is quite confusing for Python beginners, especially for those who have just migrated from MATLAB and been made lazy by its Parallel Computing Toolbox. I have the following function which takes ~80 secs to run, and I want to shorten this time by using Python's multiprocessing module.

from time import time

xmax   = 100000000

start = time()
for x in range(xmax):
    y = ((x+5)**2+x-40)
    if y <= 0xf+1:
        print('Condition met at: ', y, x)
end  = time()
tt   = end-start #total time
print('Each iteration took: ', tt/xmax)
print('Total time:          ', tt)

This outputs as expected:

Condition met at:  -15 0
Condition met at:  -3 1
Condition met at:  11 2
Each iteration took:  8.667453265190124e-07
Total time:           86.67453265190125

As no iteration of the loop depends on the others, I tried to adapt the Server Process example from the official documentation to scan chunks of the range in separate processes. Finally I came across vartec's answer to this question and could prepare the following code. I also updated the code based on Darkonaut's response to the current question.

from time import time 
import multiprocessing as mp

def chunker (rng, t): # this functions makes t chunks out of rng
    L  = rng[1] - rng[0]
    Lr = L % t
    Lm = L // t
    h  = rng[0]-1
    chunks = []
    for i in range(0, t):
        c  = [h+1, h + Lm]
        h += Lm
        chunks.append(c)
    chunks[t-1][1] += Lr + 1
    return chunks

def worker(lock, xrange, return_dict):
    '''worker function'''
    for x in range(xrange[0], xrange[1]):
        y = ((x+5)**2+x-40)
        if y <= 0xf+1:
            print('Condition met at: ', y, x)
            return_dict['x'].append(x)
            return_dict['y'].append(y)
            with lock:                
                list_x = return_dict['x']
                list_y = return_dict['y']
                list_x.append(x)
                list_y.append(y)
                return_dict['x'] = list_x
                return_dict['y'] = list_y

if __name__ == '__main__':
    start = time()
    manager = mp.Manager()
    return_dict = manager.dict()
    lock = manager.Lock()
    return_dict['x']=manager.list()
    return_dict['y']=manager.list()
    xmax = 100000000
    nw = mp.cpu_count()
    workers = list(range(0, nw))
    chunks = chunker([0, xmax], nw)
    jobs = []
    for i in workers:
        p = mp.Process(target=worker, args=(lock, chunks[i],return_dict))
        jobs.append(p)
        p.start()

    for proc in jobs:
        proc.join()
    end = time()
    tt   = end-start #total time
    print('Each iteration took: ', tt/xmax)
    print('Total time:          ', tt)
    print(return_dict['x'])
    print(return_dict['y'])

This considerably reduces the run time to ~17 secs. But my shared variable cannot retrieve any values. Please help me find out which part of the code is going wrong.

The output I get is:

Each iteration took:  1.7742713451385497e-07
Total time:           17.742713451385498
[]
[]

whereas I expect:

Each iteration took:  1.7742713451385497e-07
Total time:           17.742713451385498
[0, 1, 2]
[-15, -3, 11]

The issue in your example is that modifications to standard mutable structures within Manager.dict will not be propagated. I'm first showing you how to fix it with a manager, just to show you better options afterwards.

multiprocessing.Manager is a bit heavy since it uses a separate process just for the manager, and working on a shared object requires locks for data consistency. If you run this on one machine, there are better options: multiprocessing.Pool in case you don't have to run customized Process classes, and if you do, multiprocessing.Process together with multiprocessing.Queue would be the common way of doing it.

The quoted parts below are from the multiprocessing docs.


Manager

If standard (non-proxy) list or dict objects are contained in a referent, modifications to those mutable values will not be propagated through the manager because the proxy has no way of knowing when the values contained within are modified. However, storing a value in a container proxy (which triggers a __setitem__ on the proxy object) does propagate through the manager and so to effectively modify such an item, one could re-assign the modified value to the container proxy...

In your case this would look like:

def worker(xrange, return_dict, lock):
    """worker function"""
    for x in range(xrange[0], xrange[1]):
        y = ((x+5)**2+x-40)
        if y <= 0xf+1:
            print('Condition met at: ', y, x)
            with lock:
                list_x = return_dict['x']
                list_y = return_dict['y']
                list_x.append(x)
                list_y.append(y)
                return_dict['x'] = list_x
                return_dict['y'] = list_y

The lock here would be a manager.Lock instance you have to pass along as an argument, since the whole (now) locked operation is not by itself atomic. (Here is an easier example with Manager using a Lock.)
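The example behind that link isn't reproduced on this page; purely as an illustrative stand-in (names here are hypothetical, not from the linked answer), a minimal Manager-plus-Lock sketch guarding a read-modify-write on a Manager.dict could look like this:

import multiprocessing as mp

def increment(shared_dict, lock):
    # The read-modify-write on the dict proxy is not atomic, so guard it with the lock.
    with lock:
        shared_dict['count'] = shared_dict['count'] + 1

if __name__ == '__main__':
    with mp.Manager() as manager:
        shared_dict = manager.dict({'count': 0})
        lock = manager.Lock()
        procs = [mp.Process(target=increment, args=(shared_dict, lock))
                 for _ in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(shared_dict['count'])  # 4 with the lock; could be less without it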

This approach is perhaps less convenient than employing nested proxy objects for most use cases, but it also demonstrates a level of control over the synchronization.

Since Python 3.6, proxy objects are nestable:

Changed in version 3.6: Shared objects are capable of being nested. For example, a shared container object such as a shared list can contain other shared objects which will all be managed and synchronized by the SyncManager.

Since Python 3.6 you can fill your manager.dict before starting multiprocessing with manager.list instances as values, and then append directly in the worker without having to reassign.

return_dict['x'] = manager.list()
return_dict['y'] = manager.list()

EDIT:

Here is the full example with Manager:

import time
import multiprocessing as mp
from multiprocessing import Manager, Process
from contextlib import contextmanager
# mp_utils.py from the first link in the code snippet for the
# "Pool" section below
from mp_utils import calc_batch_sizes, build_batch_ranges

# def context_timer ... see code snippet in "Pool" section below

def worker(batch_range, return_dict, lock):
    """worker function"""
    for x in batch_range:
        y = ((x+5)**2+x-40)
        if y <= 0xf+1:
            print('Condition met at: ', y, x)
            with lock:
                return_dict['x'].append(x)
                return_dict['y'].append(y)


if __name__ == '__main__':

    N_WORKERS = mp.cpu_count()
    X_MAX = 100000000

    batch_sizes = calc_batch_sizes(X_MAX, n_workers=N_WORKERS)
    batch_ranges = build_batch_ranges(batch_sizes)
    print(batch_ranges)

    with Manager() as manager:
        lock = manager.Lock()
        return_dict = manager.dict()
        return_dict['x'] = manager.list()
        return_dict['y'] = manager.list()

        tasks = [(batch_range, return_dict, lock)
                 for batch_range in batch_ranges]

        with context_timer():

            pool = [Process(target=worker, args=args)
                    for args in tasks]

            for p in pool:
                p.start()
            for p in pool:
                p.join()

        # Create standard container with data from manager before exiting
        # the manager.
        result = {k: list(v) for k, v in return_dict.items()}

    print(result)

Pool

Most often a multiprocessing.Pool will just do it. You have an additional challenge in your example since you want to distribute iteration over a range. Your chunker function doesn't manage to divide the range so that every process gets about the same amount of work:

chunker((0, 21), 4)
# Out: [[0, 4], [5, 9], [10, 14], [15, 21]]  # 4, 4, 4, 6!

For the code below, please grab the code snippet for mp_utils.py from my answer here; it provides two functions to chunk ranges as evenly as possible.
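Since that snippet isn't reproduced on this page, here is a minimal sketch of what two such helpers could look like (the implementation in the linked answer may differ in detail):

def calc_batch_sizes(n_tasks, n_workers):
    """Split n_tasks into n_workers batch sizes that differ by at most one."""
    base, extra = divmod(n_tasks, n_workers)
    return [base + (i < extra) for i in range(n_workers)]


def build_batch_ranges(batch_sizes):
    """Turn a list of batch sizes into consecutive range objects."""
    ranges, start = [], 0
    for size in batch_sizes:
        ranges.append(range(start, start + size))
        start += size
    return ranges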

With multiprocessing.Pool your worker function just has to return the result, and Pool will take care of transporting the result back over internal queues to the parent process. The result will be a list, so you will have to rearrange your results again in the way you want them. Your example could then look like this:

import time
import multiprocessing as mp
from multiprocessing import Pool
from contextlib import contextmanager
from itertools import chain

from mp_utils import calc_batch_sizes, build_batch_ranges

@contextmanager
def context_timer():
    start_time = time.perf_counter()
    yield
    end_time = time.perf_counter()
    total_time   = end_time-start_time
    print(f'\nEach iteration took: {total_time / X_MAX:.4f} s')
    print(f'Total time:          {total_time:.4f} s\n')


def worker(batch_range):
    """worker function"""
    result = []
    for x in batch_range:
        y = ((x+5)**2+x-40)
        if y <= 0xf+1:
            print('Condition met at: ', y, x)
            result.append((x, y))
    return result


if __name__ == '__main__':

    N_WORKERS = mp.cpu_count()
    X_MAX = 100000000

    batch_sizes = calc_batch_sizes(X_MAX, n_workers=N_WORKERS)
    batch_ranges = build_batch_ranges(batch_sizes)
    print(batch_ranges)

    with context_timer():
        with Pool(N_WORKERS) as pool:
            results = pool.map(worker, iterable=batch_ranges)

    print(f'results: {results}')
    x, y = zip(*chain.from_iterable(results))  # filter and sort results
    print(f'results sorted: x: {x}, y: {y}')

Example output:

[range(0, 12500000), range(12500000, 25000000), range(25000000, 37500000), 
range(37500000, 50000000), range(50000000, 62500000), range(62500000, 75000000), range(75000000, 87500000), range(87500000, 100000000)]
Condition met at:  -15 0
Condition met at:  -3 1
Condition met at:  11 2

Each iteration took: 0.0000 s
Total time:          8.2408 s

results: [[(0, -15), (1, -3), (2, 11)], [], [], [], [], [], [], []]
results sorted: x: (0, 1, 2), y: (-15, -3, 11)

Process finished with exit code 0

If you had multiple arguments for your worker, you would build a "tasks" list of argument tuples and exchange pool.map(...) for pool.starmap(...iterable=tasks). See the docs for further details on that.
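For illustration only, a sketch with a hypothetical extra threshold argument (not part of the original example) could look like this:

import multiprocessing as mp
from multiprocessing import Pool


def worker(batch_range, threshold):
    """Hypothetical two-argument worker; the threshold is passed in per task."""
    return [(x, (x + 5)**2 + x - 40) for x in batch_range
            if (x + 5)**2 + x - 40 <= threshold]


if __name__ == '__main__':
    batch_ranges = [range(0, 50), range(50, 100)]  # toy ranges just for this sketch
    tasks = [(batch_range, 0xf + 1) for batch_range in batch_ranges]
    with Pool(mp.cpu_count()) as pool:
        results = pool.starmap(worker, iterable=tasks)
    print(results)  # [[(0, -15), (1, -3), (2, 11)], []]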


Process & Queue 流程和队列

If you can't use multiprocessing.Pool for some reason, you have to take care of inter-process communication (IPC) yourself, by passing a multiprocessing.Queue as an argument to your worker functions in the child processes and letting them enqueue their results to be sent back to the parent.

You will also have to build your Pool-like structure so you can iterate over it to start and join the processes, and you have to get() the results back from the queue. More about Queue.get usage I've written up here. Note that in the code below the results are fetched from the queue before the processes are joined; joining first can deadlock if a child is still blocked putting its result on the queue.

A solution with this approach could look like this:

import multiprocessing as mp
from multiprocessing import Process
from itertools import chain

from mp_utils import calc_batch_sizes, build_batch_ranges
# context_timer() as defined in the "Pool" section above


def worker(result_queue, batch_range):
    """worker function"""
    result = []
    for x in batch_range:
        y = ((x+5)**2+x-40)
        if y <= 0xf+1:
            print('Condition met at: ', y, x)
            result.append((x, y))
    result_queue.put(result)  # <--


if __name__ == '__main__':

    N_WORKERS = mp.cpu_count()
    X_MAX = 100000000

    result_queue = mp.Queue()  # <--
    batch_sizes = calc_batch_sizes(X_MAX, n_workers=N_WORKERS)
    batch_ranges = build_batch_ranges(batch_sizes)
    print(batch_ranges)

    with context_timer():

        pool = [Process(target=worker, args=(result_queue, batch_range))
                for batch_range in batch_ranges]

        for p in pool:
            p.start()

        results = [result_queue.get() for _ in batch_ranges]

        for p in pool:
            p.join()

    print(f'results: {results}')
    x, y = zip(*chain.from_iterable(results))  # filter and sort results
    print(f'results sorted: x: {x}, y: {y}')
