

How to get all pool.apply_async processes to stop once any one process has found a match in python

I have the following code that leverages multiprocessing to iterate through a large list and find a match. How can I get all processes to stop once a match is found in any one process? I have seen examples, but none of them seem to fit what I am doing here.

#!/usr/bin/env python3.5
import sys, itertools, multiprocessing, functools

alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ12234567890!@#$%^&*?,()-=+[]/;"
num_parts = 4
part_size = len(alphabet) // num_parts

def do_job(first_bits):
    for x in itertools.product(first_bits, *itertools.repeat(alphabet, num_parts-1)):
        # CHECK FOR MATCH HERE
        print(''.join(x))
        # EXIT ALL PROCESSES IF MATCH FOUND

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    results = []

    for i in range(num_parts):
        if i == num_parts - 1:
            first_bit = alphabet[part_size * i :]
        else:
            first_bit = alphabet[part_size * i : part_size * (i+1)]
        pool.apply_async(do_job, (first_bit,))

    pool.close()
    pool.join()

Thanks for your time.

UPDATE 1:

I have implemented the changes suggested in the great approach by @ShadowRanger, and it is nearly working the way I want it to. I have added some logging to give an indication of progress, and put a 'test' key in there to match. I want to be able to increase/decrease iNumberOfProcessors independently of num_parts. At this stage, when I have them both at 4, everything works as expected: 4 processes spin up (one extra for the console). When I change iNumberOfProcessors to 6, 6 processes spin up, but only 4 of them show any CPU usage, so it appears 2 are idle. With my previous solution above, I was able to set the number of cores higher without increasing num_parts, and all of the processes would get used.


I am not sure how to refactor this new approach to give me the same functionality. Can you have a look and give me some direction on the refactoring needed so that iNumberOfProcessors and num_parts can be set independently of each other while still keeping all processes in use?

Here is the updated code:

#!/usr/bin/env python3.5
import sys, itertools, multiprocessing, functools

alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ12234567890!@#$%^&*?,()-=+[]/;"
num_parts = 4
part_size = len(alphabet) // num_parts
iProgressInterval = 10000
iNumberOfProcessors = 6

def do_job(first_bits):
    iAttemptNumber = 0
    iLastProgressUpdate = 0
    for x in itertools.product(first_bits, *itertools.repeat(alphabet, num_parts-1)):
        sKey = ''.join(x)
        iAttemptNumber = iAttemptNumber + 1
        if iLastProgressUpdate + iProgressInterval <= iAttemptNumber:
            iLastProgressUpdate = iLastProgressUpdate + iProgressInterval
            print("Attempt#:", iAttemptNumber, "Key:", sKey)
        if sKey == 'test':
            print("KEY FOUND!! Attempt#:", iAttemptNumber, "Key:", sKey)
            return True

def get_part(i):
    if i == num_parts - 1:
        first_bit = alphabet[part_size * i :]
    else:
        first_bit = alphabet[part_size * i : part_size * (i+1)]
    return first_bit

if __name__ == '__main__':
    # In Py3, using multiprocessing.Pool as a context manager terminates the pool when the block exits
    with multiprocessing.Pool(processes = iNumberOfProcessors) as pool:

        # imap_unordered yields each part's result as soon as its worker finishes it
        for gotmatch in pool.imap_unordered(do_job, map(get_part, range(num_parts))):
            if gotmatch:
                break
        else:
            print("No matches found")
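To illustrate the kind of decoupling being asked about (my own sketch, with a placeholder do_job whose 't' check is just a stand-in for the real match test): with imap_unordered, num_parts does not have to equal the worker count. Keeping num_parts larger than iNumberOfProcessors keeps every worker busy, because an idle worker immediately picks up the next queued part.

```python
import multiprocessing

alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ12234567890!@#$%^&*?,()-=+[]/;"
num_parts = 12                 # more parts than workers keeps every worker fed
iNumberOfProcessors = 6
part_size = len(alphabet) // num_parts

def get_part(i):
    # The final part absorbs the remainder so no characters are dropped
    if i == num_parts - 1:
        return alphabet[part_size * i:]
    return alphabet[part_size * i: part_size * (i + 1)]

def do_job(first_bits):
    # Placeholder search: report whether this slice contains 't'
    return 't' in first_bits

if __name__ == '__main__':
    with multiprocessing.Pool(processes=iNumberOfProcessors) as pool:
        for gotmatch in pool.imap_unordered(do_job, map(get_part, range(num_parts))):
            if gotmatch:
                print("match found, terminating pool")
                break
        else:
            print("No matches found")
```

With 12 parts and 6 workers, the two numbers can now be tuned independently; the pool simply hands out parts until they run out or a match terminates it.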

UPDATE 2:

Ok, here is my attempt at trying @noxdafox's suggestion. I have put together the following based on the link he provided with his suggestion. Unfortunately, when I run it I get the error:

... line 322, in apply_async
    raise ValueError("Pool not running")
ValueError: Pool not running

Can anyone give me some direction on how to get this working?

Basically, the issue is that my first attempt did multiprocessing but did not support canceling all processes once a match was found.

My second attempt (based on @ShadowRanger's suggestion) solved that problem, but broke the ability to scale the number of processes and the num_parts size independently, which is something my first attempt could do.

My third attempt (based on @noxdafox's suggestion) throws the error outlined above.

If anyone can give me some direction on how to keep the functionality of my first attempt (being able to scale the number of processes and the num_parts size independently) while adding the ability to cancel all processes once a match is found, it would be much appreciated.

Thank you for your time.

Here is the code from my third attempt, based on @noxdafox's suggestion:

#!/usr/bin/env python3.5
import sys, itertools, multiprocessing, functools

alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ12234567890!@#$%^&*?,()-=+[]/;"
num_parts = 4
part_size = len(alphabet) // num_parts
iProgressInterval = 10000
iNumberOfProcessors = 4


def find_match(first_bits):
    iAttemptNumber = 0
    iLastProgressUpdate = 0
    for x in itertools.product(first_bits, *itertools.repeat(alphabet, num_parts-1)):
        sKey = ''.join(x)
        iAttemptNumber = iAttemptNumber + 1
        if iLastProgressUpdate + iProgressInterval <= iAttemptNumber:
            iLastProgressUpdate = iLastProgressUpdate + iProgressInterval
            print("Attempt#:", iAttemptNumber, "Key:", sKey)
        if sKey == 'test':
            print("KEY FOUND!! Attempt#:", iAttemptNumber, "Key:", sKey)
            return True

def get_part(i):
    if i == num_parts - 1:
        first_bit = alphabet[part_size * i :]
    else:
        first_bit = alphabet[part_size * i : part_size * (i+1)]
    return first_bit

def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)

class Worker():

    def __init__(self, workers):
        self.workers = workers

    def callback(self, result):
        if result:
            self.pool.terminate()

    def do_job(self):
        print(self.workers)
        pool = multiprocessing.Pool(processes=self.workers)
        for part in grouper(alphabet, part_size):
            pool.apply_async(do_job, (part,), callback=self.callback)
        pool.close()
        pool.join()
        print("All Jobs Queued")

if __name__ == '__main__':
    w = Worker(4)
    w.do_job()

You can check this question to see an implementation example solving your problem.

This also works with a concurrent.futures pool.

Just replace the map method with apply_async and iterate over your list from the caller.

Something like this:

for part in grouper(alphabet, part_size):
    pool.apply_async(do_job, (part,), callback=self.callback)
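A minimal self-contained version of that pattern (my own sketch, not part of the original answer): the pool has to be stored on the instance before the jobs are submitted so the callback can reach it, which is the detail the "Pool not running" traceback above points at. find_match here is a stand-in for the real search job.

```python
import multiprocessing

def find_match(part):
    # Stand-in for the real search: succeed when the part contains 'x'
    return 'x' in part

class Worker:
    def __init__(self, workers):
        self.workers = workers
        self.pool = None

    def callback(self, result):
        # Runs in the parent process for each finished job;
        # terminate the pool the first time a job reports success
        if result:
            self.pool.terminate()

    def do_job(self, parts):
        self.pool = multiprocessing.Pool(processes=self.workers)
        for part in parts:
            self.pool.apply_async(find_match, (part,), callback=self.callback)
        self.pool.close()
        self.pool.join()

if __name__ == '__main__':
    w = Worker(4)
    w.do_job(['abc', 'def', 'xyz', 'ghi'])
```

terminate() drops any jobs that have not finished, so once one part succeeds the remaining parts are abandoned rather than run to completion.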

grouper recipe
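For reference, here is the grouper recipe in question (as given in the itertools docs) with a quick demonstration. Note that it yields tuples and pads the final chunk with the fillvalue, which a caller joining parts back into strings needs to account for:

```python
import itertools

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks: grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)

print(list(grouper('ABCDEFG', 3, 'x')))
# -> [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
```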

multiprocessing isn't really designed to cancel tasks, but you can simulate it for your particular case by using pool.imap_unordered and terminating the pool when you get a hit:

def do_job(first_bits):
    for x in itertools.product(first_bits, *itertools.repeat(alphabet, num_parts-1)):
        # CHECK FOR MATCH HERE
        print(''.join(x))
        if match:
            return True
    # If we exit loop without a match, function implicitly returns falsy None for us
# Factor out part getting to simplify imap_unordered use
def get_part(i):
    if i == num_parts - 1:
        first_bit = alphabet[part_size * i :]
    else:
        first_bit = alphabet[part_size * i : part_size * (i+1)]
    return first_bit

if __name__ == '__main__':
    # In Py3, using multiprocessing.Pool as a context manager terminates the pool when the block exits
    with multiprocessing.Pool(processes=4) as pool:

        # imap_unordered yields each part's result as soon as its worker finishes it
        for gotmatch in pool.imap_unordered(do_job, map(get_part, range(num_parts))):
            if gotmatch:
                break
        else:
            print("No matches found")

This will run do_job for each part, returning results as fast as it can get them. When a worker returns True, the loop breaks, the with statement for the Pool is exited, and the Pool is terminate-d (dropping all work in progress).

Note that while this works, it's kind of abusing multiprocessing; it won't handle canceling individual tasks without terminating the whole Pool. If you need more fine-grained task cancellation, you'll want to look at concurrent.futures, but even there, it can only cancel undispatched tasks; once they're running, they can't be cancelled without terminating the Executor or using a side-band means of termination (having the task intermittently poll some interprocess object to determine whether it should continue running).
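To sketch that side-band approach (a hypothetical illustration, not part of the original answer): each worker can inherit a multiprocessing.Event through the pool initializer and poll it, so the first match tells every other worker to stop early without terminating the pool.

```python
import multiprocessing

def init_worker(shared_event):
    # Make the Event visible inside each worker process
    global stop_event
    stop_event = shared_event

def do_job(first_bits):
    for ch in first_bits:
        if stop_event.is_set():
            return None          # another worker already found the key
        if ch == 'x':            # stand-in for the real match test
            stop_event.set()     # tell every other worker to stop
            return ch
    return None

if __name__ == '__main__':
    event = multiprocessing.Event()
    with multiprocessing.Pool(4, initializer=init_worker, initargs=(event,)) as pool:
        results = pool.map(do_job, ['abc', 'defx', 'ghi', 'jkl'])
    print([r for r in results if r])   # -> ['x']
```

Unlike the terminate() approach, workers exit their loops cleanly, which matters if they hold resources that need orderly cleanup.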

Statement: the technical posts on this site follow the CC BY-SA 4.0 license; if you need to repost, please credit this site's URL or the original source.

 