Multiprocessing with Python to find the max value

I'm working with Python 2.7.5 and OpenCV. I have a test image and I want to find its most similar image in an array of images. I have written a function using OpenCV that gives me the total number of similar points between two images: the more similar points, the more similar the images are. Unfortunately, this function is rather time-consuming, so I would like to parallelize my code to make it faster.

# img is the image that I am trying to find the most similar points with
maxSimilarPts = 0

# testImages is a list of test images
for testImage in testImages:
    # getNumSimilarPts returns the number of similar points between two images
    similarPts = getNumSimilarPts(img, testImage)

    if similarPts > maxSimilarPts:
        maxSimilarPts = similarPts

How can I do this in parallel with Python? Any help would be greatly appreciated.

The following is an (untested) parallel version of the original code. It runs 5 workers in parallel. Each one takes an image from the input queue, calculates the similarity, then puts the value and the image onto an output queue. Once every image has been processed, the parent process prints the (similarity, image) pair of the most similar image.

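# adapted from Raymond Hettinger
# http://stackoverflow.com/questions/11920490/how-do-i-run-os-walk-in-parallel-in-python/23779787#23779787

from multiprocessing.pool import Pool
from multiprocessing import JoinableQueue as Queue

# As written, this relies on the pool's worker processes inheriting img,
# getNumSimilarPts and the two queues as globals via fork, so it is Unix-only.


def parallel_worker():
    # Loops forever; the parent terminates the pool once all results are in.
    while True:
        testImage = imageq.get()
        similarPts = getNumSimilarPts(img, testImage)
        similarq.put([similarPts, testImage])
        imageq.task_done()

similarq = Queue()
imageq = Queue()
for testImage in testImages:
    imageq.put(testImage)

pool = Pool(5)
for i in range(5):
    pool.apply_async(parallel_worker)

imageq.join()  # blocks until task_done() has been called for every image
print('Done')

# A queue is not iterable: read exactly one result per image, and do it
# before terminating the pool so no result is lost in transit.
results = [similarq.get() for _ in testImages]
pool.terminate()  # the workers never return on their own

# Compare on the similarity count only; comparing two NumPy images on a tie would raise.
print(max(results, key=lambda result: result[0]))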
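As a self-contained variation on the queue-based approach above, the sketch below passes the queues to each worker explicitly instead of relying on globals, and sends one None sentinel per worker so the workers exit on their own rather than looping forever. It assumes get_num_similar_pts is a hypothetical stand-in for the asker's OpenCV function (it just counts equal positions in two lists); parallel_best_match and worker are illustrative names, not part of any library.

```python
import multiprocessing as mp


def get_num_similar_pts(img, test_image):
    # Hypothetical stand-in for the asker's OpenCV function:
    # counts the positions where the two sequences are equal.
    return sum(a == b for a, b in zip(img, test_image))


def worker(img, in_q, out_q):
    # Pull (index, image) jobs until the None sentinel arrives.
    while True:
        job = in_q.get()
        if job is None:
            break
        index, test_image = job
        # Send back the score and the index, not the (possibly large) image.
        out_q.put((get_num_similar_pts(img, test_image), index))


def parallel_best_match(img, test_images, n_workers=4):
    test_images = list(test_images)  # allow any iterable; we need its length
    in_q, out_q = mp.Queue(), mp.Queue()
    workers = [mp.Process(target=worker, args=(img, in_q, out_q))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    for index, test_image in enumerate(test_images):
        in_q.put((index, test_image))
    for _ in workers:
        in_q.put(None)  # one sentinel per worker
    # Read exactly one result per image before joining the workers.
    results = [out_q.get() for _ in range(len(test_images))]
    for w in workers:
        w.join()
    # Highest score wins; on a tie, the earlier image (smaller index) wins.
    best_score, best_index = max(results, key=lambda r: (r[0], -r[1]))
    return best_index, best_score


if __name__ == "__main__":
    img = [1, 2, 3, 4]
    test_images = [[0, 2, 0, 4], [1, 2, 3, 0], [9, 9, 9, 9]]
    print(parallel_best_match(img, test_images, n_workers=2))  # (1, 3)
```

Sending back the index instead of the image keeps the result messages small. The results are read before join() is called because the multiprocessing documentation warns that joining a process which still has queued items waiting to be flushed can deadlock.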

Important note:

This code works natively only on Python 3. To run it on Python 2, you must install the futures package from PyPI, a backport of concurrent.futures.

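from concurrent.futures import ProcessPoolExecutor


def multiprocess_max(iterable, key):
    items = list(iterable)  # materialize so each item can be paired with its key
    with ProcessPoolExecutor() as executor:
        # Only `key` is sent to the workers, so it must be a picklable,
        # module-level function (a lambda cannot be pickled).
        keys = executor.map(key, items)
        # Compare in this process and return the original item, not its key.
        return max(zip(items, keys), key=lambda pair: pair[1])[0]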

The idea behind it is the following:

The expensive part is computing the key used to compare the items. So why not compute the keys in multiple processes, but compare them in a single process?

Here's how it works:

Create a concurrent.futures.ProcessPoolExecutor, an easy-to-use wrapper around the multiprocessing module that provides a map() method like the builtin map(), but one that runs the calls concurrently.

Then, for each item in the collection, build a 2-tuple: the original item (what we want to return if its key is the maximum) and its key, computed with the passed key function.

Once we have the results, pass them to the builtin max(). But there is a problem: the collection is now a collection of tuples! So we also pass max() a key function that returns the second element of each tuple, the computed key.

Finally, since max() returns the whole tuple (including the key, which we no longer need), we extract its first element, the original item, and return it.
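The comparison half of this idea (pair each item with its key, take max() on the key, then drop the key) runs in a single process and can be tried on its own. A small sketch, assuming the keys have already been computed, for example by executor.map(); the function name and the sample file names are hypothetical.

```python
from operator import itemgetter


def max_by_precomputed_keys(items, keys):
    # Pair each original item with its already-computed key.
    pairs = list(zip(items, keys))
    # Compare on the key only; max() returns the first maximal pair on ties.
    best_pair = max(pairs, key=itemgetter(1))
    # Drop the key and return just the original item.
    return best_pair[0]


if __name__ == "__main__":
    images = ["a.png", "b.png", "c.png"]
    scores = [12, 40, 40]
    print(max_by_precomputed_keys(images, scores))  # b.png
```

Because max() only ever looks at the key, the items themselves never need to be comparable, which matters when they are NumPy image arrays.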

Edit:

After this code locked up my console (IDLE; I found this question because I needed the same thing), I thought my solution was wrong :-)

But I was wrong, not the solution: it simply won't work in the interactive interpreter. From the docs:

The __main__ module must be importable by worker subprocesses. This means that ProcessPoolExecutor will not work in the interactive interpreter.
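Because of that restriction, the code has to live in a file that is run as a script. A sketch of such a file, assuming a hypothetical key function similarity_to_target and made-up sample data: the key function is defined at module level so the workers can unpickle it, and the if __name__ == "__main__": guard keeps workers that import the module (as the spawn start method does, e.g. on Windows) from re-running the demo.

```python
# Hypothetical file name: find_best.py, run with "python find_best.py".
from concurrent.futures import ProcessPoolExecutor


def similarity_to_target(test_image):
    # Hypothetical key function; defined at module level so that
    # worker processes can unpickle it by name.
    target = [1, 2, 3, 4]
    return sum(a == b for a, b in zip(target, test_image))


def multiprocess_max(iterable, key):
    items = list(iterable)
    with ProcessPoolExecutor() as executor:
        keys = executor.map(key, items)  # keys are computed in the workers
        return max(zip(items, keys), key=lambda pair: pair[1])[0]


if __name__ == "__main__":
    # Workers that re-import this module skip this block.
    candidates = [[0, 2, 0, 4], [1, 2, 3, 0], [9, 9, 9, 9]]
    print(multiprocess_max(candidates, similarity_to_target))  # [1, 2, 3, 0]
```

Running the file from a terminal, rather than pasting it into IDLE or the interactive prompt, gives the workers a real __main__ module to import.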
