Multiprocessing in a for loop

Question

Suppose I have a dictionary where each value is a quadrilateral defined by a tuple of GPS coordinates, and a tuple containing the GPS coordinates of the origin and destination points for a set of trips: (((origin_latitude, origin_longitude), (dest_latitude, dest_longitude)), ((...), (...))). Here is an example with two quadrilaterals and two trips:

dictionary={0:((0,0),(0,1),(1,1),(1,0)),1:((3,3),(3,4),(4,4),(4,3))}
trips=(((0.5,0.5),(3.5,3.5)),((-1,-1),(-2,-2)))

I want to classify each trip by an origin quadrilateral number, a destination quadrilateral number, and a number identifying the origin-destination combination (the trip reference number). Here is what I am doing:

import matplotlib.path as mplPath

def is_in_zone(quadri,point):

    bbPath = mplPath.Path(quadri)
    return bbPath.contains_point(point)

def get_zone_nbr(dictio,trip):

    start_zone=-1
    end_zone=-1
    trip_ref=-1

    # items() works on both Python 2 and 3; iteritems() exists only on Python 2
    for key,coordinates in dictio.items():

        if is_in_zone(coordinates,trip[0]):
            start_zone=key
        if is_in_zone(coordinates,trip[1]):
            end_zone=key
        if start_zone>-1 and end_zone>-1:
            trip_ref=len(dictio)*start_zone+end_zone
            break
    return (start_zone,end_zone,trip_ref)

if __name__=='__main__':

    dictionary={0:((0,0),(0,1),(1,1),(1,0)),1:((3,3),(3,4),(4,4),(4,3))}
    trips=(((0.5,0.5),(3.5,3.5)),((-1,-1),(-2,-2)))

    for t in trips:
        get_zone_nbr(dictionary,t)

My dictionary will have about 30 entries, so get_zone_nbr will be quite slow, and I have millions of trips to process. Do you see any obvious way to optimize get_zone_nbr(), or anything else that would make this code run faster (e.g. multiprocessing, though I am not sure how to use it with loops)?

Answer 1

A simple first step is to process your trips in parallel.

>>> import matplotlib.path as mplPath
>>> def is_in_zone(quadri,point):
...     bbPath = mplPath.Path(quadri)
...     return bbPath.contains_point(point)
... 
>>> def get_zone_nbr(dictio,trip):
...     start_zone=-1
...     end_zone=-1
...     trip_ref=-1
...     for key,coordinates in dictio.items():
...         if is_in_zone(coordinates,trip[0]):
...             start_zone=key
...         if is_in_zone(coordinates,trip[1]):
...             end_zone=key
...         if start_zone>-1 and end_zone>-1:
...             trip_ref=len(dictio)*start_zone+end_zone
...             break
...     return (start_zone,end_zone,trip_ref)
... 
>>> dictionary={0:((0,0),(0,1),(1,1),(1,0)),1:((3,3),(3,4),(4,4),(4,3))}
>>> trips=(((0.5,0.5),(3.5,3.5)),((-1,-1),(-2,-2)))
>>> 
>>> from pathos.pools import ThreadPool 
>>> pool = ThreadPool()
>>> 
>>> results = pool.map(lambda x: get_zone_nbr(dictionary, x), trips)
>>> results
[(0, 1, 1), (-1, -1, -1)]

I'm using pathos, a fork of multiprocessing that provides better serialization, flexibility, and interactivity. (I'm also the author.)
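
For readers without pathos installed, here is a minimal sketch of the same trip-level map using only the standard library. It rests on a few assumptions that are not in the original answer: multiprocessing.pool.ThreadPool stands in for the pathos ThreadPool, functools.partial replaces the lambda, the pool size of 4 is arbitrary, and a pure-Python ray-casting is_in_zone stands in for matplotlib's Path.contains_point so the example has no third-party dependency (it may treat points lying exactly on an edge differently).

```python
from functools import partial
from multiprocessing.pool import ThreadPool  # standard-library thread pool


def is_in_zone(quadri, point):
    """Ray-casting point-in-polygon test (stand-in for Path.contains_point)."""
    x, y = point
    inside = False
    n = len(quadri)
    for i in range(n):
        x1, y1 = quadri[i]
        x2, y2 = quadri[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def get_zone_nbr(dictio, trip):
    """Same logic as in the question: (start_zone, end_zone, trip_ref)."""
    start_zone = end_zone = trip_ref = -1
    for key, coordinates in dictio.items():
        if is_in_zone(coordinates, trip[0]):
            start_zone = key
        if is_in_zone(coordinates, trip[1]):
            end_zone = key
        if start_zone > -1 and end_zone > -1:
            trip_ref = len(dictio) * start_zone + end_zone
            break
    return (start_zone, end_zone, trip_ref)


dictionary = {0: ((0, 0), (0, 1), (1, 1), (1, 0)), 1: ((3, 3), (3, 4), (4, 4), (4, 3))}
trips = (((0.5, 0.5), (3.5, 3.5)), ((-1, -1), (-2, -2)))

# partial() binds the dictionary argument. Unlike a lambda, a partial of a
# module-level function can be pickled, so it would also work with a
# process-based multiprocessing.Pool.
classify = partial(get_zone_nbr, dictionary)

with ThreadPool(4) as pool:  # pool size 4 is an arbitrary choice
    results = pool.map(classify, trips)

print(results)  # [(0, 1, 1), (-1, -1, -1)]
```

The lambda in the session above is where the "better serialization" matters: the standard-library process pool pickles the function it is given, and plain pickle cannot serialize a lambda.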

You can then apply the same approach to turn the for loop inside get_zone_nbr into a map call. pathos lets you use a map call with multiple arguments. Since you are iterating over dictionary items, which are naturally unordered, you can use the "unordered iterated map" (uimap in pathos, imap_unordered in multiprocessing).
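
One possible shape for that inner-loop map is sketched below with the standard library's imap_unordered. The helper check_zone, the pool argument added to get_zone_nbr, and the pure-Python is_in_zone are all hypothetical names introduced for illustration, not part of the original answer. The sketch also assumes the zones do not overlap, because results arrive in arbitrary order and the early break of the original loop is dropped.

```python
from functools import partial
from multiprocessing.pool import ThreadPool


def is_in_zone(quadri, point):
    """Ray-casting point-in-polygon test (stand-in for Path.contains_point)."""
    x, y = point
    inside = False
    n = len(quadri)
    for i in range(n):
        x1, y1 = quadri[i]
        x2, y2 = quadri[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def check_zone(trip, item):
    """Test one (key, quadrilateral) item against both ends of a trip."""
    key, coordinates = item
    return key, is_in_zone(coordinates, trip[0]), is_in_zone(coordinates, trip[1])


def get_zone_nbr(dictio, trip, pool):
    """Variant of get_zone_nbr whose loop over zones is an unordered map."""
    start_zone = end_zone = trip_ref = -1
    # Each dictionary item is checked independently; order does not matter.
    for key, has_start, has_end in pool.imap_unordered(
            partial(check_zone, trip), dictio.items()):
        if has_start:
            start_zone = key
        if has_end:
            end_zone = key
    if start_zone > -1 and end_zone > -1:
        trip_ref = len(dictio) * start_zone + end_zone
    return (start_zone, end_zone, trip_ref)


dictionary = {0: ((0, 0), (0, 1), (1, 1), (1, 0)), 1: ((3, 3), (3, 4), (4, 4), (4, 3))}
trips = (((0.5, 0.5), (3.5, 3.5)), ((-1, -1), (-2, -2)))

with ThreadPool(4) as pool:  # one pool, reused for every trip
    results = [get_zone_nbr(dictionary, t, pool) for t in trips]

print(results)  # [(0, 1, 1), (-1, -1, -1)]
```

With only about 30 zones per trip, the per-task overhead of the inner map may outweigh any gain, so this variant is worth measuring against the trip-level map before adopting it.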

I also suggest you time your code to see which of the map calls is faster. There are several variants of the map call and several parallel backends. I used a thread pool above, but you can also parallelize across processes and sockets (the latter will be too slow for your case). pathos provides a uniform API for all of these choices, so you can write the code once and then drop in any of the other pools or maps until you find the fastest one for your case.
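
A rough way to do that timing is sketched below, again with standard-library pools rather than pathos. The timed helper, the pool size, the 2000-trip workload, and the pure-Python is_in_zone are assumptions made for illustration. The numbers it prints depend entirely on the machine and the workload, so none are quoted here.

```python
import time
from functools import partial
from multiprocessing.pool import ThreadPool


def is_in_zone(quadri, point):
    """Ray-casting point-in-polygon test (stand-in for Path.contains_point)."""
    x, y = point
    inside = False
    n = len(quadri)
    for i in range(n):
        x1, y1 = quadri[i]
        x2, y2 = quadri[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def get_zone_nbr(dictio, trip):
    start_zone = end_zone = trip_ref = -1
    for key, coordinates in dictio.items():
        if is_in_zone(coordinates, trip[0]):
            start_zone = key
        if is_in_zone(coordinates, trip[1]):
            end_zone = key
        if start_zone > -1 and end_zone > -1:
            trip_ref = len(dictio) * start_zone + end_zone
            break
    return (start_zone, end_zone, trip_ref)


def timed(map_fn, func, data):
    """Run one map variant to completion; return (results, elapsed seconds)."""
    start = time.perf_counter()
    out = list(map_fn(func, data))
    return out, time.perf_counter() - start


dictionary = {0: ((0, 0), (0, 1), (1, 1), (1, 0)), 1: ((3, 3), (3, 4), (4, 4), (4, 3))}
# Artificial workload: the two example trips repeated 1000 times.
trips = (((0.5, 0.5), (3.5, 3.5)), ((-1, -1), (-2, -2))) * 1000
classify = partial(get_zone_nbr, dictionary)

serial_results, serial_time = timed(map, classify, trips)
with ThreadPool(4) as pool:
    threaded_results, threaded_time = timed(pool.map, classify, trips)
    unordered_results, unordered_time = timed(pool.imap_unordered, classify, trips)

for name, elapsed in [("serial map", serial_time),
                      ("ThreadPool.map", threaded_time),
                      ("ThreadPool.imap_unordered", unordered_time)]:
    print("%-26s %.4f s" % (name, elapsed))
```

Because every variant is called through the same timed helper, trying another pool or map is a one-line change, which mirrors the drop-in workflow described above.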

Get pathos here: https://github.com/uqfoundation

Answer 1: 0, accepted, 2015-08-15 15:10:59
