Python: using the multiprocessing module as a possible solution to speed up my function

Question

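I wrote a function in Python 2.7 (on 64-bit Windows) that computes the mean intersection area between reference polygons (Ref) and one or more segmented polygons (Seg), both stored in ESRI shapefile format. The code is very slow because I have more than 2,000 reference polygons and, for each Ref polygon, the function runs over all of the Seg polygons (more than 7,000) every time. Sorry, the function is a prototype.

I would like to know whether multiprocessing can help me speed up the loop, or whether there are better-performing solutions. If multiprocessing is a possible solution, I would like to know the best way to optimize the following function: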

import os
import numpy
import osgeo.ogr
from shapely.geometry import Polygon

def AreaInter(reference,segmented,outFile):
     # open shapefile
     ref = osgeo.ogr.Open(reference)
     if ref is None:
          raise SystemExit('Unable to open %s' % reference)
     seg = osgeo.ogr.Open(segmented)
     if seg is None:
          raise SystemExit('Unable to open %s' % segmented)
     ref_layer = ref.GetLayer()
     seg_layer = seg.GetLayer()
     # create outfile
     if not os.path.split(outFile)[0]:
          file_path, file_name_ext = os.path.split(os.path.abspath(reference))
          outFile_filename = os.path.splitext(os.path.basename(outFile))[0]
          file_out = open(os.path.abspath("{0}\\{1}.txt".format(file_path, outFile_filename)), "w")
     else:
          file_path_name, file_ext = os.path.splitext(outFile)
          file_out = open(os.path.abspath("{0}.txt".format(file_path_name)), "w")
     # For each reference object i
     for index in xrange(ref_layer.GetFeatureCount()):
          ref_feature = ref_layer.GetFeature(index)
          # get FID (=Feature ID)
          FID = str(ref_feature.GetFID())
          ref_geometry = ref_feature.GetGeometryRef()
          pts = ref_geometry.GetGeometryRef(0)
          points = []
          for p in xrange(pts.GetPointCount()):
               points.append((pts.GetX(p), pts.GetY(p)))
          # convert in a shapely polygon
          ref_polygon = Polygon(points)
          # get the area
          ref_Area = ref_polygon.area
          # create two empty lists
          seg_Area, intersect_Area = [], []
          # For each segmented object j
          for segment in xrange(seg_layer.GetFeatureCount()):
               seg_feature = seg_layer.GetFeature(segment)
               seg_geometry = seg_feature.GetGeometryRef()
               pts = seg_geometry.GetGeometryRef(0)
               points = []
               for p in xrange(pts.GetPointCount()):
                    points.append((pts.GetX(p), pts.GetY(p)))
               seg_polygon = Polygon(points)
               seg_Area.append(seg_polygon.area)
               # intersection (overlap) of reference object with the segmented object
               intersect_polygon = ref_polygon.intersection(seg_polygon)
               # area of intersection (= 0 if there is no intersection)
               intersect_Area.append(intersect_polygon.area)
          # Average over all segmented objects (because 1 or more segmented polygons can intersect with the reference polygon)
          seg_Area_average = numpy.average(seg_Area)
          intersect_Area_average = numpy.average(intersect_Area)
          file_out.write(" ".join(["%s" %i for i in [FID, ref_Area,seg_Area_average,intersect_Area_average]])+ "\n")
     file_out.close()

Answer 1

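You can use the multiprocessing package, and especially the Pool class. First create a function that does everything you want to do inside the for loop and takes only the index as an argument: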

def process_reference_object(index):
      ref_feature = ref_layer.GetFeature(index)
      # all your code goes here
      return (" ".join(["%s" %i for i in [FID, ref_Area,seg_Area_average,intersect_Area_average]])+ "\n")

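Note that this function does not write to the file itself. That would be messy, because several processes would be writing to the same file at the same time. Instead, it returns the string that needs to be written. Also note that some objects used in this function, such as ref_layer or ref_geometry, need to reach it somehow. How you do that is up to you: you could make process_reference_object a method of a class that is initialized with them, or it could be as ugly as just defining them globally.

Then create a pool of worker processes and run all the indices through Pool.imap_unordered, which hands each index to one of the processes as needed: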
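One way to get such objects into the worker function is a pool initializer that fills per-process global state. The following is only a minimal sketch under stated assumptions: load_areas is a hypothetical stand-in for opening a shapefile layer (it returns fixed numbers so the sketch runs without GDAL), and the file names ref.shp and seg.shp are placeholders. Plain path strings are passed to the workers so that each process opens its own copy of the data.

```python
from multiprocessing import Pool

# Per-process storage. Each worker fills its own copy in init_worker().
_worker_state = {}


def load_areas(path):
    """Hypothetical stand-in for opening a shapefile layer.

    In the real function this would be osgeo.ogr.Open(path) followed by
    GetLayer(); fixed numbers are returned so the sketch runs without GDAL.
    """
    return [1.0, 2.0, 3.0]


def init_worker(reference_path, segmented_path):
    """Run once in every worker process: open the inputs and keep them."""
    _worker_state["ref_areas"] = load_areas(reference_path)
    _worker_state["seg_areas"] = load_areas(segmented_path)


def process_reference_object(index):
    """Build the output line for one reference object, without writing it."""
    ref_area = _worker_state["ref_areas"][index]
    seg_areas = _worker_state["seg_areas"]
    seg_area_average = sum(seg_areas) / len(seg_areas)
    return "%d %s %s\n" % (index, ref_area, seg_area_average)


def run(reference_path, segmented_path, count):
    """Process indices 0..count-1 in a pool and return the lines (any order)."""
    pool = Pool(initializer=init_worker,
                initargs=(reference_path, segmented_path))
    try:
        return list(pool.imap_unordered(process_reference_object, range(count)))
    finally:
        pool.close()
        pool.join()


# The guard is required on Windows, where workers re-import this module.
if __name__ == "__main__":
    for line in run("ref.shp", "seg.shp", 3):
        print(line.strip())
```

Because the initializer runs inside each worker, this does not rely on globals being inherited from the parent process, which they are not on Windows.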

from multiprocessing import Pool
p = Pool()  # run multiple processes
for l in p.imap_unordered(process_reference_object, range(ref_layer.GetFeatureCount())):
    file_out.write(l)

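This parallelizes the independent processing of the reference objects across multiple processes and writes the results to the file (in arbitrary order, mind you).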
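If the output lines need to come out in the original order, one option is to have the worker return its index along with the line and sort before writing. This is a sketch only: the worker body and the helper write_in_order are hypothetical, and the real worker would return the line built from the polygon areas. Pool.imap, which yields results in input order, is another option.

```python
import sys
from multiprocessing import Pool


def process_reference_object(index):
    """Hypothetical worker: return the index together with its output line."""
    return index, "%d\n" % index


def write_in_order(results, out):
    """Write (index, line) pairs to `out`, sorted by index."""
    for _, line in sorted(results):
        out.write(line)


# The guard is required on Windows, where workers re-import this module.
if __name__ == "__main__":
    pool = Pool()
    try:
        # Results arrive in arbitrary order; collect them all first.
        results = list(pool.imap_unordered(process_reference_object, range(5)))
    finally:
        pool.close()
        pool.join()
    write_in_order(results, sys.stdout)
```

Collecting everything before writing keeps all lines in memory, which should be acceptable for a few thousand short lines.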

Answer 2

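Threading can help to a degree, but first you should make sure you can't simplify the algorithm. If you are checking each of 2,000 reference polygons against 7,000 segmented polygons (perhaps I misunderstood), then that is where you should start. Anything that runs in O(n²) is going to be slow, so maybe you can prune away the polygons that definitely will not intersect, or find some other way to speed things up. Otherwise, running multiple processes or threads will only improve things linearly while the work grows geometrically with the data.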
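One common way to prune pairs that cannot intersect is to compare bounding boxes before computing the full intersection. The sketch below is an illustration, not the answerer's code: boxes are plain (minx, miny, maxx, maxy) tuples, which is the layout shapely returns from a polygon's bounds attribute, and the function names and sample coordinates are made up.

```python
def boxes_overlap(a, b):
    """True if two (minx, miny, maxx, maxy) boxes share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def candidate_pairs(ref_boxes, seg_boxes):
    """Yield (i, j) for every reference/segment pair whose boxes overlap."""
    for i, ref_box in enumerate(ref_boxes):
        for j, seg_box in enumerate(seg_boxes):
            if boxes_overlap(ref_box, seg_box):
                yield i, j


# Made-up boxes: only two of the six pairs need a real intersection test.
ref_boxes = [(0, 0, 2, 2), (10, 10, 12, 12)]
seg_boxes = [(1, 1, 3, 3), (5, 5, 6, 6), (11, 9, 13, 11)]
pairs = list(candidate_pairs(ref_boxes, seg_boxes))
print(pairs)
```

Pairs that are skipped have an intersection area of 0, so they can still be counted as zeros in the average without calling intersection. This filter still compares every pair of boxes, only cheaply; a spatial index would be needed to reduce the number of comparisons as well.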

Answer 1
6, accepted, 2013-01-07 19:10:42

Answer 2
2, 2013-01-07 19:12:05
