
Python performance problem: difference between two polygons

I am currently using Python 3.7 and I want to find the difference between a lot of polygons. By that I mean that if I have a polygon A and a polygon B, I want to do the mathematical "A not B" operation. There are two possible outcomes of this operation, as seen in the following illustration:

[Image: illustration of the two cases]

So two polygons that I subtract ("cut") from each other either give me a new polygon or an empty result. All other cases can be ignored. The shape of the polygon does not need to be exact for case 1, so it is acceptable if the polygon changes a bit.

For case 2 I need to know if the polygon is empty.

Furthermore, polygons A and B do not have any "holes" in them, so they can be described by their outer border alone.

I already built a prototype that uses the difference operation of shapely to do this. I "cut" as few times as possible (once for every pair of polygons).

My code is a bit complex, but it basically breaks down to this simple function:

from shapely.geometry import Polygon

def cut_hole(A: Polygon, B: Polygon) -> Polygon:
    """
    Cuts a "hole" into shapely polygon A
    :return: The polygon resulting from the operation A-B. Might be empty!
    """
    outer = A  # not in my code, just to point out what I mean
    inner = B
    return outer.difference(inner)
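
As a small sketch of how the two cases show up in practice, the function can be run on made-up squares. The coordinates and the names `b_partial` and `b_cover` are purely illustrative assumptions; `is_empty` is the shapely attribute that answers the case 2 question.

```python
from shapely.geometry import Polygon


def cut_hole(A: Polygon, B: Polygon) -> Polygon:
    """Return A minus B; the result might be empty."""
    return A.difference(B)


# Hypothetical squares, chosen only to illustrate the two cases.
a = Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])
b_partial = Polygon([(2, -1), (5, -1), (5, 5), (2, 5)])   # covers the right half of a
b_cover = Polygon([(-1, -1), (5, -1), (5, 5), (-1, 5)])   # covers a completely

case1 = cut_hole(a, b_partial)   # the left half of a remains
case2 = cut_hole(a, b_cover)     # nothing remains

print(case1.is_empty, case1.area)
print(case2.is_empty)
```

Depending on the shapely version, the empty result may be reported as an empty polygon or an empty geometry collection, so testing `is_empty` is likely more robust than checking the geometry type.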

Now my problem is that this is very slow! I work with roughly 15,000 operations per batch (30,000 polygons) and it takes about 10 to 15 minutes to calculate them all. I would really like to get that down to under 5 minutes.

Please keep in mind that this does not account for all the other operations: the 15 minutes are for the difference operation alone. I can match every polygon A to its polygon B in under 1 minute. I just need a quick way to get the resulting polygon from those.

I ran this test on a "good" computer (Intel Core i7, 16 GB RAM). Neither the CPU nor the RAM was at its limit.

So the big question is: how can I speed this up?

Is there a way to translate the polygons into a form that is easier to handle? Or is there a "better" way to get the difference of two polygons?

Is there an alternative library that might be better? Or can I get shapely to use other hardware? If so, what kind of hardware might that be?

Finally, my next step would be to try to parallelize the "cutting". Is there a built-in way to do this quickly and efficiently? I did not find one in shapely.

I would also be very grateful for tips on analyzing possible bottlenecks.

Addendum:

Some of the polygons seem to be rather complex. By that I mean that on average the more complex polygons contain 15,000 points, while the simple ones have fewer than 100 points. However, usually (in about 99% of cases) polygon A and polygon B are not both complex at the same time.

Here is an example of a complex polygon in WKT

Taking your points in order:

  • I highly doubt there is another format or library better suited for manipulating polygons in Python than shapely; it is the reference package. You can try to simplify your geometries, but some quick tests showed that this is a slow operation as well (p being the polygon you pasted above):

     p2 = p.buffer(-10)  # creating a 2nd polygon
     %timeit p.simplify(1)  # 58.4 ms, from 15000 to 8000 points
     %timeit p.difference(p2)  # 53.2 ms
     %timeit p.difference(p2.simplify(1))  # 127 ms
     %timeit p.simplify(1).difference(p2)  # 114 ms

  • Shapely uses GEOS under the hood. Maybe you can dig in that direction for lower-level solutions.

  • There is no parallelism in shapely. However, as you seem to have your A and B polygons already matched, you can parallelize the shapely operation through a thread pool or process pool (see the multiprocessing package). If they are not matched, you can check quickly with intersects (much faster than intersection or difference). If some of your polygons do not intersect, that will be a huge speedup.

  • Considering the size of your data (5GB is a lot of geometries...), I don't think you can save that much time other than through parallelization: one difference takes about 70 ms, which gives about 1050 s (roughly 17 min) for 15000 operations.
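
The parallelization and the intersects pre-check suggested above could be combined roughly as follows. This is only a sketch under assumptions: the helper names `cut_pair` and `cut_all`, the worker count, the chunk size and the sample boxes are all hypothetical, and it relies on shapely geometries being picklable so they can be sent to worker processes.

```python
from concurrent.futures import ProcessPoolExecutor

from shapely.geometry import box


def cut_pair(pair):
    """Return A minus B for one already matched (A, B) pair."""
    a, b = pair
    # Cheap predicate first: if the two do not intersect, A is unchanged
    # and the expensive difference() call can be skipped.
    if not a.intersects(b):
        return a
    return a.difference(b)


def cut_all(pairs, workers=4, chunksize=64):
    """Apply cut_pair to every pair using a pool of worker processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # chunksize batches the pairs to reduce inter-process overhead.
        return list(pool.map(cut_pair, pairs, chunksize=chunksize))


if __name__ == "__main__":
    # Hypothetical data: simple boxes standing in for the real polygons.
    pairs = [
        (box(0, 0, 4, 4), box(2, 0, 6, 4)),   # overlap: half of A remains
        (box(0, 0, 1, 1), box(5, 5, 6, 6)),   # disjoint: A is returned as is
        (box(1, 1, 2, 2), box(0, 0, 3, 3)),   # covered: result is empty
    ]
    for result in cut_all(pairs, workers=2, chunksize=1):
        print(result.is_empty, result.area)
```

Whether a thread pool would help instead depends on whether the installed shapely version releases the GIL during GEOS calls, so a process pool is the safer assumption here. Sending 15,000-point polygons between processes has its own cost, so the actual gain should be measured on real data.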
