
Find all coordinates within a circle in geographic data in Python

I've got millions of geographic points. For each one of these, I want to find all "neighboring points," i.e., all other points within some radius, say a few hundred meters.

There is a naive O(N^2) solution to this problem: simply calculate the distance between all pairs of points. However, because I'm dealing with a proper distance metric (geographic distance), there should be a quicker way to do this.
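For reference, the brute-force baseline could be sketched roughly as follows. This is only an illustration: the helper names `haversine_m` and `brute_force_neighbors` are made up here, and the Earth is assumed to be a sphere with a mean radius of 6,371 km.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)


def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def brute_force_neighbors(points, radius_m):
    """For each point, list the indices of all other points within radius_m. O(N^2)."""
    neighbors = [[] for _ in points]
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if haversine_m(points[i], points[j]) <= radius_m:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors
```

With millions of points this means on the order of 10^12 distance evaluations, which is why an index is needed.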

I would like to do this within Python. One solution that comes to mind is to use some database (MySQL with GIS extensions, PostGIS) and hope that such a database would take care of efficiently performing the operation described above using some index. I would prefer something simpler though, something that doesn't require me to build and learn about such technologies.

A couple of points:

  • I will perform the "find neighbors" operation millions of times
  • The data will remain static
  • Because the problem is in a sense simple, I'd like to see the Python code that solves it.

Put in terms of Python code, I want something along the lines of:

points = [(lat1, long1), (lat2, long2) ... ] # this list contains millions of lat/long tuples
points_index = magical_indexer(points)
neighbors = []
for point in points:
    point_neighbors = points_index.get_points_within(point, 200) # get all points within 200 meters of point
    neighbors.append(point_neighbors) 

Tipped off by Eamon, I've come up with a simple solution using the k-d tree implemented in SciPy (cKDTree):

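from math import inf

from scipy.spatial import cKDTree

# Assuming lats and longs are in decimal degrees, this corresponds to roughly
# 11.1 meters of latitude (a degree of longitude is shorter away from the equator).
max_distance = 0.0001
points = [(lat1, long1), (lat2, long2) ... ]
tree = cKDTree(points)

point_neighbors_list = []  # point_neighbors_list[i] holds the neighbors of points[i]

for point in points:
    # Ask for up to len(points) nearest neighbors, sorted by distance; slots
    # beyond max_distance are padded with distance == inf.
    distances, indices = tree.query(point, len(points), p=2, distance_upper_bound=max_distance)
    point_neighbors = []
    for index, distance in zip(indices, distances):
        if distance == inf:
            break  # no more neighbors within max_distance
        point_neighbors.append(points[index])  # note: includes the point itself (distance 0)
    point_neighbors_list.append(point_neighbors)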
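As a possible variation on the snippet above, cKDTree also has a `query_ball_point` method that returns the indices of all points within a radius directly, so there is no need to request `len(points)` neighbors and scan for `inf`. The coordinates below are made up for illustration, and the radius is still a plain Euclidean distance in degrees, as in the original.

```python
import numpy as np
from scipy.spatial import cKDTree

# Made-up sample data in decimal degrees; substitute the real (lat, long) list.
points = np.array([
    (52.00000, 4.00000),
    (52.00005, 4.00000),
    (52.00000, 4.00008),
    (52.10000, 4.10000),
])
max_distance = 0.0001  # same radius in degrees as in the snippet above

tree = cKDTree(points)

# One call for all points: hits[i] is a list of indices j such that the
# Euclidean distance (in degrees) between points[i] and points[j] is <= max_distance.
hits = tree.query_ball_point(points, r=max_distance)

# Drop each point's own index so that only true neighbors remain.
neighbor_indices = [sorted(j for j in idx if j != i) for i, idx in enumerate(hits)]
print(neighbor_indices)
```

Returning indices rather than copies of the coordinate tuples also keeps memory use down when many points share neighbors.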

SciPy

First things first: there are preexisting algorithms to do this kind of thing, such as the k-d tree. SciPy provides an implementation, cKDTree, that can find all points in a given range.
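A k-d tree works on Euclidean distance, so a radius in meters cannot be applied to raw latitude/longitude values. One common workaround, sketched here under the assumption of a spherical Earth, is to map each point to 3-D Cartesian coordinates and convert the great-circle radius to the equivalent straight-line (chord) length. The function names `to_cartesian` and `neighbors_within` are invented for this example.

```python
import numpy as np
from scipy.spatial import cKDTree

EARTH_RADIUS_M = 6_371_000.0  # mean radius; treats the Earth as a sphere (assumption)


def to_cartesian(latlon_deg):
    """Map an (N, 2) array of (lat, lon) in degrees to (N, 3) x, y, z in meters."""
    lat, lon = np.radians(np.asarray(latlon_deg, dtype=float)).T
    return EARTH_RADIUS_M * np.column_stack(
        (np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat))
    )


def neighbors_within(latlon_deg, radius_m):
    """Return, for each point, the sorted indices of the other points within radius_m."""
    xyz = to_cartesian(latlon_deg)
    tree = cKDTree(xyz)
    # The tree measures straight-line (chord) distance, so convert the
    # great-circle radius into the equivalent chord length.
    chord = 2 * EARTH_RADIUS_M * np.sin(radius_m / (2 * EARTH_RADIUS_M))
    hits = tree.query_ball_point(xyz, r=chord)
    return [sorted(j for j in idx if j != i) for i, idx in enumerate(hits)]
```

For a radius of a few hundred meters the chord and the arc are practically identical, but the conversion keeps the query correct for larger radii as well.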

Binary search

Depending on what you're doing, however, implementing something like that may be nontrivial. Furthermore, creating a tree is fairly complex (potentially quite a bit of overhead), and you may be able to get away with a simple hack I've used before:

  1. Compute the PCA of the dataset. You want to rotate the dataset such that the most significant direction is first, and the orthogonal (smaller) second direction is, well, second. You can skip this and just choose X or Y, but it's computationally cheap and usually easy to implement. If you do just choose X or Y, choose the direction with greater variance.
  2. Sort the points by the major direction (call this direction X).
  3. To find the nearest neighbor of a given point, find the index of the point nearest in X by binary search (if the point is already in your collection, you may already know this index and don't need the search). Iteratively look at the next and previous points, maintaining the best match so far and its distance from your search point. You can stop looking when the difference in X is greater than or equal to the distance to the best match so far (in practice, usually after very few points).
  4. To find all points within a given range, do the same as step 3, except don't stop until the difference in X exceeds the range.
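The four steps above could look roughly like this in pure Python. It is a sketch, not the answerer's actual code: it assumes the points are already planar (x, y) coordinates in a common unit such as projected meters, uses the closed-form 2-D PCA angle for step 1, and the names `build_index`, `points_within` and `nearest` are invented here.

```python
from bisect import bisect_left
from math import atan2, cos, hypot, sin


def build_index(points):
    """Steps 1-2: rotate (x, y) points onto their principal axis and sort by it."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    theta = 0.5 * atan2(2 * sxy, sxx - syy)  # angle of the major axis (2-D PCA)
    c, s = cos(theta), sin(theta)
    # Each entry is (rotated x, rotated y, original index), sorted by rotated x.
    rotated = sorted((x * c + y * s, -x * s + y * c, i) for i, (x, y) in enumerate(points))
    xs = [r[0] for r in rotated]
    return xs, rotated, (c, s)


def _rotate(index, query):
    c, s = index[2]
    return query[0] * c + query[1] * s, -query[0] * s + query[1] * c


def points_within(index, query, radius):
    """Step 4: original indices of all points within radius of query."""
    xs, rotated, _ = index
    qx, qy = _rotate(index, query)
    start = bisect_left(xs, qx)  # binary search on the sorted axis
    found = []
    # Walk right, then left, until the gap along X alone exceeds the radius.
    for step, k in ((1, start), (-1, start - 1)):
        while 0 <= k < len(xs) and abs(xs[k] - qx) <= radius:
            if hypot(xs[k] - qx, rotated[k][1] - qy) <= radius:
                found.append(rotated[k][2])
            k += step
    return sorted(found)


def nearest(index, query):
    """Step 3: (original index, distance) of the point closest to query."""
    xs, rotated, _ = index
    qx, qy = _rotate(index, query)
    start = bisect_left(xs, qx)
    best, best_d = None, float("inf")
    # Stop on each side once the gap along X reaches the best distance so far.
    for step, k in ((1, start), (-1, start - 1)):
        while 0 <= k < len(xs) and abs(xs[k] - qx) < best_d:
            d = hypot(xs[k] - qx, rotated[k][1] - qy)
            if d < best_d:
                best, best_d = rotated[k][2], d
            k += step
    return best, best_d
```

Note that both functions report the query point itself when it is part of the indexed set, so callers looking for "other" points need to filter it out. For brevity, `nearest` scans one side fully before the other rather than alternating between the next and previous points as step 3 describes; the stopping rule is the same.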

Effectively, you're doing O(N log(N)) preprocessing, and for each point roughly O(sqrt(N)) work, or more if the distribution of your points is poor. If the points are roughly uniformly distributed, the number of points nearer in X than the nearest neighbor will be on the order of the square root of N. It's less efficient if many points are within your range, but never much worse than brute force.
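The square-root estimate can be made plausible with a back-of-the-envelope calculation. As an illustrative assumption, take N points spread uniformly over a unit square, so that the point density is N:

```latex
\begin{align*}
d_{\mathrm{nn}} &\approx \frac{1}{2\sqrt{N}}
  && \text{(typical nearest-neighbor distance at density } N\text{)} \\
A_{\text{strip}} &= 2\,d_{\mathrm{nn}} \cdot 1 \approx \frac{1}{\sqrt{N}}
  && \text{(strip of half-width } d_{\mathrm{nn}} \text{ around the query's X)} \\
\mathbb{E}[\text{points in strip}] &= N \cdot A_{\text{strip}} \approx \sqrt{N}
\end{align*}
```

So a nearest-neighbor search has to look at about sqrt(N) candidates, while a range query with radius r looks at about 2rN candidates in the same setting.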

One advantage of this method is that it can all be executed with very few memory allocations, and can mostly be done with very good memory locality, which means that it performs quite well despite the obvious limitations.

Delaunay triangulation

Another idea: a Delaunay triangulation could work. In a Delaunay triangulation, any point's nearest neighbor is guaranteed to be an adjacent node. The intuition is that during a search, you can maintain a heap (priority queue) based on absolute distance from the query point. Pick the nearest point, check that it's in range, and if so add all its neighbors. I suspect that it's impossible to miss any points like this, but you'd need to look at it more carefully to be sure...
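A minimal sketch of that heap-based expansion, using `scipy.spatial.Delaunay`, might look like the following. It assumes planar coordinates without duplicate points and a query point that is itself part of the dataset; the function name `delaunay_neighbors_within` is made up. It does not settle the correctness question raised above, so comparing its output against a brute-force scan on your own data is a cheap sanity check.

```python
import heapq

import numpy as np
from scipy.spatial import Delaunay


def delaunay_neighbors_within(points, tri, start, radius):
    """Indices of all points within radius of points[start], found by
    expanding outward over Delaunay edges, nearest candidate first."""
    # CSR-style adjacency: neighbors of vertex i are indices[indptr[i]:indptr[i + 1]].
    indptr, indices = tri.vertex_neighbor_vertices
    seen = {start}
    heap = [(0.0, start)]  # (distance to the start point, point index)
    found = []
    while heap:
        dist, i = heapq.heappop(heap)
        if dist > radius:
            break  # everything still queued is at least this far away
        if i != start:
            found.append(i)
        # Queue the not-yet-seen Delaunay neighbors of the point just accepted.
        for j in indices[indptr[i]:indptr[i + 1]]:
            j = int(j)
            if j not in seen:
                seen.add(j)
                d = float(np.linalg.norm(points[j] - points[start]))
                heapq.heappush(heap, (d, j))
    return sorted(found)


# Usage with made-up data: build the triangulation once, then query many times.
rng = np.random.default_rng(0)
points = rng.uniform(0, 1000, size=(500, 2))
tri = Delaunay(points)
print(delaunay_neighbors_within(points, tri, start=0, radius=80))
```

Because the data is static, the triangulation is built once up front, and each query only touches the points in range plus one ring of their neighbors.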


 