Find all coordinates within a circle in geographic data in Python
I've got millions of geographic points. For each one of these, I want to find all "neighboring points," i.e., all other points within some radius, say a few hundred meters.
There is a naive O(N^2) solution to this problem: simply calculate the distance between all pairs of points. However, because I'm dealing with a proper distance metric (geographic distance), there should be a quicker way to do this.
I would like to do this within Python. One solution that comes to mind is to use some database (MySQL with GIS extensions, PostGIS) and hope that such a database would take care of efficiently performing the operation described above using some index. I would prefer something simpler though, that doesn't require me to set up and learn about such technologies.
A couple of points
Put in terms of Python code, I want something along the lines of:
points = [(lat1, long1), (lat2, long2) ... ]  # this list contains millions of lat/long tuples
points_index = magical_indexer(points)
neighbors = []
for point in points:
    point_neighbors = points_index.get_points_within(point, 200)  # get all points within 200 meters of point
    neighbors.append(point_neighbors)
Tipped off by Eamon, I've come up with a simple solution using the kd-trees implemented in SciPy.
from scipy.spatial import cKDTree
from numpy import inf  # scipy.inf is no longer available in recent SciPy releases

# Assuming lats and longs are in decimal degrees, this corresponds to
# roughly 11.1 meters of latitude (a degree of longitude is shorter away from the equator)
max_distance = 0.0001
points = [(lat1, long1), (lat2, long2) ... ]
tree = cKDTree(points)
point_neighbors_list = []  # Put the neighbors of each point here
for point in points:
    # Ask for up to len(points) neighbors; slots beyond the bound come back with distance inf
    distances, indices = tree.query(point, len(points), p=2, distance_upper_bound=max_distance)
    point_neighbors = []
    for index, distance in zip(indices, distances):
        if distance == inf:
            break  # results are sorted by distance, so nothing closer remains
        point_neighbors.append(points[index])
    point_neighbors_list.append(point_neighbors)
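The loop above asks the tree for up to len(points) neighbors per query, which gets expensive with millions of points. A possible variant, sketched here under the same "degrees as planar coordinates" assumption, uses cKDTree.query_ball_point, which returns every index within a radius directly. The function name neighbors_within_degrees is made up for this illustration, and unlike the loop above it leaves each point out of its own neighbor list.

```python
import numpy as np
from scipy.spatial import cKDTree


def neighbors_within_degrees(points, max_distance):
    """Return, for each point, the indices of the other points within max_distance.

    Distances are plain Euclidean distances in degree space (an approximation).
    """
    coords = np.asarray(points, dtype=float)
    tree = cKDTree(coords)
    # One batched call: an array of index lists, one list per query point.
    index_lists = tree.query_ball_point(coords, r=max_distance)
    # Drop each point from its own result list.
    return [[j for j in idxs if j != i] for i, idxs in enumerate(index_lists)]
```

Calling neighbors_within_degrees(points, 0.0001) would give index lists rather than coordinate tuples; look the coordinates up with points[j] if needed.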
First things first: there are preexisting algorithms to do this kind of thing, such as the kd tree. SciPy has an implementation, cKDTree, that can find all points in a given range.
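A kd tree measures straight-line distance in whatever coordinates it is given, so raw latitude/longitude degrees only approximate meters. One way around this, sketched below on the assumption of a spherical Earth with a mean radius of 6,371 km, is to convert each point to 3D Cartesian coordinates and turn the search radius into the matching chord length. The helper names latlon_to_xyz and neighbors_within_meters are illustrative, not part of SciPy.

```python
import numpy as np
from scipy.spatial import cKDTree

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius (spherical model)


def latlon_to_xyz(latlon_deg):
    """Convert (lat, lon) pairs in degrees to 3D points on a sphere, in meters."""
    lat, lon = np.radians(np.asarray(latlon_deg, dtype=float)).T
    return EARTH_RADIUS_M * np.column_stack(
        (np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat))
    )


def neighbors_within_meters(latlon_deg, radius_m):
    """For each point, list the indices (itself included) within radius_m along the surface."""
    xyz = latlon_to_xyz(latlon_deg)
    # A great-circle distance d corresponds to a straight chord of length 2R*sin(d / 2R).
    chord = 2.0 * EARTH_RADIUS_M * np.sin(radius_m / (2.0 * EARTH_RADIUS_M))
    tree = cKDTree(xyz)
    return tree.query_ball_point(xyz, r=chord)
```

For radii of a few hundred meters the chord and the arc are nearly identical, but the conversion keeps the search radius the same at every latitude.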
Depending on what you're doing however, implementing something like that may be nontrivial. Furthermore, creating a tree is fairly complex (potentially quite a bit of overhead), and you may be able to get away with a simple hack I've used before:
Effectively, you're doing O(N log(N)) preprocessing, and for each point roughly O(sqrt(N)) work, or more if the distribution of your points is poor. If the points are roughly uniformly distributed, the number of points nearer in X than the nearest neighbor will be on the order of the square root of N. It's less efficient if many points are within your range, but never much worse than brute force.
One advantage of this method is that it's all executable in very few memory allocations, and can mostly be done with very good memory locality, which means that it performs quite well despite the obvious limitations.
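The steps of the hack itself are not shown above. One reading that fits the description (an O(N log N) sort, then a scan bounded by the difference in X) is sketched below; treat it as an assumption about the idea, adapted to a fixed search radius, rather than the original recipe. The function name neighbors_by_sorted_scan is made up, and coordinates are treated as planar.

```python
import math


def neighbors_by_sorted_scan(points, radius):
    """For each point, list the indices of the other points within radius.

    Points are (x, y) tuples in planar coordinates.
    """
    # Preprocessing: order the point indices by x coordinate.
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    xs = [points[i][0] for i in order]
    result = [[] for _ in points]
    for rank, i in enumerate(order):
        xi, yi = points[i]
        k = rank + 1
        # Scan to the right only; stop once the gap in x alone exceeds the radius.
        while k < len(order) and xs[k] - xi <= radius:
            j = order[k]
            if math.hypot(points[j][0] - xi, points[j][1] - yi) <= radius:
                # Record the pair in both directions so no leftward scan is needed.
                result[i].append(j)
                result[j].append(i)
            k += 1
    return result
```

The scan cost depends on how many points share a narrow band of x values, which is why a poor distribution of points hurts this approach.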
Another idea: a Delaunay triangulation could work. For the Delaunay triangulation, it's given that any point's nearest neighbor is an adjacent node. The intuition is that during a search, you can maintain a heap (priority queue) based on absolute distance from the query point. Pick the nearest point, check that it's in range, and if so add all its neighbors. I suspect that it's impossible to miss any points like this, but you'd need to look at it more carefully to be sure...
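The heap-based search described above might look roughly like the following sketch. It relies on scipy.spatial.Delaunay and its vertex_neighbor_vertices attribute, treats coordinates as planar, and assumes distinct points in general position; the function name delaunay_neighbors_within is made up here. It has not been checked against degenerate inputs such as duplicate or collinear points.

```python
import heapq

import numpy as np
from scipy.spatial import Delaunay


def delaunay_neighbors_within(tri, start, radius):
    """Return indices of the points within radius of point `start`, walking Delaunay edges."""
    pts = tri.points
    ox, oy = pts[start]
    # CSR-style adjacency: neighbors of vertex i are indices[indptr[i]:indptr[i + 1]].
    indptr, indices = tri.vertex_neighbor_vertices
    seen = {start}
    heap = [(0.0, start)]  # priority queue keyed on distance from the query point
    found = []
    while heap:
        dist, i = heapq.heappop(heap)
        if dist > radius:
            break  # the heap is ordered by distance, so everything left is farther
        if i != start:
            found.append(i)
        # The popped point is in range, so queue its unvisited neighbors.
        for j in indices[indptr[i]:indptr[i + 1]]:
            j = int(j)
            if j not in seen:
                seen.add(j)
                d = float(np.hypot(pts[j, 0] - ox, pts[j, 1] - oy))
                heapq.heappush(heap, (d, j))
    return found
```

The triangulation is built once with tri = Delaunay(points) and reused for every query, so the per-point cost is driven by the number of neighbors found plus the ring of out-of-range points around them.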