
Efficiently finding the closest coordinate pair from a set in Python

The Problem

Imagine I am standing in an airport. Given a geographic coordinate pair, how can one efficiently determine which airport I am standing in?

Inputs

  • A coordinate pair (x,y) representing the location where I am standing.
  • A set of coordinate pairs [(a1,b1), (a2,b2), ...], where each coordinate pair represents one airport.

Desired Output

A coordinate pair (a,b) from the set of airport coordinate pairs, representing the airport closest to the point (x,y).

Inefficient Solution

Here is my inefficient attempt at solving this problem. It is clearly linear in the number of airports (O(n)).

shortest_distance = None
shortest_distance_coordinates = None

point = (50.776435, -0.146834)

# compute_distance() and airports are assumed to be defined elsewhere.
for airport in airports:
    distance = compute_distance(point, airport)
    # Test for None first: in Python 3, `distance < None` raises a TypeError.
    if shortest_distance is None or distance < shortest_distance:
        shortest_distance = distance
        shortest_distance_coordinates = airport
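The loop above relies on a compute_distance function that the question does not define. A minimal runnable sketch of the same linear scan, assuming (latitude, longitude) pairs in degrees and using the haversine great-circle formula as a hypothetical compute_distance; the airport coordinates are made-up sample data:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical-Earth assumption)


def compute_distance(p, q):
    """Hypothetical helper: haversine distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def closest_airport(point, airports):
    """Same O(n) scan as the loop above, written with min() and a key function."""
    return min(airports, key=lambda airport: compute_distance(point, airport))


# Made-up sample airports as (latitude, longitude) in degrees.
airports = [(51.4700, -0.4543), (51.1537, -0.1821), (50.9503, -1.3568)]
point = (50.776435, -0.146834)
print(closest_airport(point, airports))
```

This is still linear in the number of airports; it only makes the baseline runnable so the faster approaches below can be compared against it.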

The Question

How can this solution be improved? This might involve some way of pre-filtering the list of airports based on the coordinates of the location where we are currently standing, or sorting them in a certain order beforehand.

Using a k-dimensional tree (k-d tree):

>>> from scipy import spatial
>>> airports = [(10,10),(20,20),(30,30),(40,40)]
>>> tree = spatial.KDTree(airports)
>>> tree.query([(21,21)])
(array([ 1.41421356]), array([1]))

Here 1.41421356 is the (Euclidean) distance between the queried point and its nearest neighbour, and 1 is the index of that neighbour in airports.

See: http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.KDTree.query.html#scipy.spatial.KDTree.query
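KDTree measures Euclidean distance, so on raw (latitude, longitude) degrees, as in the example, the result only approximates geographic distance. One common workaround (not part of the answer above) is to convert each pair to a point on the unit sphere first: the straight-line (chord) distance between two such points grows monotonically with their great-circle distance, so the Euclidean nearest neighbour is also the geographic one. A sketch, assuming (latitude, longitude) in degrees and a mean Earth radius of 6371 km; to_unit_xyz is a hypothetical helper name:

```python
import numpy as np
from scipy.spatial import KDTree

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters (spherical assumption)


def to_unit_xyz(latlon_deg):
    """Hypothetical helper: convert (lat, lon) pairs in degrees to 3-D points on the unit sphere."""
    lat, lon = np.radians(np.asarray(latlon_deg, dtype=float)).T
    return np.column_stack((np.cos(lat) * np.cos(lon),
                            np.cos(lat) * np.sin(lon),
                            np.sin(lat)))


airports = [(10, 10), (20, 20), (30, 30), (40, 40)]  # (lat, lon) in degrees
tree = KDTree(to_unit_xyz(airports))

# Query a single 3-D point: returns a scalar chord length and a scalar index.
chord, idx = tree.query(to_unit_xyz([(21, 21)])[0])
# A chord of length c on the unit sphere subtends the angle 2 * arcsin(c / 2).
angle = 2 * np.arcsin(chord / 2)
print(airports[idx], angle * EARTH_RADIUS_M)
```

The tree is built once, so each query stays fast, and the distance recovered from the chord is the same great-circle distance a haversine calculation would give.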

From this SO question:

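import numpy as np

def closest_node(node, nodes):
    nodes = np.asarray(nodes)
    deltas = nodes - node
    # Row-wise dot product of each delta with itself = squared Euclidean distance.
    dist_2 = np.einsum('ij,ij->i', deltas, deltas)
    # Index of the smallest squared distance (no square root needed to compare).
    return np.argmin(dist_2)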

where node is a tuple with two values (x, y) and nodes is a sequence of such tuples ([(x_1, y_1), (x_2, y_2), ...]). The function returns the index of the node with the smallest squared Euclidean distance to node.
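Comparing squared distances skips the square root without changing which node is smallest, but like the k-d tree example it treats degrees as flat x/y coordinates. A minimal variation, assuming (latitude, longitude) pairs in degrees, keeps the same vectorized brute-force approach but computes haversine distances instead; closest_node_haversine is a hypothetical name, and the search is still O(n):

```python
import numpy as np

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters (spherical assumption)


def closest_node_haversine(node, nodes):
    """Return (index, distance in m) of the (lat, lon) row in nodes nearest to node.

    Inputs are in degrees. Still a brute-force O(n) scan, but vectorized
    with NumPy instead of a Python loop.
    """
    lat1, lon1 = np.radians(np.asarray(node, dtype=float))
    nodes_rad = np.radians(np.asarray(nodes, dtype=float))
    lat2, lon2 = nodes_rad[:, 0], nodes_rad[:, 1]
    # Haversine formula evaluated for all nodes at once.
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    dist = 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))
    i = int(np.argmin(dist))
    return i, float(dist[i])


airports = [(10, 10), (20, 20), (30, 30), (40, 40)]
print(closest_node_haversine((21, 21), airports))
```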

If your coordinates are unsorted, the search can only be improved slightly. Assuming they are stored as (latitude, longitude), you can filter on latitude first, because on Earth (treated as a sphere)

1 degree of latitude is about 111.2 km or 69 miles,

so an airport whose latitude differs from yours by more than tolerance / 111.2 degrees cannot be within tolerance km. But that would not give a huge speedup, because every airport's latitude still has to be checked.
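The latitude pre-filter can be sketched as follows, assuming (latitude, longitude) pairs in degrees, a spherical Earth of radius 6371 km, and a caller-chosen tolerance_km (a hypothetical parameter: the largest distance you expect to the nearest airport). Every airport is still visited, but most are rejected by one subtraction instead of a haversine evaluation:

```python
import math

EARTH_RADIUS_KM = 6371.0
# One degree of latitude spans pi * R / 180 km, i.e. about 111.2 km.
KM_PER_DEG_LAT = math.pi * EARTH_RADIUS_KM / 180


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def closest_with_lat_filter(point, airports, tolerance_km):
    """Return the nearest airport if one lies within tolerance_km, else None."""
    tol_deg = tolerance_km / KM_PER_DEG_LAT
    # Cheap test first: an airport whose latitude differs by more than
    # tol_deg is more than tolerance_km away, so skip its haversine call.
    candidates = [a for a in airports if abs(a[0] - point[0]) <= tol_deg]
    if not candidates:
        return None
    best = min(candidates, key=lambda a: haversine_km(point, a))
    # A skipped airport could only be closer if nothing is within tolerance_km.
    return best if haversine_km(point, best) <= tolerance_km else None


airports = [(10, 10), (20, 20), (30, 30), (40, 40)]
print(closest_with_lat_filter((21, 21), airports, tolerance_km=200))
```

The great-circle distance between two points is never less than their latitude difference times the Earth's radius, so no skipped airport can be within tolerance_km; if the best candidate is farther than that, the function returns None and you can retry with a larger tolerance.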

If you sort the airports by latitude first, you can use a binary search to find the first airport that could match (airport_lat >= point_lat - tolerance) and then compare only up to the last one that could match (airport_lat <= point_lat + tolerance). Take care with wrap-around if you apply the same idea to longitude, where 0 degrees equals 360 (or -180 equals 180); latitude itself does not wrap. The sources of Python's bisect module are a good start for implementing the binary search yourself, and bisect can also be used directly on a separate, sorted list of the latitudes.

Technically the search is still O(n) in the worst case, but you perform far fewer actual distance calculations (how many depends on the tolerance) and only O(log n) latitude comparisons for the binary search, so the speedup can be large.
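A sketch of the sorted variant, assuming (latitude, longitude) pairs in degrees and the same spherical-Earth haversine distance; LatitudeIndex and tolerance_km are hypothetical names. It sorts once and then uses Python's bisect module on a parallel list of latitudes to find the window of candidates:

```python
import bisect
import math

EARTH_RADIUS_KM = 6371.0
KM_PER_DEG_LAT = math.pi * EARTH_RADIUS_KM / 180  # about 111.2 km


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


class LatitudeIndex:
    """Hypothetical helper: airports sorted once by latitude, queried by bisecting a window."""

    def __init__(self, airports):
        self.airports = sorted(airports)           # sorts by (lat, lon): O(n log n), done once
        self.lats = [a[0] for a in self.airports]  # parallel list of latitudes for bisect

    def closest(self, point, tolerance_km):
        tol_deg = tolerance_km / KM_PER_DEG_LAT
        lo = bisect.bisect_left(self.lats, point[0] - tol_deg)   # first lat >= point_lat - tol
        hi = bisect.bisect_right(self.lats, point[0] + tol_deg)  # one past last lat <= point_lat + tol
        best, best_d = None, math.inf
        for airport in self.airports[lo:hi]:
            d = haversine_km(point, airport)
            if d < best_d:
                best, best_d = airport, d
        # Airports outside the window are farther than tolerance_km, so the
        # answer is only guaranteed when best_d <= tolerance_km.
        return best if best_d <= tolerance_km else None


index = LatitudeIndex([(40, 40), (10, 10), (30, 30), (20, 20)])
print(index.closest((21, 21), tolerance_km=200))
```

Each query costs two binary searches plus one haversine call per airport in the latitude band. Latitude does not wrap around, so no 0/360 handling is needed in this sketch; if it returns None, retry with a larger tolerance.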

The answer of @Juddling is great, but KDTree does not support the haversine distance, which is better suited to latitude/longitude coordinates. For the haversine distance you can use scikit-learn's BallTree. Note that you need to convert your coordinates to radians first, and that the haversine metric expects each point as [latitude, longitude].

from math import radians
from sklearn.neighbors import BallTree
import numpy as np

airports = [(10,10),(20,20),(30,30),(40,40)]
# BallTree's haversine metric expects [latitude, longitude] in radians.
airports_rad = np.radians(airports)
tree = BallTree(airports_rad, metric='haversine')
result = tree.query([(radians(21),radians(21))])
print(result)

gives

(array([[0.02391369]]), array([[1]], dtype=int64))

To convert the distance to meters, multiply it by the Earth's radius (in meters).

earth_radius = 6371000  # mean Earth radius in meters
print(result[0][0] * earth_radius)
[152354.11114795]
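BallTree's haversine metric returns the central angle between the two points in radians, which is why multiplying by the Earth's radius gives meters. For reference, the haversine great-circle distance between latitudes φ1, φ2 and longitudes λ1, λ2 on a sphere of radius r (symbols introduced here) is:

```latex
d = 2r \arcsin\!\left(\sqrt{\sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
    + \cos\varphi_1 \cos\varphi_2 \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right)}\right)
```

For the example's (21°, 21°) and (20°, 20°): sin²(0.5°) ≈ 7.615 × 10⁻⁵ and cos 21° · cos 20° ≈ 0.8773, so the term under the square root is about 7.615 × 10⁻⁵ × 1.8773 ≈ 1.4296 × 10⁻⁴. That gives a central angle of 2 · arcsin(0.011957) ≈ 0.023914 rad and, with r = 6371000 m, d ≈ 152354 m, matching the BallTree output above.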

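Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.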

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM