
Algorithm to find nearby friends?

I have a Python program running on the server side that manages users' location information. Each friend has a (longitude, latitude) pair. Given a (longitude, latitude) point, how can I efficiently find the nearby friends (say, within 5 km)?

I have 10K users online...

Thanks. Bin

New Answer:

I would store lat and long in separate columns and place indexes on them. Then, when you want to find the nearby friends of a particular user, just do something like

select field1, field2, ..., fieldn from users 
where 
    user_lat > this_lat - phi and user_lat < this_lat + phi
    and
    user_lon > this_lon - omega and user_lon < this_lon + omega

where phi and omega are the degrees of latitude and longitude that correspond to your desired distance. These vary depending on where on the globe you are (the longitude span in particular grows as you move away from the equator), but there are established equations for figuring them out. There's also the possibility that your database can do those calculations for you.
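As a rough sketch of how phi and omega might be computed: the snippet below assumes a spherical Earth with a mean radius of 6371 km and uses a small-distance approximation. The name `bounding_deltas` is made up for illustration and is not part of any library.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth, mean radius

def bounding_deltas(lat_deg, distance_km):
    """Return (phi, omega): degrees of latitude and longitude spanning distance_km."""
    # A given distance covers the same number of latitude degrees everywhere.
    phi = math.degrees(distance_km / EARTH_RADIUS_KM)
    # Meridians converge toward the poles, so the same distance spans
    # more degrees of longitude at higher latitudes.
    omega = phi / math.cos(math.radians(lat_deg))
    return phi, omega

# Half-widths of a 5 km bounding box around a user at latitude 60.
phi, omega = bounding_deltas(lat_deg=60.0, distance_km=5.0)
```

This breaks down close to the poles, where the cosine approaches zero, and it ignores the wrap-around at longitude +/-180. The box also only pre-filters: rows in its corners are farther away than the desired distance, so a final distance check is still needed.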


Old answer:

I would look at quadtrees and kd-trees.

Kd-trees would be the canonical solution here, I believe.
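For illustration, here is a minimal 2-d tree with a radius query in plain Python. It assumes the points already live on a flat plane (for example, projected coordinates in km), and `build_kdtree` and `within_radius` are hypothetical names, not a library API. Treat it as a sketch of the idea rather than a tuned implementation.

```python
import math

def build_kdtree(points, depth=0):
    """Build a 2-d tree as nested (point, left, right) tuples; None is an empty subtree."""
    if not points:
        return None
    axis = depth % 2                      # alternate between x and y
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                # the median keeps the tree balanced
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def within_radius(node, center, radius, depth=0, found=None):
    """Collect every point whose Euclidean distance to center is at most radius."""
    if found is None:
        found = []
    if node is None:
        return found
    point, left, right = node
    axis = depth % 2
    if math.hypot(point[0] - center[0], point[1] - center[1]) <= radius:
        found.append(point)
    diff = center[axis] - point[axis]
    # Descend into a side only if the query circle can reach it.
    if diff <= radius:
        within_radius(left, center, radius, depth + 1, found)
    if diff >= -radius:
        within_radius(right, center, radius, depth + 1, found)
    return found

friends = [(0, 0), (1, 1), (3, 4), (-2, 0.5), (5, 5), (0.5, -0.5)]
tree = build_kdtree(friends)
near = within_radius(tree, center=(0, 0), radius=2.0)
```

The tree is built once in O(n log n) and would need rebuilding or rebalancing as users move, which is worth weighing against simpler schemes when locations change often.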

A simple way would be to sort the points by longitude; then, when looking up friends, find the minimum and maximum longitudes of possible matches. Sorting the list is O(n log n), and looking up friends is linear, but only over the friends within the longitude range. Here's an example for the case where you have all the points on a flat 2D surface:

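import bisect
import math

# friends is a list of (x, y) tuples sorted by x; (px, py) is my location
def get_near(friends, px, py, maxdist):
    # Binary-search the slice whose x lies within maxdist of px.
    i1 = bisect.bisect_left(friends, (px - maxdist, py))
    i2 = bisect.bisect_right(friends, (px + maxdist, py))
    # Keep only the points in that slice that really are within maxdist.
    return [(x, y) for (x, y) in friends[i1:i2] if math.hypot(px - x, py - y) < maxdist]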

For the longitude/latitude case, you'd have to test distance with another function instead of the Euclidean distance (math.hypot).
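One common choice for that other function is the haversine formula, which gives the great-circle distance on a sphere. A minimal sketch, assuming a spherical Earth with a mean radius of 6371 km (the name `haversine_km` is chosen here for illustration):

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth, mean radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# One degree of longitude along the equator, roughly 111 km.
d = haversine_km(0.0, 0.0, 0.0, 1.0)
```

Because the Earth is not a perfect sphere, this is slightly off from the true distance, which should not matter for a 5 km "nearby" cutoff.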

Make a dict {graticule: [users]} (a "graticule" is a block of 1 degree latitude x 1 degree longitude, so you can basically just round the values). To find nearby users, first get the users from the same and adjacent graticules (since the target could be near an edge), then filter them with a basic bounding-box test (i.e. what are the minimum and maximum longitude/latitude possible for someone within the desired radius), then do a detailed test (if you need accuracy, you are in for some more complex math than just Pythagoras).
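The graticule lookup could be sketched roughly as follows. All names here (`graticule`, `build_index`, `candidates`, `nearby`) are made up for illustration, `users` is assumed to map a user id to a (lat, lon) pair in degrees, and the detailed test is delegated to a caller-supplied `distance_km(lat1, lon1, lat2, lon2)` function. The intermediate bounding-box step is left out to keep the sketch short.

```python
import math
from collections import defaultdict

def graticule(lat, lon):
    """Round a position down to its 1 degree x 1 degree cell, wrapping longitude."""
    return (math.floor(lat), (math.floor(lon) + 180) % 360 - 180)

def build_index(users):
    """Return {graticule: [user ids]} for a {user id: (lat, lon)} dict."""
    index = defaultdict(list)
    for user_id, (lat, lon) in users.items():
        index[graticule(lat, lon)].append(user_id)
    return index

def candidates(index, lat, lon):
    """Yield user ids from the target's graticule and its 8 neighbours."""
    cell_lat, cell_lon = graticule(lat, lon)
    for dlat in (-1, 0, 1):
        for dlon in (-1, 0, 1):
            # Neighbouring longitude cells wrap across +/-180.
            key = (cell_lat + dlat, (cell_lon + dlon + 180) % 360 - 180)
            yield from index.get(key, [])

def nearby(index, users, lat, lon, radius_km, distance_km):
    """Run the detailed distance test on the candidates only."""
    return [uid for uid in candidates(index, lat, lon)
            if distance_km(lat, lon, *users[uid]) <= radius_km]
```

One caveat: very close to the poles a 1-degree longitude cell becomes narrower than 5 km, so checking only the adjacent cells would miss some users there.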

http://www.movable-type.co.uk/scripts/latlong.html

In terms of efficiency, the only thing that really comes to mind is pre-computing the distances as entries are made in the database. That is, have another table that stores each pair of locations along with the distance between them. For each location, at the time it's added, you'd incur the cost of calculating its distance to every other point in the system, but lookups on this table could then quickly resolve the locations within a certain distance.
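To make the trade-off concrete, here is a small in-memory sketch of such a pair table. `DistanceTable` is a hypothetical class, not a database feature, and the distance function is supplied by the caller.

```python
import math

class DistanceTable:
    """Keeps the distance between every pair of known locations."""

    def __init__(self, distance):
        self.distance = distance   # callable taking two locations
        self.locations = {}        # name -> location
        self.pairs = {}            # name -> {other name: distance}

    def add(self, name, location):
        # O(n) work per insert: one distance per existing location.
        row = {}
        for other, other_location in self.locations.items():
            d = self.distance(location, other_location)
            row[other] = d
            self.pairs[other][name] = d
        self.locations[name] = location
        self.pairs[name] = row

    def within(self, name, max_distance):
        """Return the names stored within max_distance of the given name."""
        return [other for other, d in self.pairs[name].items() if d <= max_distance]

# Flat-plane distance used purely as a stand-in for the example.
table = DistanceTable(lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1]))
table.add("a", (0, 0))
table.add("b", (3, 4))
near_a = table.within("a", 5)
```

The number of stored pairs grows quadratically (10K users means about 50 million pairs), and every time a user moves their whole row has to be recomputed, so this probably suits mostly static locations better than live user positions.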

Aaronasterling's answer appears to be what I was trying to think through by myself without knowing it already existed :) so it's probably a better solution. I'm sure you'll still incur some overhead at search time with that algorithm, albeit probably a small one, since traversing a reasonably balanced tree is usually a pretty fast process. It's going to take me some time to understand exactly how the tree is composed, though; that one's a new concept to me.
