
Optimize scipy nearest neighbor search

I am trying to find, for every point, all neighbors within a 1 km radius. Here is my script to construct the tree and search for nearby points:

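import pysal  # needed for pysal.cg.RADIUS_EARTH_KM below
from pysal.cg.kdtree import KDTree

def construct_tree(s):
    # Build (longitude, latitude) tuples from the DataFrame columns
    data_geopoints = [tuple(x) for x in s[['longitude','latitude']].to_records(index=False)]
    # 'Arc' metric: distances are great-circle distances in km
    tree = KDTree(data_geopoints, distance_metric='Arc', radius=pysal.cg.RADIUS_EARTH_KM)
    return tree

def get_neighbors(s,tree):
    # Indices of all points within 1 km of s
    indices = tree.query_ball_point(s, 1)
    return indices

# Constructing the tree for search
tree = construct_tree(data)

# Finding the nearest neighbours within 1 km
# (each lat_long entry must use the same coordinate order as the tree points)
data['neighborhood'] = data['lat_long'].apply(lambda row: get_neighbors(row,tree))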

According to the pysal documentation page:

kd-tree built on top of kd-tree functionality in scipy. If using scipy 0.12 or greater uses the scipy.spatial.cKDTree, otherwise uses scipy.spatial.KDTree.
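The quoted note implies that the great-circle search ultimately runs on a Euclidean kd-tree. One common way to make that work (an assumption here, not something the quoted page spells out) is to map each longitude/latitude pair to a point on the unit sphere and convert the arc radius into the equivalent straight-line chord length. The sketch below shows only that arithmetic; the function names and the `EARTH_RADIUS_KM` constant are hypothetical and not part of pysal or scipy.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, not taken from the question


def lonlat_to_unit_xyz(lon_deg, lat_deg):
    """Map a (longitude, latitude) pair in degrees to a point on the unit sphere."""
    lon = math.radians(lon_deg)
    lat = math.radians(lat_deg)
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )


def arc_km_to_chord(arc_km, radius_km=EARTH_RADIUS_KM):
    """Convert a great-circle distance in km to a chord length on the unit sphere."""
    return 2.0 * math.sin(arc_km / (2.0 * radius_km))
```

With points converted this way, a ball query of radius `arc_km_to_chord(1.0)` on a plain Euclidean tree would select the same points as a 1 km great-circle query.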

In my case it should be using cKDTree. This works fine for a sample dataset, but tree.query_ball_point returns a list of indices for each query point, and each list has hundreds of elements. For my data (2 million records), the results keep growing and the script eventually stops with a memory error. Any idea how to solve this?

In case anyone is looking for an answer to this: I solved it by finding the nearest neighbours for one group of points at a time (tree.query_ball_point can handle batches), writing the results to a database, and then processing the next group, rather than keeping everything in memory. Thanks.
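The batching approach described above could look roughly like the following. This is a minimal sketch, not the poster's actual code: the function name, the SQLite table layout, the JSON encoding of the index lists, and the default batch size are all assumptions made for illustration. It only relies on the tree exposing `query_ball_point` for a batch of points, as the answer states.

```python
import json
import sqlite3


def store_neighbors_in_batches(tree, points, radius, conn, batch_size=10_000):
    """Query `tree` one batch at a time and persist each batch before the next.

    `tree` is any object whose query_ball_point(batch, radius) returns one
    list of indices per query point. The table layout is hypothetical.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS neighbors "
        "(point_id INTEGER PRIMARY KEY, neighbor_ids TEXT)"
    )
    for start in range(0, len(points), batch_size):
        batch = points[start:start + batch_size]
        # One call answers the whole batch.
        results = tree.query_ball_point(batch, radius)
        rows = [
            (start + offset, json.dumps([int(i) for i in found]))
            for offset, found in enumerate(results)
        ]
        conn.executemany("INSERT OR REPLACE INTO neighbors VALUES (?, ?)", rows)
        conn.commit()
        # `results` and `rows` are rebound on the next pass, so only one
        # batch of neighbor lists is held in memory at a time.
```

Here `conn` would be something like `sqlite3.connect("neighbors.db")`, and `radius` must be in whatever units the tree expects (kilometres for the Arc-metric tree in the question). Peak memory is then bounded by the batch size rather than by the 2 million records.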

