performance - finding all points within a certain distance by lat/long
I have a CSV file with points tagged by lat/long (~10K points). I'd like to search for all points within a given distance of a user-specified lat/long coordinate, for example the centroid of Manhattan.
I'm pretty new to programming and databases, so this may be a basic question. If so, I apologize.

Is it performant to do this search in pure Python without using a database? As in, could I simply read the CSV into memory and do the search with a Python script? If it is performant, would it scale well as the number of points increases?
Or is this simply infeasible in Python, and do I need to investigate a database that supports geospatial queries?

Additionally, how do I go about understanding the performance of these types of calculations, so that I can develop a good intuition for this?
This is definitely possible in Python without any database. I would recommend using numpy, and would do the following:
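Something along these lines; a minimal sketch, assuming the CSV has a header row with `lat` and `lon` columns in decimal degrees (those column names, and the helper name `points_within`, are illustrative):

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def points_within(csv_path, center_lat, center_lon, radius_km):
    # Load the whole file into a structured numpy array in one shot.
    # Assumes a header row with columns named "lat" and "lon" (hypothetical).
    data = np.genfromtxt(csv_path, delimiter=",", names=True)
    lat = np.radians(data["lat"])
    lon = np.radians(data["lon"])
    clat, clon = np.radians(center_lat), np.radians(center_lon)

    # Vectorized haversine distance from the center to every point at once.
    dlat = lat - clat
    dlon = lon - clon
    a = np.sin(dlat / 2) ** 2 + np.cos(clat) * np.cos(lat) * np.sin(dlon / 2) ** 2
    dist_km = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

    # A boolean mask selects the matching rows without any Python-level loop.
    return data[dist_km <= radius_km]

# e.g. points_within("points.csv", 40.7831, -73.9712, 2.0)  # ~2 km around Manhattan's centroid
```

Filtering with a boolean mask keeps the per-query work inside numpy, which is what makes this fast even for much more than 10K points.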
Because all calculations are vectorized, they happen at close to C speed. On a reasonable machine, the I/O will take around 2-3 seconds, and the calculation itself will take on the order of 100-200 milliseconds.
In terms of math, you can try http://en.wikipedia.org/wiki/Haversine_formula
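For reference, the haversine distance used in the sketch above, between points $(\varphi_1, \lambda_1)$ and $(\varphi_2, \lambda_2)$ in radians on a sphere of radius $r$, is:

$$d = 2r \arcsin\left( \sqrt{ \sin^2\frac{\varphi_2 - \varphi_1}{2} + \cos\varphi_1 \, \cos\varphi_2 \, \sin^2\frac{\lambda_2 - \lambda_1}{2} } \right)$$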