performance - finding all points within a certain distance by lat/long
I have a CSV file with points tagged by lat/long (~10K points). I'd like to search for all points within a given distance of a user-specified lat/long coordinate, for example the centroid of Manhattan.
I'm pretty new to programming and databases, so this may be a basic question. If so, I apologize.

Is it performant to do this search in pure Python without using a database? As in, could I simply read the CSV into memory and do the search with a Python script? If it is performant, would it scale well as the number of points increases?
Or is this simply infeasible in Python, and do I need to investigate a database that supports geospatial queries?

Additionally, how do I go about understanding the performance of these types of calculations, so that I can develop a good intuition for this?
This is definitely possible in Python without any database. I would recommend using numpy, and would do the following:
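Something along these lines; a minimal sketch, assuming the CSV has a header row with `lat` and `lon` columns in decimal degrees (those column names, and the helper name `points_within`, are illustrative):

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def points_within(csv_path, center_lat, center_lon, radius_km):
    # Load the whole file into a structured numpy array in one shot.
    # Assumes a header row with columns named "lat" and "lon" (hypothetical).
    data = np.genfromtxt(csv_path, delimiter=",", names=True)
    lat = np.radians(data["lat"])
    lon = np.radians(data["lon"])
    clat, clon = np.radians(center_lat), np.radians(center_lon)

    # Vectorized haversine distance from the center to every point at once.
    dlat = lat - clat
    dlon = lon - clon
    a = np.sin(dlat / 2) ** 2 + np.cos(clat) * np.cos(lat) * np.sin(dlon / 2) ** 2
    dist_km = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

    # A boolean mask selects the matching rows without any Python-level loop.
    return data[dist_km <= radius_km]

# e.g. points_within("points.csv", 40.7831, -73.9712, 2.0)  # ~2 km around Manhattan's centroid
```

Filtering with a boolean mask keeps the per-query work inside numpy, which is what makes this fast even for much more than 10K points.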
Because all calculations are vectorized, they happen at close to C speed. On a reasonable machine, the I/O will take around 2-3 seconds, and the calculation itself will take on the order of 100-200 milliseconds.
In terms of math, you can try http://en.wikipedia.org/wiki/Haversine_formula
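For reference, the haversine distance used in the sketch above, between points $(\varphi_1, \lambda_1)$ and $(\varphi_2, \lambda_2)$ in radians on a sphere of radius $r$, is:

$$d = 2r \arcsin\left( \sqrt{ \sin^2\frac{\varphi_2 - \varphi_1}{2} + \cos\varphi_1 \, \cos\varphi_2 \, \sin^2\frac{\lambda_2 - \lambda_1}{2} } \right)$$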