
Find the distance between 2 series of points in Pandas, Fastest Iteration

I have two sets of data. The first, called locations, contains the coordinates of fixed locations.

[Image: table of fixed locations]

The second, called movements, is a table of vehicle movements.

What would be the fastest way to iterate through both tables to find out whether any of the movements is within a certain distance of a location, e.g. using the Euclidean distance between a point in movements and a point in locations?

Currently I am using a nested loop, which is incredibly slow. Both pandas DataFrames have been converted to lists of dicts using

locations_dict=locations.to_dict('records')
movements_dict=movements.to_dict('records')

and then iterated via:

for movement in movements_dict:
    visit='no visit'
    for location in locations_dict:
        distance = np.sqrt((location['Latitude']-movement['Lat'])**2+(location['Longitude']-movement['Lng'])**2)
        if distance < 0.05:
            visit=location['Location']
            break
        else:
            continue
    movement['distance']=distance
    movement['visit']=visit

Is there any way to make this faster? The main issue is that this operation is a Cartesian product, so every row added to either table increases the amount of work significantly.

You can export the pandas data directly to numpy, for example like this:

loc_lat = locations['Latitude'].to_numpy()
loc_lon = locations['Longitude'].to_numpy()
mov_lat = movements['Lat'].to_numpy()
mov_lon = movements['Lng'].to_numpy()  # the question's loop reads this column as 'Lng'

From here on there is no need for loops to obtain the results, because numpy operates on entire arrays at once. This should give a large speedup over looping in Python over dictionary values.
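As a minimal sketch of what "entire arrays at once" could look like for this problem, the snippet below uses numpy broadcasting to build a movements-by-locations distance matrix and then picks the first location within the threshold, which mirrors the `break` in the question's loop. The sample values and the `loc_name` array are hypothetical stand-ins; in practice the arrays would come from `to_numpy()` on the real columns.

```python
import numpy as np

# Hypothetical sample data standing in for the real DataFrame columns.
loc_name = np.array(["Depot", "Warehouse"])
loc_lat = np.array([10.00, 20.00])
loc_lon = np.array([50.00, 60.00])
mov_lat = np.array([10.01, 15.00, 20.02])
mov_lon = np.array([50.02, 55.00, 59.99])

threshold = 0.05  # same cut-off as in the question

# Broadcasting a (n_movements, 1) column against a (n_locations,) row
# yields an (n_movements, n_locations) matrix of Euclidean distances.
dist = np.sqrt((mov_lat[:, None] - loc_lat) ** 2
               + (mov_lon[:, None] - loc_lon) ** 2)

within = dist < threshold
has_visit = within.any(axis=1)      # does any location qualify for this movement?
first_hit = within.argmax(axis=1)   # index of the first qualifying location (0 if none)

# Name of the first location within range, or 'no visit' otherwise.
visit = np.where(has_visit, loc_name[first_hit], "no visit")
nearest = dist.min(axis=1)          # distance to the closest location

print(visit)
```

The distance matrix holds one entry per movement/location pair, so for very large tables it may be necessary to process the movements in chunks to keep memory use bounded.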

Check out the following code example, which shows how to build an array containing all pairs from two arrays:

import numpy as np
a = np.array([1,2,3])
b = np.array([4,5])
print( np.transpose([np.tile(a, len(b)), np.repeat(b,len(a))]) )
gives_as_print = """
[[1 4]
 [2 4]
 [3 4]
 [1 5]
 [2 5]
 [3 5]]"""
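The same tile/repeat pattern could be applied to the coordinate arrays to get one distance per pair. The sketch below assumes hypothetical sample values and shows how the flat result maps back to a movement-by-location matrix: because the location arrays are tiled and the movement arrays are repeated, each movement occupies a consecutive run of `n_loc` entries.

```python
import numpy as np

# Hypothetical coordinates chosen so the distances are easy to check by hand.
loc_lat = np.array([0.0, 3.0])
loc_lon = np.array([0.0, 4.0])
mov_lat = np.array([0.0, 3.0, 6.0])
mov_lon = np.array([0.0, 4.0, 8.0])

n_loc, n_mov = len(loc_lat), len(mov_lat)

# Locations cycle fastest; each movement is repeated once per location.
pair_loc_lat = np.tile(loc_lat, n_mov)
pair_loc_lon = np.tile(loc_lon, n_mov)
pair_mov_lat = np.repeat(mov_lat, n_loc)
pair_mov_lon = np.repeat(mov_lon, n_loc)

# One Euclidean distance per (movement, location) pair, as a flat array.
flat = np.sqrt((pair_loc_lat - pair_mov_lat) ** 2
               + (pair_loc_lon - pair_mov_lon) ** 2)

# Reshape so that row i is movement i and column j is location j.
dist = flat.reshape(n_mov, n_loc)
print(dist)
```

From this matrix, a comparison such as `dist < 0.05` followed by a reduction along axis 1 would answer the per-movement question without a Python loop.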
