
Pandas Dataframe rolling with two columns and two rows

I have a DataFrame with two columns holding longitude and latitude coordinates:

import pandas as pd

values = {'Latitude': {0: 47.021503365600005,
  1: 47.021503365600005,
  2: 47.021503365600005,
  3: 47.021503365600005,
  4: 47.021503365600005,
  5: 47.021503365600005},
 'Longitude': {0: 15.481974060399999,
  1: 15.481974060399999,
  2: 15.481974060399999,
  3: 15.481974060399999,
  4: 15.481974060399999,
  5: 15.481974060399999}}

df = pd.DataFrame(values)
df.head()

Now I want to apply a rolling window function to the DataFrame that takes the Longitude and Latitude (two columns) of one row and of the next row (window size 2) in order to calculate the haversine distance.

def haversine_distance(x):
    print (x)

df.rolling(2, axis=1).apply(haversine_distance)

My problem is that I never get all four values at once: Lng1, Lat1 (first row) and Lng2, Lat2 (second row). If I use axis=1, I get Lng1 and Lat1 of the first row. If I use axis=0, I get Lng1 and Lng2 of the first and second rows, but the longitude only.

How can I apply a rolling window using two rows and two columns? Something like this:

def haversine_distance(x):
    row1 = x[0]
    row2 = x[1]
    lng1, lat1 = row1['Longitude'], row1['Latitude']
    lng2, lat2 = row2['Longitude'], row2['Latitude']
    # do your stuff here
    return 1

Currently I do this calculation by joining the DataFrame with a copy of itself shifted by shift(-1), which puts all four coordinates in one row. But it should be possible with rolling as well. Another option is to combine Lng and Lat into one column and apply rolling with axis=0 to that. But there must be an easier way, right?
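For reference, the shift(-1) workaround described above can be sketched as follows. The helper name haversine_km, the column name DistanceToNext, the sample coordinates and the Earth radius of 6371 km are all assumptions made for illustration; none of them come from the original post.

```python
import numpy as np
import pandas as pd


def haversine_km(lat1, lng1, lat2, lng2, radius_km=6371.0):
    """Great-circle distance in km; radius_km is an assumed mean Earth radius."""
    lat1, lng1, lat2, lng2 = map(np.radians, (lat1, lng1, lat2, lng2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lng2 - lng1) / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(a))


# Hypothetical sample track; the data in the question has identical rows.
track = pd.DataFrame({
    "Latitude": [47.0215, 47.0707, 48.2082],
    "Longitude": [15.4820, 15.4395, 16.3738],
})

# Align every row with its successor, then compute all distances at once.
nxt = track.shift(-1)
track["DistanceToNext"] = haversine_km(
    track["Latitude"], track["Longitude"], nxt["Latitude"], nxt["Longitude"]
)
```

The last row has no successor, so its distance is NaN. Because the arithmetic is vectorized, this variant is likely to be faster than any rolling(...).apply(...) call on a large frame.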

Since pandas v0.23 it is possible to pass a Series instead of an ndarray to Rolling.apply(). Just set raw=False.

raw : bool, default None

False : passes each row or column as a Series to the function.

True or None : the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance. The raw parameter is required and will show a FutureWarning if not passed. In the future raw will default to False.

New in version 0.23.0.
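A small check of what the applied function receives under each setting might look like this. The names s, seen and record_type are hypothetical and used only for this illustration.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])
seen = []  # type names of the objects handed to the applied function


def record_type(window):
    # Remember what kind of object pandas passed in, then reduce it.
    seen.append(type(window).__name__)
    return window.sum()  # both ndarray and Series provide .sum()


s.rolling(2).apply(record_type, raw=True)   # windows arrive as ndarray
s.rolling(2).apply(record_type, raw=False)  # windows arrive as Series
```

After both calls, seen contains the type names ndarray and Series. Only the Series variant carries the index along, which is what the approach in this answer relies on.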

So, building on your example, you could move the latitude to the index and pass the whole longitude Series (including the index) to your function:

df = df.set_index('Latitude')
df['Distance'] = df['Longitude'].rolling(2).apply(haversine_distance, raw=False)
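One way haversine_distance could be written for this setup is sketched below. It assumes that each window is a two-element Series of longitudes whose index holds the matching latitudes, and it uses an assumed mean Earth radius of 6371 km. The names points, indexed and EARTH_RADIUS_KM and the sample coordinates are illustrative only.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_distance(window):
    # window: Series of two longitudes, indexed by the matching latitudes
    lat1, lat2 = np.radians(window.index.to_numpy())
    lng1, lng2 = np.radians(window.to_numpy())
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# Hypothetical sample points; the data in the question has identical rows.
points = pd.DataFrame({
    "Latitude": [47.0215, 47.0707, 48.2082],
    "Longitude": [15.4820, 15.4395, 16.3738],
})

indexed = points.set_index("Latitude")
indexed["Distance"] = indexed["Longitude"].rolling(2).apply(
    haversine_distance, raw=False
)
```

The first row has no predecessor, so its Distance is NaN. Every other row holds the distance in km to the row before it. Calling reset_index() afterwards turns Latitude back into an ordinary column.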
