Grouped-By DataFrame: Use column-values in current and previous row in Function
I've got a dataframe with this kind of structure:
import pandas as pd
from geopy.distance import vincenty
data = {'id': [1, 2, 3, 1, 2, 3],
        'coord': [[10.1, 30.3], [10.5, 32.3], [11.1, 31.3],
                  [10.1, 30.3], [10.5, 32.3], [61, 29.1]],
        }
df = pd.DataFrame(data)
This is how it looks:
coord id
0 [10.1, 30.3] 1
1 [10.5, 32.3] 2
2 [11.1, 31.3] 3
3 [10.1, 30.3] 1
4 [10.5, 32.3] 2
5 [61, 29.1] 3
Now, I want to group by id. Then, I want to use the current and previous row of coord within each group. These should be passed to a function that computes the distance between the two coordinates.
This is what I've tried:
df.groupby('id')['coord'].apply(lambda x: vincenty(x, x.shift(1)))
vincenty(x, y) expects x like (10, 20) and the same for y, and returns a float.
Obviously, this does not work: the function receives two Series objects instead of two lists. So using x.values.tolist() is probably the next step. However, my understanding ends here, so I'd appreciate any ideas on how to tackle this!
I think you need to shift the column per group and then apply the function, filtering out the rows with NaNs:
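# Stand-in for the real distance function; it only shows which arguments are passed
def vincenty(x, y):
    print(x, y)
    return x + y

# Previous coord within each id group (NaN for the first row of each group)
df['new'] = df.groupby('id')['coord'].shift()
# Mask of rows that actually have a previous coord
m = df['new'].notnull()
df.loc[m, 'out'] = df.loc[m, :].apply(lambda x: vincenty(x['coord'], x['new']), axis=1)
print(df)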
coord id new out
0 [10.1, 30.3] 1 NaN NaN
1 [10.5, 32.3] 2 NaN NaN
2 [11.1, 31.3] 3 NaN NaN
3 [10.1, 30.3] 1 [10.1, 30.3] [10.1, 30.3, 10.1, 30.3]
4 [10.5, 32.3] 2 [10.5, 32.3] [10.5, 32.3, 10.5, 32.3]
5 [61, 29.1] 3 [11.1, 31.3] [61, 29.1, 11.1, 31.3]
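The stand-in above only concatenates the two lists, so here is a minimal sketch of the same shift-and-mask pattern with an actual distance. It assumes the coordinates are (latitude, longitude) in degrees, which the question does not state. It uses a hand-written haversine helper (`haversine_km`, a hypothetical name) instead of geopy so the snippet has no dependency beyond pandas. The column names `prev` and `dist_km` are also just illustrative. If geopy is available, its own distance function could be dropped in where `haversine_km` is called; as far as I know, recent geopy releases provide `geopy.distance.geodesic` in place of `vincenty`.

```python
import math
import pandas as pd

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees.

    Assumption: coordinates are ordered (latitude, longitude).
    """
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0088 * math.asin(math.sqrt(h))  # mean Earth radius in km

df = pd.DataFrame({
    'id': [1, 2, 3, 1, 2, 3],
    'coord': [[10.1, 30.3], [10.5, 32.3], [11.1, 31.3],
              [10.1, 30.3], [10.5, 32.3], [61, 29.1]],
})

# Previous coordinate within each id group; NaN for the first row of a group.
df['prev'] = df.groupby('id')['coord'].shift()

# Compute a distance only for rows that have a predecessor.
m = df['prev'].notna()
df.loc[m, 'dist_km'] = df.loc[m].apply(
    lambda row: haversine_km(row['prev'], row['coord']), axis=1)

print(df[['id', 'coord', 'dist_km']])
```

Rows 0 to 2 have no earlier row in their group, so `dist_km` stays NaN for them. Rows 3 and 4 repeat their group's earlier coordinate and get a distance of 0.0, and row 5 gets the only non-zero distance.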