How to execute a function on a group of rows in a pandas DataFrame?
I am trying to implement an algorithm. Let's say the algorithm is executed as the function "xyz".
The function is specifically designed to operate on trajectory data, i.e. (x, y) coordinates.
The function takes two arguments: the first is a list of tuples of (x, y) points, and the second is a constant value.
It can be illustrated as follows:
line = [(0,0),(1,0),(2,0),(2,1),(2,2),(1,2),(0,2),(0,1),(0,0)]
xyz(line, 5.0) #calling the function
Output:
[(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]
This is easy to implement when there is only one line. But I have a huge data frame as follows:
id x y x,y
0 1 0 0 (0,0)
1 1 1 0 (1,0)
2 1 2 0 (2,0)
3 1 2 1 (2,1)
4 1 2 2 (2,2)
5 1 1 2 (1,2)
6 2 1 3 (1,3)
7 2 1 4 (1,4)
8 2 2 3 (2,3)
9 2 1 2 (1,2)
10 3 2 5 (2,5)
11 3 3 3 (3,3)
12 3 1 9 (1,9)
13 3 4 6 (4,6)
In the above data frame, rows with the same "id" form the points of one separate trajectory/line. I want to apply the above-mentioned function to each of these lines.
We can observe from the df that there are 3 different trajectories with ids 1, 2, 3. Trajectory 1 has its x, y values in rows 0-5, trajectory 2 has its points in rows 6-9, and so on.
How can I apply the function "xyz" to each of these lines, and, since the output of this function is again a list of tuples of (x, y) coordinates, how should I store this list? Note: the output list can contain any number of tuples.
I think you need groupby with apply:
print (df.groupby('id')['x,y'].apply(lambda x: xyz(x, 5.0)))
Or:
print (df.groupby('id')['x,y'].apply(xyz, 5.0))
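Both calls above hand each group to the function as a pandas Series, and any extra positional arguments given to apply are forwarded after it. A minimal sketch to check this, where describe_group is a hypothetical probe function (not part of the question) and the small frame is made up for illustration:

```python
import pandas as pd

def describe_group(points, eps):
    # Hypothetical probe: report what groupby().apply() actually passes in.
    return f"{type(points).__name__}:{len(points)}:{eps}"

# Tiny illustrative frame with the same column names as in the question.
df = pd.DataFrame({
    "id": [1, 1, 2],
    "x,y": [(0, 0), (1, 0), (1, 3)],
})

# 5.0 is forwarded to describe_group as its second argument.
probe = df.groupby("id")["x,y"].apply(describe_group, 5.0)
print(probe)
```

Because the first argument is a Series rather than a list, a function that indexes its input by position may need the group converted first.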
Sample with the rdp function. It is necessary to add tolist, otherwise you get KeyError: -1:
from rdp import rdp  # assumes the third-party rdp package is installed
print (df.groupby('id')['x,y'].apply(lambda x: rdp(x.tolist(), 5.0)))
#alternative with list
#print (df.groupby('id')['x,y'].apply(lambda x: rdp(list(x), 5.0)))
id
1 [(0, 0), (1, 2)]
2 [(1, 3), (1, 2)]
3 [(2, 5), (4, 6)]
Name: x,y, dtype: object
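For an end-to-end picture, here is a self-contained sketch that rebuilds the data frame from the question, applies a function per "id", and stores the variable-length results. The function simplify is only a hypothetical stand-in for "xyz" (it drops interior points that are collinear within a tolerance, and is not the rdp algorithm), and the tolerance 0.5 is an arbitrary choice for illustration:

```python
import pandas as pd

def simplify(points, tol):
    """Hypothetical stand-in for xyz: drop nearly collinear interior points."""
    points = list(points)
    if len(points) < 3:
        return points
    kept = [points[0]]
    for cur, nxt in zip(points[1:-1], points[2:]):
        prev = kept[-1]
        # Twice the area of the triangle (prev, cur, nxt); 0 means collinear.
        cross = ((cur[0] - prev[0]) * (nxt[1] - prev[1])
                 - (cur[1] - prev[1]) * (nxt[0] - prev[0]))
        if abs(cross) > tol:
            kept.append(cur)
    kept.append(points[-1])
    return kept

# The single-line example from the question.
line = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1), (0, 0)]
print(simplify(line, 0.5))

# The data frame from the question.
df = pd.DataFrame({
    "id": [1] * 6 + [2] * 4 + [3] * 4,
    "x": [0, 1, 2, 2, 2, 1, 1, 1, 2, 1, 2, 3, 1, 4],
    "y": [0, 0, 0, 1, 2, 2, 3, 4, 3, 2, 5, 3, 9, 6],
})
df["x,y"] = list(zip(df["x"].tolist(), df["y"].tolist()))

# One list of tuples per trajectory, indexed by id (object dtype Series).
simplified = df.groupby("id")["x,y"].apply(lambda s: simplify(s.tolist(), 0.5))
print(simplified)

# A plain dict keyed by id is another way to keep lists of differing lengths.
as_dict = simplified.to_dict()
```

Keeping the result as a Series (or dict) of lists avoids forcing trajectories of different lengths into a rectangular table; simplified.loc[1] then returns the list for trajectory 1.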