
How to execute a function on a group of rows in a pandas DataFrame?

I am trying to implement an algorithm. Let's say the algorithm is implemented as the function "xyz".

The function is specifically designed to operate on trajectory data, i.e. (x, y) coordinates.

The function takes two arguments: the first is a list of (x, y) point tuples, and the second is a constant value.

It can be illustrated as follows:

 line = [(0,0),(1,0),(2,0),(2,1),(2,2),(1,2),(0,2),(0,1),(0,0)]
 xyz(line, 5.0) #calling the function

Output:

 [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]

This is easy to do when there is only one line, but I have a huge data frame like the following:

     id      x     y    x,y
  0  1       0     0    (0,0)
  1  1       1     0    (1,0)
  2  1       2     0    (2,0)
  3  1       2     1    (2,1)
  4  1       2     2    (2,2)
  5  1       1     2    (1,2)
  6  2       1     3    (1,3)
  7  2       1     4    (1,4)
  8  2       2     3    (2,3)
  9  2       1     2    (1,2)
 10  3       2     5    (2,5)
 11  3       3     3    (3,3)
 12  3       1     9    (1,9)
 13  3       4     6    (4,6)

In the data frame above, rows with the same "id" form the points of one separate trajectory (line). I want to apply the function above to each of these lines.

The data frame contains three different trajectories, with ids 1, 2 and 3. Trajectory 1 has its (x, y) values in rows 0-5, trajectory 2 has its points in rows 6-9, and so on.
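To try the answers below, the sample frame can be rebuilt in code. This is a sketch under one assumption: that the combined "x,y" column holds plain Python tuples, built here by zipping the x and y columns.

```python
import pandas as pd

# Sample trajectory data from the table above: rows sharing an "id" form one line.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "x": [0, 1, 2, 2, 2, 1, 1, 1, 2, 1, 2, 3, 1, 4],
    "y": [0, 0, 0, 1, 2, 2, 3, 4, 3, 2, 5, 3, 9, 6],
})

# Assumption: the "x,y" column holds (x, y) tuples built from the two numeric columns.
df["x,y"] = list(zip(df["x"].tolist(), df["y"].tolist()))

# Number of points in each trajectory.
print(df.groupby("id").size())
```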

How can I apply the function "xyz" to each of these lines, and, since its output is again a list of (x, y) tuples, how should I store these lists? Note: the output list can contain any number of tuples.

I think you need groupby with apply:

print (df.groupby('id')['x,y'].apply(lambda x: xyz(x, 5.0)))

Or:

print (df.groupby('id')['x,y'].apply(xyz, 5.0))
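A self-contained version of both calls follows. Since the real "xyz" isn't shown in the question, it uses a hypothetical placeholder that just keeps each trajectory's endpoints. It also covers the storage part of the question: the result is already a Series of lists keyed by id, which can be kept as is, turned into a dict, or exploded back into one row per point.

```python
import pandas as pd


# Hypothetical stand-in for the asker's "xyz": it ignores the constant and
# keeps only the first and last point, just so the example runs.
def xyz(points, constant):
    points = list(points)  # accept a Series or a list of (x, y) tuples
    return [points[0], points[-1]]


df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "x": [0, 1, 2, 2, 2, 1, 1, 1, 2, 1, 2, 3, 1, 4],
    "y": [0, 0, 0, 1, 2, 2, 3, 4, 3, 2, 5, 3, 9, 6],
})
df["x,y"] = list(zip(df["x"].tolist(), df["y"].tolist()))

# Both forms call xyz once per id; apply() forwards extra positional
# arguments (here 5.0) to the function after the group itself.
via_lambda = df.groupby("id")["x,y"].apply(lambda s: xyz(s, 5.0))
via_args = df.groupby("id")["x,y"].apply(xyz, 5.0)

# The result is a Series indexed by id whose values are lists of any length.
print(via_args)

# Storage option 1: a plain dict {id: [(x, y), ...]}.
by_id = via_args.to_dict()

# Storage option 2: back to one row per point in a long DataFrame.
long_df = via_args.explode().reset_index()
print(long_df)
```

Series.explode turns an empty list into a single NaN row, so filter out empty results first if "xyz" can return an empty list.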

Example with the rdp function: it is necessary to add tolist, otherwise you get KeyError: -1:

from rdp import rdp  # pip install rdp

print (df.groupby('id')['x,y'].apply(lambda x: rdp(x.tolist(), 5.0)))
#alternative with list
#print (df.groupby('id')['x,y'].apply(lambda x: rdp(list(x), 5.0)))
id
1    [(0, 0), (1, 2)]
2    [(1, 3), (1, 2)]
3    [(2, 5), (4, 6)]
Name: x,y, dtype: object
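If you would rather not depend on the rdp package, the Ramer-Douglas-Peucker simplification it implements can be sketched in plain Python. The names rdp_simplify and perpendicular_distance are hypothetical (not part of rdp), and this recursive version is only an illustration. It converts its input with list() first. Inside groupby, each group is a Series that keeps the original integer row labels, so s[-1] is a label lookup and raises KeyError: -1. That is a likely cause of the error mentioned above.

```python
import math

import pandas as pd


def perpendicular_distance(p, start, end):
    """Distance from point p to the line through start and end (hypothetical helper)."""
    (px, py), (x1, y1), (x2, y2) = p, start, end
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0:
        # Degenerate chord (start == end): use the plain point-to-point distance.
        return math.hypot(px - x1, py - y1)
    return abs(dy * (px - x1) - dx * (py - y1)) / length


def rdp_simplify(points, epsilon):
    """Hypothetical pure-Python Ramer-Douglas-Peucker sketch (not the rdp package)."""
    # list() makes [0] and [-1] positional even when a pandas Series is passed.
    points = list(points)
    if len(points) < 3:
        return points
    # Find the interior point farthest from the chord between the endpoints.
    dmax, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, index = d, i
    if dmax > epsilon:
        # Split at that point, simplify both halves, drop the duplicated split point.
        left = rdp_simplify(points[: index + 1], epsilon)
        right = rdp_simplify(points[index:], epsilon)
        return left[:-1] + right
    # All interior points lie within epsilon of the chord: keep only the endpoints.
    return [points[0], points[-1]]


df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "x": [0, 1, 2, 2, 2, 1, 1, 1, 2, 1, 2, 3, 1, 4],
    "y": [0, 0, 0, 1, 2, 2, 3, 4, 3, 2, 5, 3, 9, 6],
})
df["x,y"] = list(zip(df["x"].tolist(), df["y"].tolist()))

result = df.groupby("id")["x,y"].apply(rdp_simplify, 5.0)
print(result)
```

On the sample data with epsilon 5.0, this sketch returns the same lists as the rdp output shown above. The recursive form is the easiest to read; for very long trajectories an iterative version avoids Python's recursion limit.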

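Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.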

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM