Pandas: cumulative sum from start to end

Question

I have a dataframe with start and end positions. I want to squash rows where end_n is close to start_n+1 and add up their values. In the end I want the cumulative sum of each merged block, together with the start and end values that the sum came from.

Example input, allowing a distance of <5 from end_n to start_n+1:

           start        end       value
1          0            10        3
2          11           15        4
3          17           20        5
4          45           50        3
5          51           60        13
6          100          120       9
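To reproduce the example, the input table above can be rebuilt as a DataFrame. This is a small sketch; the index 1 to 6 is copied from the table.

```python
import pandas as pd

# Example input from the question; the index 1..6 mirrors the table
df = pd.DataFrame(
    {
        "start": [0, 11, 17, 45, 51, 100],
        "end": [10, 15, 20, 50, 60, 120],
        "value": [3, 4, 5, 3, 13, 9],
    },
    index=range(1, 7),
)
print(df)
```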

Desired result:

           start        end       value
1          0            10        3
2          11           15        4
3          17           20        5
4          45           50        3
5          51           60        13
6          100          120       9

or

           start        end       sum
1          0            20        12
4          45           60        16
6          100          120       9

I suppose a lambda function would do it, but the original data is large and that would hurt performance. I would prefer a pure pandas/numpy solution.
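For reference, the row-by-row approach the question wants to avoid can be sketched with a plain Python loop (itertuples rather than a lambda with apply). The function name merge_close_rows_loop and the max_gap parameter are hypothetical; a row joins the current block when its start minus the previous end is less than max_gap, which is the "< 5" rule above. It is handy for checking a vectorized result on small inputs, but it does Python work per row and will be slow on large frames.

```python
import pandas as pd

def merge_close_rows_loop(df: pd.DataFrame, max_gap: int = 5) -> pd.DataFrame:
    """Hypothetical helper: merge consecutive rows whose gap is < max_gap."""
    blocks = []
    for start, end, value in df[["start", "end", "value"]].itertuples(index=False):
        if blocks and start - blocks[-1]["end"] < max_gap:
            # close enough: extend the current block and add this row's value
            blocks[-1]["end"] = end
            blocks[-1]["sum"] += value
        else:
            # first row, or gap too large: open a new block
            blocks.append({"start": start, "end": end, "sum": value})
    return pd.DataFrame(blocks, columns=["start", "end", "sum"])

df = pd.DataFrame(
    {
        "start": [0, 11, 17, 45, 51, 100],
        "end": [10, 15, 20, 50, 60, 120],
        "value": [3, 4, 5, 3, 13, 9],
    }
)
print(merge_close_rows_loop(df))
```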

Answer 1

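Subtract the previous row's end (the shifted end column) from each start, flag the rows where that gap is 5 or more, and take the cumulative sum of the flags so that every run of close rows shares one group label. Then aggregate each group with GroupBy.agg:

# gap between each start and the previous row's end (the first row is compared with 0);
# a gap >= 5 starts a new group, matching "distance < 5" in the question
g = df['start'].sub(df['end'].shift(fill_value=0)).ge(5).cumsum()

# keep the first start, the last end and the summed value of each group
df = df.groupby(g).agg(start=('start', 'first'), end=('end', 'last'), sum=('value', 'sum'))
print(df)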
   start  end  sum
0      0   20   12
1     45   60   16
2    100  120    9
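If the grouping step ever becomes a bottleneck, the same rule can also be written directly in NumPy. The sketch below is an alternative to the groupby above, not part of the answer: np.flatnonzero finds the first row of each block and np.add.reduceat sums value between those block boundaries. The function name merge_close_rows_numpy and the max_gap parameter are hypothetical; it assumes the rows are already sorted by start, and a gap of max_gap or more starts a new block.

```python
import numpy as np
import pandas as pd

def merge_close_rows_numpy(df: pd.DataFrame, max_gap: int = 5) -> pd.DataFrame:
    """Hypothetical helper: assumes rows are sorted by start."""
    if df.empty:
        return pd.DataFrame(columns=["start", "end", "sum"])
    start = df["start"].to_numpy()
    end = df["end"].to_numpy()
    value = df["value"].to_numpy()
    # True where a row opens a new block: always the first row, then every
    # row whose gap to the previous row's end is max_gap or more
    new_block = np.empty(len(df), dtype=bool)
    new_block[0] = True
    new_block[1:] = (start[1:] - end[:-1]) >= max_gap
    first = np.flatnonzero(new_block)             # first row of each block
    last = np.append(first[1:] - 1, len(df) - 1)  # last row of each block
    return pd.DataFrame(
        {
            "start": start[first],
            "end": end[last],
            "sum": np.add.reduceat(value, first),  # per-block sum of value
        }
    )

df = pd.DataFrame(
    {
        "start": [0, 11, 17, 45, 51, 100],
        "end": [10, 15, 20, 50, 60, 120],
        "value": [3, 4, 5, 3, 13, 9],
    }
)
print(merge_close_rows_numpy(df))
```

Compared with groupby, this avoids building group labels at all: because the blocks are contiguous, their boundaries are enough to take the first start, the last end and the sum.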

Answer 1: score 3, accepted, 2022-09-14 06:44:06

Pandas：从头到尾的累计

问题描述

1 个解决方案

解决方案1 3 已采纳 2022-09-14 06:44:06

解决方案1
3 已采纳 2022-09-14 06:44:06