Pandas: Cumulative Sum From Start to End

I have a dataframe with start and end positions. I want to squash consecutive rows where end_n is close to start_(n+1), and add up the corresponding values. In the end I want the cumulative sum together with the start and end values that the sum came from.

Example input, allowing a distance of <5 from end_n to start_(n+1):

           start        end       value
1          0            10        3
2          11           15        4
3          17           20        5
4          45           50        3
5          51           60        13
6          100          120       9
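
For anyone who wants to reproduce this, the sample input can be built as below. This is only a sketch: the variable names `df` and `gap` are illustrative and not part of the original post. It also prints the distance from each row's end to the next row's start, which is the quantity compared against 5.

```python
import pandas as pd

# Sample data from the table above; the index 1..6 mirrors the row labels.
df = pd.DataFrame(
    {
        "start": [0, 11, 17, 45, 51, 100],
        "end": [10, 15, 20, 50, 60, 120],
        "value": [3, 4, 5, 3, 13, 9],
    },
    index=[1, 2, 3, 4, 5, 6],
)

# Distance from the previous row's end to this row's start.
# The first row has no predecessor, so its gap is NaN.
gap = df["start"] - df["end"].shift()
print(gap.tolist())  # [nan, 1.0, 2.0, 25.0, 1.0, 40.0]
```

Rows 2 and 3 are within the allowed distance of their predecessor, as is row 5, which is why the rows fall into three groups.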

Desired result:

           start        end       value
1          0            10        3
2          11           15        4
3          17           20        5
4          45           50        3
5          51           60        13
6          100          120       9

or

           start        end       sum
1          0            20        12
4          45           60        16
6          100          120       9

I suppose a lambda function would do it, but the original data is large and that would impact performance. I would prefer a pure pandas/numpy solution.

Subtract the shifted end values from start, compare the difference against the threshold of 5, take the cumulative sum of that boolean result to label the groups, and then aggregate with GroupBy.agg:

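# Start a new group wherever the gap to the previous row's end exceeds 5.
# Note: gt(5) still merges a gap of exactly 5; use ge(5) for a strict "< 5" rule.
g = df['start'].sub(df['end'].shift(fill_value=0)).gt(5).cumsum()

# Per group: first start, last end, and the sum of the values.
df = df.groupby(g).agg(start=('start', 'first'), end=('end', 'last'), sum=('value', 'sum'))
print(df)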
   start  end  sum
0      0   20   12
1     45   60   16
2    100  120    9
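
The same idea can be wrapped into a self-contained, runnable sketch. The function name `squash_intervals` and the parameter `max_gap` are hypothetical names chosen for illustration. It assumes the rows are already sorted by start, and it uses a strict "gap < max_gap" rule to match the wording of the question, so a gap of exactly 5 starts a new group here.

```python
import pandas as pd


def squash_intervals(df, max_gap=5):
    """Merge consecutive rows whose gap to the previous row is below max_gap."""
    # Gap from the previous row's end to this row's start (NaN for the first row).
    gap = df["start"] - df["end"].shift()
    # A new group begins on the first row and wherever the gap is not < max_gap.
    new_group = gap.isna() | gap.ge(max_gap)
    # The running count of group starts gives every row its group label.
    group_id = new_group.cumsum()
    return (
        df.groupby(group_id)
        .agg(start=("start", "first"), end=("end", "last"), sum=("value", "sum"))
        .reset_index(drop=True)
    )


df = pd.DataFrame(
    {
        "start": [0, 11, 17, 45, 51, 100],
        "end": [10, 15, 20, 50, 60, 120],
        "value": [3, 4, 5, 3, 13, 9],
    }
)

result = squash_intervals(df)
print(result)
```

On the sample data this yields the three merged rows (0 to 20, 45 to 60, 100 to 120) with sums 12, 16 and 9. All steps are vectorized pandas operations, so no Python-level lambda is applied per row.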
