Pandas cumulative sum starting from the last row where a condition was satisfied

I have a dataframe of the following form:

|----------|----|------|
|date      |type|inflow|
|----------|----|------|
|2017-01-01|I   |  3500|
|2017-02-01|A   |    23|
|2017-07-01|A   |    44|
|2017-09-01|A   |    55|
|2017-12-01|A   |    12|
|2018-01-01|I   |  3800|
|2018-03-01|A   |    87|
|2018-05-01|A   |    34|
|2018-07-01|A   |    23|
|----------|----|------|

I marks an initial inflow and A marks an additional inflow. The rows are not necessarily grouped by year, and the dates can be arbitrary. In each row I want the cumulative sum of inflow starting from the most recent I, so the running total should reset whenever another I appears. If it helps, there are at most 5 As between two Is.
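To pin down the intended behaviour, here is a plain-Python loop that states the rule directly. It is only a reference to check a vectorized answer against, not something you would want on a large frame. The helper name reset_cumsum_loop is hypothetical, and the sketch assumes the rows are already in chronological order.

```python
import pandas as pd

# Sample data from the question (assumed to be sorted by date already)
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-01-01", "2017-02-01", "2017-07-01", "2017-09-01", "2017-12-01",
        "2018-01-01", "2018-03-01", "2018-05-01", "2018-07-01",
    ]),
    "type": ["I", "A", "A", "A", "A", "I", "A", "A", "A"],
    "inflow": [3500, 23, 44, 55, 12, 3800, 87, 34, 23],
})


def reset_cumsum_loop(types, values):
    """Hypothetical helper: running total that restarts at every 'I' row."""
    out = []
    running = 0
    for t, v in zip(types, values):
        if t == "I":
            running = 0  # a new initial inflow starts a fresh total
        running += v
        out.append(running)
    return out


df["expected"] = reset_cumsum_loop(df["type"], df["inflow"])
print(df)
```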

I tried using apply and rollapply, but I could not figure out how to apply them over a window whose size varies from row to row. How can I do this using Pandas?

Let's try GroupBy.cumsum, grouping on a counter that increments at every I:

df['inflow_cumsum'] = df.groupby(df['type'].eq('I').cumsum())['inflow'].cumsum()
df

         date type  inflow  inflow_cumsum
0  2017-01-01    I    3500           3500
1  2017-02-01    A      23           3523
2  2017-07-01    A      44           3567
3  2017-09-01    A      55           3622
4  2017-12-01    A      12           3634
5  2018-01-01    I    3800           3800
6  2018-03-01    A      87           3887
7  2018-05-01    A      34           3921
8  2018-07-01    A      23           3944
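For anyone who wants to run the answer end to end, a self-contained version is below. It rebuilds the sample frame from the question and splits the one-liner into two steps so the group key can be inspected. It assumes, as the answer does, that the rows are already in chronological order; if they are not, sort by date first, since the result depends on row order.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-01-01", "2017-02-01", "2017-07-01", "2017-09-01", "2017-12-01",
        "2018-01-01", "2018-03-01", "2018-05-01", "2018-07-01",
    ]),
    "type": ["I", "A", "A", "A", "A", "I", "A", "A", "A"],
    "inflow": [3500, 23, 44, 55, 12, 3800, 87, 34, 23],
})

# Each 'I' row bumps the counter, so an 'I' and the 'A' rows after it share an id
group_id = df["type"].eq("I").cumsum()

# Cumulative sum of inflow within each group; the total restarts at every 'I'
df["inflow_cumsum"] = df.groupby(group_id)["inflow"].cumsum()
print(df)
```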

Details
df['type'].eq('I').cumsum() labels the groups for the group-wise cumulative sum: the counter increments at every I row, so each I and the A rows that follow it share the same number.

See below for a visualization:

type  type == "I"  (type == "I").cumsum()
   I         True                       1
   A        False                       1
   A        False                       1
   A        False                       1
   A        False                       1
   I         True                       2
   A        False                       2
   A        False                       2
   A        False                       2

You'll notice that the column of 1s and 2s uniquely identifies each group, and the cumsum is computed separately within each group.
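One consequence of this key worth knowing: if the data starts with A rows before any I, those rows get group id 0 and are summed together as their own group. The small frame below is made up purely to illustrate that edge case; whether that behaviour is what you want depends on your data.

```python
import pandas as pd

# Hypothetical data: two 'A' rows appear before the first 'I'
df = pd.DataFrame({
    "type": ["A", "A", "I", "A", "I"],
    "inflow": [5, 7, 100, 10, 200],
})

group_id = df["type"].eq("I").cumsum()
print(group_id.tolist())  # [0, 0, 1, 1, 2]

# Leading 'A' rows form group 0 and accumulate on their own
df["inflow_cumsum"] = df.groupby(group_id)["inflow"].cumsum()
print(df["inflow_cumsum"].tolist())  # [5, 12, 100, 110, 200]
```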
