I have a dataframe of the following form:
|------------|------|--------|
| date       | type | inflow |
|------------|------|--------|
| 2017-01-01 | I    |   3500 |
| 2017-02-01 | A    |     23 |
| 2017-07-01 | A    |     44 |
| 2017-09-01 | A    |     55 |
| 2017-12-01 | A    |     12 |
| 2018-01-01 | I    |   3800 |
| 2018-03-01 | A    |     87 |
| 2018-05-01 | A    |     34 |
| 2018-07-01 | A    |     23 |
|------------|------|--------|
I is the initial inflow and the A rows are additional inflows. The rows are not necessarily grouped by year, and the dates can be arbitrary. I want a cumulative sum in each row that starts at the most recent I, so the sum should reset whenever another I appears. If it helps, there are at most 5 As between two Is.
I tried using apply and rollapply, but I was not able to figure out how to apply them over an inconsistent rolling window. How can I do this using Pandas?
Let's try GroupBy.cumsum:
df['inflow_cumsum'] = df.groupby(df['type'].eq('I').cumsum())['inflow'].cumsum()
df
         date type  inflow  inflow_cumsum
0  2017-01-01    I    3500           3500
1  2017-02-01    A      23           3523
2  2017-07-01    A      44           3567
3  2017-09-01    A      55           3622
4  2017-12-01    A      12           3634
5  2018-01-01    I    3800           3800
6  2018-03-01    A      87           3887
7  2018-05-01    A      34           3921
8  2018-07-01    A      23           3944
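For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample in the question, and the name `group_key` is only an illustrative choice, not something from the original post. It also assumes the rows are already in the order in which the sums should accumulate.

```python
import pandas as pd

# Sample data copied from the question (rows assumed to be in the desired order)
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-01-01", "2017-02-01", "2017-07-01", "2017-09-01", "2017-12-01",
        "2018-01-01", "2018-03-01", "2018-05-01", "2018-07-01",
    ]),
    "type": list("IAAAAIAAA"),
    "inflow": [3500, 23, 44, 55, 12, 3800, 87, 34, 23],
})

# group_key (hypothetical name) increases by one at every 'I' row,
# so each 'I' and the 'A' rows that follow it share the same key
group_key = df["type"].eq("I").cumsum()

# Cumulative sum of inflow within each group; restarts at every 'I'
df["inflow_cumsum"] = df.groupby(group_key)["inflow"].cumsum()

print(df)
```

If the rows were not already in chronological order, sorting by date first (for example with `df.sort_values("date")`) would probably be needed before building the key.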
Details
df['type'].eq('I').cumsum() is used to mark groups of inflows to perform the group-wise cumulative sum.
See below for a visualization:
type  type == "I"  (type == "I").cumsum()
I     True         1
A     False        1
A     False        1
A     False        1
A     False        1
I     True         2
A     False        2
A     False        2
A     False        2
You'll notice the column of 1s and 2s is what will uniquely identify groups to perform the cumsum over.
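To inspect the grouping key on its own, the intermediate steps can be laid out side by side. This is only a sketch: the names `types`, `is_initial`, `group_id`, `steps` and `leading` are made up for illustration, and the `type` values are the ones from the question.

```python
import pandas as pd

# The 'type' column from the question
types = pd.Series(list("IAAAAIAAA"), name="type")

# Step 1: boolean mask that is True on every 'I' row
is_initial = types.eq("I")

# Step 2: running count of True values; each 'I' starts a new group id
group_id = is_initial.cumsum()

steps = pd.DataFrame({
    "type": types,
    "is_initial": is_initial,
    "group_id": group_id,
})
print(steps)

# Edge case: any 'A' rows that come before the first 'I' get group id 0
leading = pd.Series(list("AAI")).eq("I").cumsum()
print(leading.tolist())
```

One thing worth knowing about this approach: A rows that appear before the first I are not dropped. They end up in their own group with id 0 and are summed together.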