Calculate cumprod based on a condition in a pandas DataFrame (Python)

A sample DataFrame is as follows:

import pandas as pd
import numpy as np
from datetime import datetime
start = datetime(2011, 1, 1)
end = datetime(2012, 1, 1)

index = pd.date_range(start, end)
df = pd.DataFrame(np.random.randn(366, 1), index=index, columns=["Returns"])

I know that cumulative returns are computed as follows (with a unit starting value):

start=1.0
df['Cumulative Returns']=start * (1 + df['Returns']).cumprod()
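
As a quick check of that formula, here is a minimal sketch on three made-up returns. The values and the names `returns` and `cumulative` are illustrative assumptions, not part of the sample data:

```python
import pandas as pd

# Hypothetical returns: +10%, -5%, +20%
returns = pd.Series([0.10, -0.05, 0.20])

start = 1.0
# Each element is the running product of (1 + return) up to that row
cumulative = start * (1 + returns).cumprod()

print(cumulative.round(4).tolist())  # [1.1, 1.045, 1.254]
```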

I need to compute cumulative returns based on a boolean condition in another column.

df['bool']=0
df.iloc[0:5,2]=1
df.iloc[8:18,2]=1

The data looks as follows:

             Returns  Cumulative Returns  bool
2011-01-01 -0.180628            0.819372     1
2011-01-02  0.585284            1.298938     1
2011-01-03  0.032713            1.341430     1
2011-01-04  0.161464            1.558023     1
2011-01-05  1.741576            4.271438     1
2011-01-06 -1.893358           -3.815922     0
2011-01-07  0.015942           -3.876755     0
2011-01-08 -0.615686           -1.489891     0
2011-01-09  0.330300           -1.982002     1
2011-01-10  0.274620           -2.526298     1
2011-01-11  0.222498           -3.088395     1
2011-01-12 -0.131634           -2.681858     1
2011-01-13 -0.217193           -2.099378     1
2011-01-14 -0.794016           -0.432438     1
2011-01-15  0.077270           -0.465853     1
2011-01-16  0.388143           -0.646670     1
2011-01-17  0.361618           -0.880518     1
2011-01-18 -1.732723            0.645176     1
2011-01-19 -0.045690            0.615698     0
2011-01-20  1.018151            1.242571     0
2011-01-21 -0.218665            0.970865     0
2011-01-22 -1.454362           -0.441124     0
2011-01-23  1.401056           -1.059163     0
2011-01-24  0.233366           -1.306336     0
2011-01-25 -0.235055           -0.999275     0
2011-01-26  0.577812           -1.576668     0
2011-01-27  0.510124           -2.380965     0
2011-01-28 -0.848362           -0.361045     0
2011-01-29  0.712476           -0.618281     0
2011-01-30 -0.176403           -0.509214     0

I want to compute non-contiguous cumulative returns, from 2011-01-01 to 2011-01-05 and again from 2011-01-09 to 2011-01-18, based on the bool column.

Use your original formula, but only on the rows where bool == 1. To do that, use df[df['bool'] == 1] instead of df. The whole statement can be:

df['CumProd2'] = start * (1 + df[df['bool'] == 1].Returns).cumprod()

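Values for bool == 0 are left as NaN. Note that this is a single running product over all selected rows, so the second block of 1s continues from the last value of the first block rather than restarting at start. If you want to change the NaN values to, e.g., 0, run: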

df['CumProd2'] = df['CumProd2'].fillna(0)
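
To see what the filtered approach produces, here is a minimal sketch on a five-row frame. The frame `demo` and its values are made up for illustration; only the column names follow the question:

```python
import pandas as pd

# Hypothetical data: two blocks of 1s separated by a single 0 row
demo = pd.DataFrame({
    "Returns": [0.10, 0.20, -0.50, 0.10, 0.10],
    "bool":    [1,    1,    0,     1,    1],
})

start = 1.0
mask = demo["bool"] == 1

# cumprod runs only over the selected rows; assignment aligns on the
# index, so the unselected row (index 2) ends up as NaN
demo["CumProd2"] = start * (1 + demo.loc[mask, "Returns"]).cumprod()

print(demo)
```

In this sketch the row with bool == 0 is NaN, and the product at index 3 is 1.32 * 1.10 = 1.452, i.e. it carries on from the first block instead of restarting.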

You can identify the blocks of 1s with cumsum, then call groupby().cumprod() on those blocks:

df['CumProd'] = (df['Returns'].add(1).mul(df['bool'])
                  .groupby([df['bool'].ne(1).cumsum(), df['bool']]).cumprod()
                )

Output:

             Returns  Cumulative Returns  bool   CumProd
2011-01-01 -0.180628            0.819372     1  0.819372
2011-01-02  0.585284            1.298938     1  1.298937
2011-01-03  0.032713            1.341430     1  1.341429
2011-01-04  0.161464            1.558023     1  1.558022
2011-01-05  1.741576            4.271438     1  4.271436
2011-01-06 -1.893358           -3.815922     0 -0.000000
2011-01-07  0.015942           -3.876755     0  0.000000
2011-01-08 -0.615686           -1.489891     0  0.000000
2011-01-09  0.330300           -1.982002     1  1.330300
2011-01-10  0.274620           -2.526298     1  1.695627
2011-01-11  0.222498           -3.088395     1  2.072901
2011-01-12 -0.131634           -2.681858     1  1.800036
2011-01-13 -0.217193           -2.099378     1  1.409081
2011-01-14 -0.794016           -0.432438     1  0.290248
2011-01-15  0.077270           -0.465853     1  0.312676
2011-01-16  0.388143           -0.646670     1  0.434038
2011-01-17  0.361618           -0.880518     1  0.590995
2011-01-18 -1.732723            0.645176     1 -0.433035
2011-01-19 -0.045690            0.615698     0  0.000000
2011-01-20  1.018151            1.242571     0  0.000000
2011-01-21 -0.218665            0.970865     0  0.000000
2011-01-22 -1.454362           -0.441124     0 -0.000000
2011-01-23  1.401056           -1.059163     0  0.000000
2011-01-24  0.233366           -1.306336     0  0.000000
2011-01-25 -0.235055           -0.999275     0  0.000000
2011-01-26  0.577812           -1.576668     0  0.000000
2011-01-27  0.510124           -2.380965     0  0.000000
2011-01-28 -0.848362           -0.361045     0  0.000000
2011-01-29  0.712476           -0.618281     0  0.000000
2011-01-30 -0.176403           -0.509214     0  0.000000
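
If NaN is preferred over 0 outside the blocks, a variant of the same idea might look like the sketch below. It is not the code from the answer above: the frame `demo`, the helper names `mask` and `block_id`, and the output column `BlockCumProd` are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data: two blocks of 1s separated by a single 0 row
demo = pd.DataFrame({
    "Returns": [0.10, 0.20, -0.50, 0.10, 0.10],
    "bool":    [1,    1,    0,     1,    1],
})

mask = demo["bool"].eq(1)

# A new block id starts every time the flag changes value
block_id = mask.astype(int).diff().ne(0).cumsum()

# Running product restarts within each block; rows outside the
# 1-blocks are masked to NaN afterwards
demo["BlockCumProd"] = (
    (1 + demo["Returns"]).groupby(block_id).cumprod().where(mask)
)

print(demo)
```

Here the second block restarts from its own first return (1.10, then 1.21), and the bool == 0 row is NaN rather than 0.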

