A sample DataFrame is as follows:
import pandas as pd
import numpy as np
from datetime import datetime
start = datetime(2011, 1, 1)
end = datetime(2012, 1, 1)
index = pd.date_range(start, end)
df = pd.DataFrame(np.random.randn(366, 1), index=index, columns=["Returns"])
I know that cumulative returns are computed as follows (with a unit starting value):
start=1.0
df['Cumulative Returns']=start * (1 + df['Returns']).cumprod()
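As a quick sanity check of the formula above, here is a minimal sketch with three made-up returns (the values are illustrative assumptions, not taken from the sample data). Each cumulative value is the starting value times the running product of (1 + return):

```python
import pandas as pd

# Hypothetical returns chosen for easy arithmetic; not taken from the data above.
returns = pd.Series([0.10, -0.05, 0.02])
start = 1.0

# Each value is start * (1 + r_1) * ... * (1 + r_t).
cumulative = start * (1 + returns).cumprod()
print(cumulative.round(4).tolist())
```

The three values should come out at roughly 1.1, 1.045 and 1.0659.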
I need to compute cumulative returns based on a boolean condition in another column:
df['bool']=0
df.iloc[0:5,2]=1
df.iloc[8:18,2]=1
The data looks as follows:
             Returns  Cumulative Returns  bool
2011-01-01 -0.180628            0.819372     1
2011-01-02  0.585284            1.298938     1
2011-01-03  0.032713            1.341430     1
2011-01-04  0.161464            1.558023     1
2011-01-05  1.741576            4.271438     1
2011-01-06 -1.893358           -3.815922     0
2011-01-07  0.015942           -3.876755     0
2011-01-08 -0.615686           -1.489891     0
2011-01-09  0.330300           -1.982002     1
2011-01-10  0.274620           -2.526298     1
2011-01-11  0.222498           -3.088395     1
2011-01-12 -0.131634           -2.681858     1
2011-01-13 -0.217193           -2.099378     1
2011-01-14 -0.794016           -0.432438     1
2011-01-15  0.077270           -0.465853     1
2011-01-16  0.388143           -0.646670     1
2011-01-17  0.361618           -0.880518     1
2011-01-18 -1.732723            0.645176     1
2011-01-19 -0.045690            0.615698     0
2011-01-20  1.018151            1.242571     0
2011-01-21 -0.218665            0.970865     0
2011-01-22 -1.454362           -0.441124     0
2011-01-23  1.401056           -1.059163     0
2011-01-24  0.233366           -1.306336     0
2011-01-25 -0.235055           -0.999275     0
2011-01-26  0.577812           -1.576668     0
2011-01-27  0.510124           -2.380965     0
2011-01-28 -0.848362           -0.361045     0
2011-01-29  0.712476           -0.618281     0
2011-01-30 -0.176403           -0.509214     0
I want to compute non-continuous cumulative returns based on the bool column: one run from 2011-01-01 to 2011-01-05, and another from 2011-01-09 to 2011-01-18.
Use your original formula, but only for rows with bool == 1. To do that, use df[df['bool'] == 1] instead of df. The whole statement can be:
df['CumProd2'] = start * (1 + df[df['bool'] == 1].Returns).cumprod()
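Values for bool == 0 are left as NaN. Note that the product keeps compounding across all rows with bool == 1, so a later block of 1s continues from where the previous block ended rather than restarting from start. If you want to replace the NaN values with, e.g., 0, run:
df['CumProd2'] = df['CumProd2'].fillna(0)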
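To see how this filter-then-cumprod approach behaves when the 1s come in separate blocks, here is a small sketch on made-up data (the frame `toy` and its values are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical toy data: two blocks of 1s separated by a single 0 row.
toy = pd.DataFrame({
    "Returns": [0.10, 0.20, 0.50, 0.10],
    "bool":    [1,    1,    0,    1],
})
start = 1.0

# Cumulative product over the bool == 1 rows only; assignment aligns on the
# index, so the bool == 0 row receives NaN.
toy["CumProd2"] = start * (1 + toy.loc[toy["bool"] == 1, "Returns"]).cumprod()

# Replace the NaN left on the bool == 0 row with 0.
toy["CumProd2"] = toy["CumProd2"].fillna(0)
print(toy)
```

In this sketch the last row comes out at about 1.452 (1.1 * 1.2 * 1.1), not 1.1, which shows that the second block carries on from the first. If each block should start over, the product needs to be grouped per block.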
You can identify the blocks of 1s with cumsum, then apply groupby().cumprod() to those blocks:
# Zero out the (1 + return) factors where bool == 0, then restart the product in each run of 1s.
df['CumProd'] = (df['Returns'].add(1).mul(df['bool'])
                 .groupby([df['bool'].ne(1).cumsum(), df['bool']]).cumprod())
Output:
             Returns  Cumulative Returns  bool    CumProd
2011-01-01 -0.180628            0.819372     1   0.819372
2011-01-02  0.585284            1.298938     1   1.298937
2011-01-03  0.032713            1.341430     1   1.341429
2011-01-04  0.161464            1.558023     1   1.558022
2011-01-05  1.741576            4.271438     1   4.271436
2011-01-06 -1.893358           -3.815922     0  -0.000000
2011-01-07  0.015942           -3.876755     0   0.000000
2011-01-08 -0.615686           -1.489891     0   0.000000
2011-01-09  0.330300           -1.982002     1   1.330300
2011-01-10  0.274620           -2.526298     1   1.695627
2011-01-11  0.222498           -3.088395     1   2.072901
2011-01-12 -0.131634           -2.681858     1   1.800036
2011-01-13 -0.217193           -2.099378     1   1.409081
2011-01-14 -0.794016           -0.432438     1   0.290248
2011-01-15  0.077270           -0.465853     1   0.312676
2011-01-16  0.388143           -0.646670     1   0.434038
2011-01-17  0.361618           -0.880518     1   0.590995
2011-01-18 -1.732723            0.645176     1  -0.433035
2011-01-19 -0.045690            0.615698     0   0.000000
2011-01-20  1.018151            1.242571     0   0.000000
2011-01-21 -0.218665            0.970865     0   0.000000
2011-01-22 -1.454362           -0.441124     0  -0.000000
2011-01-23  1.401056           -1.059163     0   0.000000
2011-01-24  0.233366           -1.306336     0   0.000000
2011-01-25 -0.235055           -0.999275     0   0.000000
2011-01-26  0.577812           -1.576668     0   0.000000
2011-01-27  0.510124           -2.380965     0   0.000000
2011-01-28 -0.848362           -0.361045     0   0.000000
2011-01-29  0.712476           -0.618281     0   0.000000
2011-01-30 -0.176403           -0.509214     0   0.000000
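The grouping keys are easier to follow on a tiny frame. The sketch below uses made-up data (`toy`, `block_id` and `factors` are hypothetical names) and multiplies by `start`, which is an assumption added here; the snippet above leaves it out, which gives the same result when start is 1.0.

```python
import pandas as pd

# Hypothetical toy data: two blocks of 1s separated by a single 0 row.
toy = pd.DataFrame({
    "Returns": [0.10, 0.20, 0.50, 0.10, 0.30],
    "bool":    [1,    1,    0,    1,    1],
})
start = 1.0

# The counter increases on every non-1 row, so it stays constant within a run of 1s.
block_id = toy["bool"].ne(1).cumsum()

# A 0 row shares its counter value with the run of 1s that follows it, so
# grouping by (block_id, bool) keeps the two apart.
factors = toy["Returns"].add(1).mul(toy["bool"])
toy["CumProd"] = start * factors.groupby([block_id, toy["bool"]]).cumprod()
print(toy.assign(block_id=block_id))
```

Here `block_id` is 0, 0, 1, 1, 1, and the second block restarts at about 1.1 before reaching about 1.43, while the 0 row stays at 0.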