pandas: calculate percentage change of timeseries from a specific date based on a condition

Question

I am struggling to find the most efficient way to accomplish the following. I have a time-series DataFrame. Based on some condition, I set a boolean column. After that, I would like to generate another column that holds the percentage change from the last occurrence of the condition. For example, in the table below, rows 2, 4 and 5 are the percentage change of the value from row 1, and rows 6, 7 and 8 are the percentage change from row 5.

row date        value  condition    pct_change_from_condition
1   04-27-2010  100    TRUE             
2   04-28-2010  200                 1.0
4   04-29-2010  300                 2.0
5   04-30-2010  400    TRUE         3.0
6   05-01-2010  500                 0.25
7   05-02-2010  600                 0.5
8   05-03-2010  700                 0.75
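Stated as a formula (the notation is ours, not the asker's), with v_t the value in row t and c_t its condition flag, the target column is the relative change from the most recent flagged row strictly before the current one:

```latex
r(t) = \max\{\, s < t : c_s \,\}, \qquad
p_t = \frac{v_t - v_{r(t)}}{v_{r(t)}} = \frac{v_t}{v_{r(t)}} - 1
```

For row 6, r(t) is row 5, so p = (500 - 400) / 400 = 0.25. Because the inequality is strict, the flagged row 5 is still measured against the previous flag (row 1), giving 3.0, and row 1 has no earlier flag, so its cell stays empty.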

I know I can iterate over the rows and do this, but since this is pandas, I would like a more idiomatic and efficient way of doing it. I'm just not sure how that might be done here. It feels like I need something like a conditional shift:

df['pct_change_from_condition'] = (df.value - df.shift(df.condition).value)/df.value

or maybe using loc:

df['pct_change_from_condition'] = df.value - df.loc[df.condition].value

Of course, these do not work, which is why I am asking here. Thanks for any help!

Answer 1

You can try this:

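import numpy as np
import pandas as pd

# df holds the question's data; 'condition' is boolean (True on flagged rows)
mask = (df['condition'] == True)
# each True starts a new group (1, 1, 1, 2, 2, 2, 2 for the sample)
df['group'] = mask.cumsum()
# value of the flagged row that opened each group
df['first'] = df.groupby(['group'])['value'].transform('first')
# a flagged row is measured against the previous group's reference, not itself
df['first'] = np.where(mask, df['first'].shift(), df['first'])
df['pct_change'] = (df['value']-df['first'])/df['first']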

# df['pct_change']:
# 0     NaN
# 1    1.00
# 2    2.00
# 3    3.00
# 4    0.25
# 5    0.50
# 6    0.75
# Name: pct_change, dtype: float64
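To see why the np.where step is needed, here is a self-contained run of the same code on the question's sample data, rebuilt by hand (the boolean condition column and the default 0-based index are assumptions), printing the helper columns:

```python
import numpy as np
import pandas as pd

# Sample data from the question; condition assumed to be boolean.
df = pd.DataFrame({
    "value": [100, 200, 300, 400, 500, 600, 700],
    "condition": [True, False, False, True, False, False, False],
})

mask = df["condition"] == True
df["group"] = mask.cumsum()
# After transform('first'): 100 for every row of group 1, 400 for group 2.
df["first"] = df.groupby(["group"])["value"].transform("first")
# Flagged rows take the previous row's reference instead of their own value.
df["first"] = np.where(mask, df["first"].shift(), df["first"])
df["pct_change"] = (df["value"] - df["first"]) / df["first"]

print(df[["value", "condition", "group", "first", "pct_change"]])
```

Without the np.where line, the flagged row at index 3 would be compared with itself and get 0.0 instead of 3.0, and the first row would get 0.0 instead of NaN.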

Answer 2

We can compare the condition column with 'TRUE' to create a boolean mask, then mask out the values in the value column where the condition is not met, shift the result down one row and forward-fill it to create a series s holding the value of the previous flagged row. Finally, subtract s from value and divide by s to calculate the percentage change:

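# True on flagged rows (condition stored as the string 'TRUE')
m = df['condition'].eq('TRUE')
# keep values on flagged rows only, move each one down a row, carry it forward
s = df['value'].mask(~m).shift().ffill()
df['% change'] = df['value'].sub(s).div(s)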

         date  value condition  % change
1  04-27-2010    100      TRUE       NaN
2  04-28-2010    200       NaN      1.00
4  04-29-2010    300       NaN      2.00
5  04-30-2010    400      TRUE      3.00
6  05-01-2010    500       NaN      0.25
7  05-02-2010    600       NaN      0.50
8  05-03-2010    700       NaN      0.75
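The same steps, run on a hand-built copy of the sample data with the flag stored as text as this answer assumes (using None for unflagged rows is our assumption), with each intermediate Series split out:

```python
import pandas as pd

# Sample data with the flag stored as the string "TRUE", as this answer assumes.
df = pd.DataFrame(
    {
        "value": [100, 200, 300, 400, 500, 600, 700],
        "condition": ["TRUE", None, None, "TRUE", None, None, None],
    },
    index=[1, 2, 4, 5, 6, 7, 8],
)

m = df["condition"].eq("TRUE")
kept = df["value"].mask(~m)   # 100, NaN, NaN, 400, NaN, NaN, NaN
shifted = kept.shift()        # NaN, 100, NaN, NaN, 400, NaN, NaN
s = shifted.ffill()           # NaN, 100, 100, 100, 400, 400, 400
df["% change"] = df["value"].sub(s).div(s)
print(df)
```

The shift is what makes a flagged row (row 5) compare against the previous flag rather than against itself.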

Answer 3

One way to do this is to keep a running sum of condition == True (shifted by 1), group by that running sum with .groupby(), and, for each group, calculate the percentages with respect to the appropriate reference.
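This answer gives no code, so the following is only a sketch of that description, assuming a boolean condition column. Taking the reference as the value of the row just before each group's first row (which is the flagged row that opened the group) is our own choice of how to pick "the appropriate reference":

```python
import pandas as pd

# Sample data from the question (row labels kept as in the table).
df = pd.DataFrame(
    {
        "value": [100, 200, 300, 400, 500, 600, 700],
        "condition": [True, False, False, True, False, False, False],
    },
    index=[1, 2, 4, 5, 6, 7, 8],
)

# Running count of True flags, shifted by one row so that each flagged row
# closes the group of the previous reference instead of opening a new one.
grp = df["condition"].cumsum().shift(fill_value=0)

# The reference for group k is the k-th flagged row, i.e. the row just before
# the group's first row: take the previous row's value and broadcast the
# group's first entry to every row of the group.
prev = df["value"].shift()
ref = prev.groupby(grp).transform("first")

# Rows up to and including the first flag (group 0) have no reference.
ref = ref.where(grp > 0)

df["pct_change_from_condition"] = df["value"] / ref - 1
print(df)
```

The where(grp > 0) line matters when the data does not start with a flagged row: transform("first") skips NaN, so without it group 0 would pick up an unrelated earlier value.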

3 answers

solution1 4 2021-05-11 17:40:18

solution2 3 ACCEPTED 2021-05-11 17:45:20

solution3 2 2021-05-11 17:33:18