pandas: calculate percentage change of timeseries from a specific date based on a condition

I am struggling to find the most efficient way to accomplish the following. I have a timeseries DataFrame. Based on some condition, I set a boolean column. After that, I would like to generate another column that holds the percentage change from the last occurrence of the condition. For example, in the table below, rows 2, 4, and 5 are the percentage change from the value in row 1, and rows 6, 7, and 8 are the percentage change from the value in row 5.

row date        value  condition    pct_change_from_condition
1   04-27-2010  100    TRUE             
2   04-28-2010  200                 1.0
4   04-29-2010  300                 2.0
5   04-30-2010  400    TRUE         3.0
6   05-01-2010  500                 0.25
7   05-02-2010  600                 0.5
8   05-03-2010  700                 0.75

I know I can iterate over the rows and do this, but since this is pandas, I would like a more 'pandonic' and efficient way of doing it. I'm just not sure how that might be done here. It feels like I need something like a conditional shift:

df['pct_change_from_condition'] = (df.value - df.shift(df.condition).value)/df.value

or maybe using loc:

df['pct_change_from_condition'] = df.value - df.loc[df.condition].value 

Of course these do not work, which is why I am here asking. Thanks for any help.

You can try this:

import numpy as np
import pandas as pd

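# Assumes df['condition'] holds real booleans (True / False or NaN), not the string 'TRUE'.
mask = (df['condition'] == True)
# Number each block of rows that starts at a condition row.
df['group'] = mask.cumsum()
# Reference value: the value on the condition row that opens each block.
df['first'] = df.groupby(['group'])['value'].transform('first')
# A condition row is itself measured against the previous block's reference.
df['first'] = np.where(mask, df['first'].shift(), df['first'])
df['pct_change'] = (df['value'] - df['first']) / df['first']

# df['pct_change']: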
# 0     NaN
# 1    1.00
# 2    2.00
# 3    3.00
# 4    0.25
# 5    0.50
# 6    0.75
# Name: pct_change, dtype: float64

We can compare the condition column with 'TRUE' to create a boolean mask, then keep only the values in the value column where that mask is true. Shifting the result down one row and forward-filling it gives a series s that holds, for every row, the value from the most recent earlier condition row. Subtracting s from value and dividing by s then gives the percentage change.

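# Assumes condition is stored as the string 'TRUE'; use .eq(True) for real booleans.
m = df['condition'].eq('TRUE')
# Keep value only on condition rows, move it down one row, then carry it forward.
s = df['value'].mask(~m).shift().ffill()
df['% change'] = df['value'].sub(s).div(s)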

         date  value condition  % change
1  04-27-2010    100      TRUE       NaN
2  04-28-2010    200       NaN      1.00
4  04-29-2010    300       NaN      2.00
5  04-30-2010    400      TRUE      3.00
6  05-01-2010    500       NaN      0.25
7  05-02-2010    600       NaN      0.50
8  05-03-2010    700       NaN      0.75

One way to do this is to keep a running sum of condition == True (shifted by 1), group the rows by that running sum with .groupby(), and for each group calculate the percentages with respect to the appropriate reference value.
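
The answer above comes without code, so here is a minimal sketch of how it might look. It assumes condition holds real booleans (or NaN) rather than the string 'TRUE'. The function name pct_change_from_condition and the sample frame are made up for illustration; they are not from the original answer.

```python
import pandas as pd


def pct_change_from_condition(df: pd.DataFrame) -> pd.Series:
    """Percentage change of `value` relative to the most recent earlier condition row."""
    # Treat anything that is not exactly True (False, NaN) as "no condition".
    cond = df["condition"].eq(True)
    # Running count of condition rows, shifted down one row so that a condition
    # row still belongs to the group opened by the previous condition row.
    group = cond.cumsum().shift(fill_value=0)
    # Keep `value` only on condition rows and move it down one row, so each
    # group starts with its own reference value (NaN before the first one).
    ref = df["value"].where(cond).shift()
    # Broadcast that reference value to every row of its group.
    ref = ref.groupby(group).transform("first")
    return (df["value"] - ref) / ref


# Sample data matching the table in the question.
df = pd.DataFrame({
    "value": [100, 200, 300, 400, 500, 600, 700],
    "condition": [True, False, False, True, False, False, False],
})
df["pct_change_from_condition"] = pct_change_from_condition(df)
```

Rows that come before the first condition row have no reference value, so they end up as NaN rather than being compared against the first row of the frame.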
