I am struggling to find the most efficient way to accomplish the following. I have a time series DataFrame. Based on some condition, I set a boolean column. After that I would like to generate another column that is the percentage change from the last occurrence of the condition. So for example, in the table below, rows 2, 4, and 5 are the percentage change of the value from row 1, and rows 6, 7, and 8 are the percentage change from row 5.
row  date        value  condition  pct_change_from_condition
1    04-27-2010  100    TRUE
2    04-28-2010  200               1.0
4    04-29-2010  300               2.0
5    04-30-2010  400    TRUE       3.0
6    05-01-2010  500               0.25
7    05-02-2010  600               0.5
8    05-03-2010  700               0.75
I know I can iterate over the rows and do this, but since this is pandas, I would like a more idiomatic and efficient way of doing it. I'm just not sure how that might be done here. It feels like I need something like a conditional shift:
df['pct_change_from_condition'] = (df.value - df.shift(df.condition).value)/df.value
or maybe using loc:
df['pct_change_from_condition'] = df.value - df.loc[df.condition].value
Of course these do not work, which is why I am here asking. Thanks for any help.
You can try this:
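import numpy as np
import pandas as pd
# Rows where the condition holds (assumes 'condition' holds real booleans)
mask = (df['condition'] == True)
# Each condition row starts a new group
df['group'] = mask.cumsum()
# First value of each group, i.e. the value at the condition row
df['first'] = df.groupby(['group'])['value'].transform('first')
# A condition row itself is compared against the previous group's reference
df['first'] = np.where(mask, df['first'].shift(), df['first'])
df['pct_change'] = (df['value']-df['first'])/df['first']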
# Out[52]:
# 0 NaN
# 1 1.00
# 2 2.00
# 3 3.00
# 4 0.25
# 5 0.50
# 6 0.75
# Name: pct_change, dtype: float64
We can compare the condition column with the string 'TRUE' to create a boolean mask m. Calling mask with ~m on the value column keeps the values only on the rows where the condition holds and sets the rest to NaN; shift and ffill then carry the most recent earlier condition value forward to create a series s. Finally, subtract s from value and divide by s to calculate the percent change:
m = df['condition'].eq('TRUE')
s = df['value'].mask(~m).shift().ffill()
df['% change'] = df['value'].sub(s).div(s)
date value condition % change
1 04-27-2010 100 TRUE NaN
2 04-28-2010 200 NaN 1.00
4 04-29-2010 300 NaN 2.00
5 04-30-2010 400 TRUE 3.00
6 05-01-2010 500 NaN 0.25
7 05-02-2010 600 NaN 0.50
8 05-03-2010 700 NaN 0.75
One way to do this is to keep a running sum of condition == True (shifted by 1), call .groupby() on that running sum, and for each group calculate the percentages w.r.t. the appropriate reference.
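This answer gives no code, so here is a rough sketch of how that idea might look. It is only one possible reading of the description: it assumes condition is a real boolean column, rebuilds the sample data from the question, and the names group, last_per_group and ref are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question; 'condition' is assumed to be boolean.
df = pd.DataFrame({
    "date": ["04-27-2010", "04-28-2010", "04-29-2010", "04-30-2010",
             "05-01-2010", "05-02-2010", "05-03-2010"],
    "value": [100, 200, 300, 400, 500, 600, 700],
    "condition": [True, False, False, True, False, False, False],
})

# Running sum of the condition, shifted by one row, so that a condition row
# closes the group of rows that are measured against the previous condition row.
group = df["condition"].cumsum().shift(fill_value=0)

# The last row of every group except the final one is a condition row,
# so the last value of group g - 1 is the reference for group g.
last_per_group = df.groupby(group)["value"].last()
ref = group.map(last_per_group.shift())

# Rows up to and including the first condition row have no reference (NaN).
df["pct_change_from_condition"] = (df["value"] - ref) / ref
print(df)
```

Shifting the running sum is what lets row 5 (itself a condition row) still be compared against row 1 rather than against itself.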