I have a pandas dataframe timeseries (of about 1000 rows and the four columns below) that looks like this:
Date        Values  Avg   +1 Stdev
01/01/2010    1.01  1.00      1.05
02/01/2010    1.02  1.00      1.05
03/01/2010    1.04  1.00      1.05
04/01/2010   -0.97  1.00      1.05
05/01/2010    1.12  1.00      1.05
06/01/2010    1.08  1.00      1.05
....
What I'm trying to do is create a fifth column (called 'Trigger Date'), where if the value in column 2 breaches the threshold set in column 4, then the new column returns the date (from the index column), otherwise no value is returned. The additional constraint here is that the fifth column should ALSO NOT return a date if the previous value already breached the threshold in column 4.
In other words, the pseudocode for the problem would be:
If df['Values'] > df['+1 Stdev']
AND
If df['Values'] (for the row above) < df['+1 Stdev']
THEN
Return df['Date'] in new column df['Trigger Date']
ELSE
Leave row in df['Trigger Date'] blank
Any help on how to tackle this would be greatly appreciated.
EDIT: Additional question - is there any way to add a third constraint, where no trigger date is returned if one has already occurred in the past XX days (e.g. in the past 30 days)? The expected output would look like:
          Date  Values  Avg  +1 Stdev Trigger Date
0   01/01/2010    1.01  1.0      1.05          NaN
1   02/01/2010    1.02  1.0      1.05          NaN
2   03/01/2010    1.04  1.0      1.05          NaN
3   04/01/2010   -0.97  1.0      1.05          NaN
4   05/01/2010    1.12  1.0      1.05   05/01/2010
5   06/01/2010    1.08  1.0      1.05          NaN
6   07/01/2010    1.03  1.0      1.05          NaN
7   08/01/2010    1.07  1.0      1.05          NaN  <- above threshold, but trigger occurred within last 30 days so don't return date
...
50  20/02/2010    1.12  1.0      1.05   20/02/2010  <- more than 30 days later, no trigger dates in between, so return date
Use numpy.where with shift to compare each row against the value in the row above:
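import numpy as np

# m1: the current value is above the threshold
m1 = df['Values'] > df['+1 Stdev']
# m2: the previous row's value is below the current row's threshold
m2 = df['Values'].shift() < df['+1 Stdev']
df['Trigger Date'] = np.where(m1 & m2, df['Date'], np.nan)
print(df)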
         Date  Values  Avg  +1 Stdev Trigger Date
0  01/01/2010    1.01  1.0      1.05          NaN
1  02/01/2010    1.02  1.0      1.05          NaN
2  03/01/2010    1.04  1.0      1.05          NaN
3  04/01/2010   -0.97  1.0      1.05          NaN
4  05/01/2010    1.12  1.0      1.05   05/01/2010
5  06/01/2010    1.08  1.0      1.05          NaN
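For anyone who wants to run this end to end, here is a minimal self-contained sketch built on the six sample rows from the question. It is a slight variation rather than the exact code above: it shifts the boolean "above threshold" flag instead of the raw values (so a previous value exactly equal to the threshold still counts as "not above"), and it uses `fill_value=False` so the first row is treated as having no earlier breach. The names `above`, `prev_above` and `crossing` are just illustrative.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    'Date': ['01/01/2010', '02/01/2010', '03/01/2010',
             '04/01/2010', '05/01/2010', '06/01/2010'],
    'Values': [1.01, 1.02, 1.04, -0.97, 1.12, 1.08],
    'Avg': [1.00] * 6,
    '+1 Stdev': [1.05] * 6,
})

# True where the value is above the threshold on that row.
above = df['Values'] > df['+1 Stdev']

# shift() moves each flag down one row; the first row has no predecessor,
# so fill_value=False treats it as "not previously above".
prev_above = above.shift(fill_value=False)

# A trigger is a row that is above the threshold while the previous row was not.
crossing = above & ~prev_above

# Series.where keeps the date where the mask is True and puts NaN elsewhere.
df['Trigger Date'] = df['Date'].where(crossing)
print(df)
```

Only row 4 (05/01/2010) should get a trigger date here; row 5 is also above the threshold, but the row before it already was.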
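EDIT: For the additional constraint, convert the dates to datetimes and add a third mask:
import numpy as np
import pandas as pd

df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
m1 = df['Values'] > df['+1 Stdev']
m2 = df['Values'].shift() < df['+1 Stdev']
# start of the 30-day window that ends on each row's date
a = df['Date'] - pd.Timedelta(days=30)
# one boolean Series per window: rows whose next date falls inside it
L = [df['Date'].shift(-1).isin(pd.date_range(x, y, freq='D')) for x, y in zip(a, df['Date'])]
# True where the next date falls inside at least one window
m3 = np.logical_or.reduce(L)
mask = (m1 & m2) | ~m3
df.loc[mask, 'Trigger Date'] = df['Date']
print(df)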
        Date  Values  Avg  +1 Stdev Trigger Date
0 2010-01-01    1.01  1.0      1.05          NaT
1 2010-01-02    1.02  1.0      1.05          NaT
2 2010-01-03    1.04  1.0      1.05          NaT
3 2010-01-04   -0.97  1.0      1.05          NaT
4 2010-01-05    1.12  1.0      1.05   2010-01-05
5 2010-01-06    1.08  1.0      1.05          NaT
6 2010-02-20    1.12  1.0      1.05   2010-02-20
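The 30-day rule depends on which earlier rows actually produced a trigger, so a row-by-row pass over the candidate rows is one straightforward way to express it. The following is a minimal sketch of that idea, not the method used above. `trigger_dates` and `cooldown_days` are hypothetical names, the 19/02/2010 row is invented purely so that 20/02/2010 is a fresh crossing (the question elides those rows), and the sketch assumes 'Date' is already sorted in ascending order.

```python
import pandas as pd

def trigger_dates(df, cooldown_days=30):
    """Hypothetical helper: threshold crossings, at most one per cooldown window."""
    above = df['Values'] > df['+1 Stdev']
    # Fresh crossing: above now, not above on the previous row.
    crossing = above & ~above.shift(fill_value=False)

    cooldown = pd.Timedelta(days=cooldown_days)
    out = pd.Series(pd.NaT, index=df.index, dtype='datetime64[ns]')
    last_trigger = None
    # Only candidate rows are visited, so the loop stays short.
    for idx, date in df.loc[crossing, 'Date'].items():
        if last_trigger is None or date - last_trigger > cooldown:
            out.loc[idx] = date
            last_trigger = date
    return out

# Rows 0-7 and the last row come from the question's expected output;
# the 19/02/2010 row is an invented placeholder (assumption).
df = pd.DataFrame({
    'Date': ['01/01/2010', '02/01/2010', '03/01/2010', '04/01/2010',
             '05/01/2010', '06/01/2010', '07/01/2010', '08/01/2010',
             '19/02/2010', '20/02/2010'],
    'Values': [1.01, 1.02, 1.04, -0.97, 1.12, 1.08, 1.03, 1.07, 1.00, 1.12],
    'Avg': [1.0] * 10,
    '+1 Stdev': [1.05] * 10,
})
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
df['Trigger Date'] = trigger_dates(df, cooldown_days=30)
print(df)
```

With this data, 2010-01-05 and 2010-02-20 should be the only trigger dates; 2010-01-08 is a fresh crossing but falls within 30 days of the previous trigger. Note that in the seven-row sample printed above, 2010-02-20 directly follows 1.08, which is already above 1.05, so this sketch would leave that row as NaT there.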