I am working with time series data and trying to apply a percentage change to it.
Here is a snapshot of the data:
Time                     EX   SC    WH    YE   Lt   Ub     Yl_2   Wm    Wm_2  value
2016-02-15 11:54:00 UTC  4.4  0.14  8.38  755  232  0.009  0.11   1428  1020  FALSE
2016-02-15 11:55:00 UTC  4.4  0.14  8.38  755  232  0.009  0.111  1436  1018  FALSE
2016-02-15 11:56:00 UTC  4.4  0.14  8.38  755  232  0.014  0.113  1471  1019  FALSE
2016-02-15 11:57:00 UTC  4.4  0.14  8.37  755  232  0.015  0.111  1457  1015  FALSE
2016-02-15 11:58:00 UTC  4.4  0.14  8.38  755  232  0.013  0.111  1476  1019  FALSE
2016-02-15 11:59:00 UTC  4.4  0.14  8.36  755  232  0.013  0.114  1416  1015  FALSE
The shape of the data is (122334, 10)
Here is my function:
import numpy as np

def percent_change(series):
    # Collect all *but* the last value of this window, then the final value
    previous_values = series[:-1]
    last_value = series[-1]
    # Calculate the % difference between the last value and the mean of earlier values
    percent_change = (last_value - np.mean(previous_values)) / np.mean(previous_values)
    return percent_change
Applying the function here:
df2 = df.rolling(10).apply(percent_change)
This takes forever. What am I doing wrong, or how should I do it instead?
Thanks
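rolling().apply() calls your Python function once per window and per column, which is slow on a frame of this size. Here is an approach that uses shift() and rolling() to compute the mean efficiently with pandas' built-in rolling mean: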
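import pandas as pd

def rolling_pct_change(df, field):
    t = df.copy()
    # Mean of the 3 values preceding each row (shift(1) excludes the current row)
    t['mean'] = t[field].shift(1).rolling(3).mean()
    # Relative difference between the current value and that mean
    t['pct_change'] = (t[field] - t['mean']) / t['mean']
    return t

df = pd.DataFrame({'x': [*range(10)]})
df2 = rolling_pct_change(df, 'x')
print(df2)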
   x  mean  pct_change
0  0   NaN         NaN
1  1   NaN         NaN
2  2   NaN         NaN
3  3   1.0    2.000000
4  4   2.0    1.000000
5  5   3.0    0.666667
6  6   4.0    0.500000
7  7   5.0    0.400000
8  8   6.0    0.333333
9  9   7.0    0.285714
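The same idea can be applied to every column at once, with a window that matches the question's rolling(10) call (the last value compared against the mean of the 9 values before it). The following is only a sketch: the helper name rolling_pct_change_all, the window parameter, and the random sample frame are assumptions made for illustration, not part of the original post. It also assumes that non-numeric columns, such as the boolean value column, should be left out.

```python
import numpy as np
import pandas as pd


def rolling_pct_change_all(df, window=10):
    """Percent change of each value relative to the mean of the
    preceding (window - 1) values, for every numeric column."""
    # Booleans are not selected by include="number", so a column like
    # `value` is dropped here (an assumption about what is wanted).
    numeric = df.select_dtypes(include="number")
    # shift(1) excludes the current row, so the rolling mean covers
    # only the (window - 1) rows that come before it.
    prev_mean = numeric.shift(1).rolling(window - 1).mean()
    return (numeric - prev_mean) / prev_mean


# Hypothetical sample data standing in for the real frame.
rng = np.random.default_rng(0)
sample = pd.DataFrame(
    rng.uniform(1.0, 2.0, size=(50, 3)),
    columns=["EX", "SC", "WH"],
)

fast = rolling_pct_change_all(sample, window=10)


# Reference: the question's approach, using raw=True so each window
# arrives as a NumPy array and [-1] is a plain positional lookup.
def percent_change(values):
    previous_mean = np.mean(values[:-1])
    return (values[-1] - previous_mean) / previous_mean


slow = sample.rolling(10).apply(percent_change, raw=True)

# Both versions should agree, including the leading NaN rows.
print(np.allclose(fast.to_numpy(), slow.to_numpy(), equal_nan=True))
```

If the custom function has to stay, passing raw=True to apply() is usually a cheaper change on its own, since it avoids building a Series for each window. The fully vectorized version should still be considerably faster on a frame with 122334 rows.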