How to reduce the runtime for pandas rolling taking too long to run on multiple columns - pandas

Question

I am working with time series data and trying to apply a percentage change to it.

Here is a snapshot of the data:

Time                     EX   SC    WH    YE   Lt   Ub     Yl_2   Wm    Wm_2  value
2016-02-15 11:54:00 UTC  4.4  0.14  8.38  755  232  0.009  0.11   1428  1020  FALSE
2016-02-15 11:55:00 UTC  4.4  0.14  8.38  755  232  0.009  0.111  1436  1018  FALSE
2016-02-15 11:56:00 UTC  4.4  0.14  8.38  755  232  0.014  0.113  1471  1019  FALSE
2016-02-15 11:57:00 UTC  4.4  0.14  8.37  755  232  0.015  0.111  1457  1015  FALSE
2016-02-15 11:58:00 UTC  4.4  0.14  8.38  755  232  0.013  0.111  1476  1019  FALSE
2016-02-15 11:59:00 UTC  4.4  0.14  8.36  755  232  0.013  0.114  1416  1015  FALSE

The shape of the data is (122334, 10)

Here is my function:

import numpy as np

def percent_change(series):
    # Collect all *but* the last value of this window, then the final value
    previous_values = series[:-1]
    last_value = series[-1]

    # Calculate the % difference between the last value and the mean of earlier values
    percent_change = (last_value - np.mean(previous_values)) / np.mean(previous_values)
    return percent_change

Applying the function here:

df2 = df.rolling(10).apply(percent_change)

This takes forever. What am I doing wrong, or how should I do it instead?

Thanks

Answer 1

Here is an approach that uses shift() and rolling() to compute the mean efficiently:

import pandas as pd

def rolling_pct_change(df, field, window=3):
    # Mean of the `window` values before each row; shift(1) excludes the current row
    t = df.copy()
    t['mean'] = t[field].shift(1).rolling(window).mean()
    t['pct_change'] = (t[field] - t['mean']) / t['mean']
    return t

df = pd.DataFrame({'x': [*range(10)]})
df2 = rolling_pct_change(df, 'x')
print(df2)

   x  mean  pct_change
0  0   NaN         NaN
1  1   NaN         NaN
2  2   NaN         NaN
3  3   1.0    2.000000
4  4   2.0    1.000000
5  5   3.0    0.666667
6  6   4.0    0.500000
7  7   5.0    0.400000
8  8   6.0    0.333333
9  9   7.0    0.285714
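
The same idea can be applied to every column at once, since shift() and rolling().mean() also work on a whole DataFrame. Below is a minimal sketch under a few assumptions: the helper name rolling_pct_change_all and the synthetic demo frame are made up for illustration, and non-numeric columns (such as the boolean value column in the question) are simply skipped. Note that rolling(10) in the question compares the last value against the mean of the 9 values before it, so the vectorized version uses a window of window - 1 after the shift. The last lines recompute the result with rolling().apply() on the small frame as a sanity check.

```python
import numpy as np
import pandas as pd

def rolling_pct_change_all(df, window=10):
    """Percent change of each value relative to the mean of the
    (window - 1) values that precede it, for all numeric columns."""
    numeric = df.select_dtypes(include="number")
    # shift(1) drops the current row from the window, so the rolling mean
    # covers only the previous (window - 1) rows
    prev_mean = numeric.shift(1).rolling(window - 1).mean()
    return (numeric - prev_mean) / prev_mean

# Small synthetic frame (illustrative data, not the data from the question)
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.uniform(1.0, 2.0, size=(50, 3)), columns=["a", "b", "c"])

fast = rolling_pct_change_all(demo, window=10)

# Reference: the apply-based computation from the question, on raw arrays
def percent_change(values):
    prev = values[:-1].mean()
    return (values[-1] - prev) / prev

slow = demo.rolling(10).apply(percent_change, raw=True)

print(np.allclose(fast.to_numpy(), slow.to_numpy(), equal_nan=True))
```

The vectorized version avoids calling a Python function once per window and per column, which is where most of the time goes with rolling().apply(). If the custom function has to stay, passing raw=True to apply() (so it receives NumPy arrays instead of Series) is usually a cheaper first step.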
