
Efficient usage of pandas.DataFrame.rolling() with time-based window

I need to apply a function over a rolling window on a sparse, datetime-indexed DataFrame (the time gap between rows varies). The window size is specified by an offset:

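def value_diff(x):
    # percentage change between the first and last value in the window
    return (x[-1] - x[0]) / x[0] * 100

# raw=True passes each window as a NumPy array, so x[-1] and x[0] are positional
diff = df['value'].rolling(window='10min').apply(value_diff, raw=True)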

I need the first value of the result to be based on at least 10 minutes of data.

Unfortunately, the min_periods parameter of rolling() doesn't accept an offset, only a number of observations, and I can't specify a fixed value for it because the number of elements in a window varies.

After running this code, I get a Series whose first results come from applying value_diff() at the very start of the DataFrame column, when the window contains only 1 element, then 2 elements, 3 elements, and so on.

I can run diff = diff.truncate(before=diff.index[0] + timedelta(minutes=10), copy=False), but it feels somewhat inefficient to me. Is there a way to avoid applying the rolling function to the incomplete windows at the beginning and then truncating the unreliable data, without completely rewriting rolling()?
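For reference, the workaround described in the question can be sketched as a small self-contained example. The sample timestamps and values are made up for illustration, and a boolean mask on the index is used in place of truncate(); it is a sketch of the "compute everything, then drop the warm-up period" approach, not a more efficient alternative.

```python
import pandas as pd

def value_diff(x):
    # x is a NumPy array because raw=True is passed to apply()
    return (x[-1] - x[0]) / x[0] * 100

# Hypothetical sample data with irregular gaps between rows
index = pd.to_datetime([
    "2010-01-01 13:00", "2010-01-01 13:03", "2010-01-01 13:04",
    "2010-01-01 13:09", "2010-01-01 13:12", "2010-01-01 13:20",
])
df = pd.DataFrame({"value": [100.0, 110.0, 105.0, 120.0, 90.0, 99.0]}, index=index)

# Time-based window: every row gets a result, even the very first one
diff = df["value"].rolling(window="10min").apply(value_diff, raw=True)

# Drop results from the first 10 minutes, where the window is incomplete
cutoff = df.index[0] + pd.Timedelta(minutes=10)
diff_full = diff[diff.index >= cutoff]
```

Here diff_full keeps only the rows at or after 13:10, so every remaining result has had 10 minutes of history available.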

I think you have to reconstruct the missing timestamps in order to apply a rolling function with a fixed window length. Here is an example:

from datetime import datetime

import numpy as np
import pandas as pd

# build an irregular time series: 20 random points out of 60 one-minute steps
series = pd.Series(np.ones(60))
series.index = pd.date_range(datetime(2010, 1, 1, 13, 0), periods=60, freq='1min')
series = series.sample(20, random_state=33).sort_index()

# reconstruct the series with every timestamp (missing ones filled with 0) and apply a rolling function
series = series.reindex(pd.date_range(datetime(2010, 1, 1, 13, 0), periods=60, freq='1min'), fill_value=0)
series.rolling(10).sum()  # 10 rows on a 1-minute grid, i.e. a 10-minute window
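The reason this helps with the original problem is that, for an integer window, min_periods defaults to the window size, so the incomplete windows at the start come out as NaN instead of partial results. A minimal sketch of that behaviour follows, with the results mapped back onto the original sparse timestamps; the names sparse, dense, rolled and result are introduced here for illustration only. Note that fill_value=0 is harmless for a sum but would not suit a function such as value_diff() that divides by the first element.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Regular 1-minute grid and a sparse sample of it, as in the example above
full_index = pd.date_range(datetime(2010, 1, 1, 13, 0), periods=60, freq="1min")
sparse = pd.Series(np.ones(60), index=full_index).sample(20, random_state=33).sort_index()

# Fill in the missing timestamps so that 10 rows always span 10 minutes
dense = sparse.reindex(full_index, fill_value=0)
rolled = dense.rolling(10).sum()

# The first 9 rows have fewer than 10 observations, so they are NaN
warmup = rolled.iloc[:9]

# Keep only the results at the timestamps that were actually observed
result = rolled.loc[sparse.index]
```

Any NaN entries in result correspond to original timestamps that fall within the first 9 minutes of the grid and can be dropped with dropna().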
