I need to apply a function over a rolling window on a sparse, datetime-indexed DataFrame (the time gap between rows varies). The window size is specified by an offset:
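def value_diff(x):
    # percentage change between the first and last value in the window
    return (x[-1] - x[0]) / x[0] * 100

# raw=True passes each window as a NumPy array, so x[-1] and x[0] are positional
diff = df['value'].rolling(window='10min').apply(value_diff, raw=True)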
I need the first value of result to be based on at least 10 minutes of data.
Unfortunately, the min_periods parameter of rolling() doesn't accept an offset, only a number of points, and I can't specify a fixed value for it because the number of elements in a window varies.
After running this code I get a Series that contains results of value_diff() from the very beginning of the DataFrame column, where the window contains only 1 element, then 2 elements, then 3, and so on.
I can run diff = diff.truncate(before=diff.index[0] + timedelta(minutes=10), copy=False), but it feels somewhat inefficient to me. Is there a way to avoid applying the rolling function to the incomplete windows at the beginning and then truncating the unreliable data, without completely rewriting rolling()?
I think you have to reconstruct the missing timestamps in order to apply a rolling function with a fixed-length window. Here is an example:
from datetime import datetime
import numpy as np
import pandas as pd

# build an irregular time series
series = pd.Series(np.ones(60))
series.index = pd.date_range(datetime(2010, 1, 1, 13, 0), periods=60, freq='1min')
series = series.sample(20, random_state=33).sort_index()
# reconstruct the series with every timestamp and apply a rolling function
series = series.reindex(pd.date_range(datetime(2010, 1, 1, 13, 0), periods=60, freq='1min'), fill_value=0)
# 10 rows = 10 minutes at 1-minute frequency; the first 9 results are NaN
series.rolling(10).sum()
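If filling the gaps is undesirable (for value_diff, inserted zeros would distort the result), another option is to keep the offset-based window and blank out the warm-up period afterwards. This is only a sketch under assumptions: the DataFrame, its 'value' column and the names window, warm_up_end and diff_masked are made up for illustration. It does not avoid computing the early windows; it replaces them with NaN instead of truncating.

```python
import numpy as np
import pandas as pd

# Hypothetical sparse data: 20 of 60 one-minute timestamps are kept.
idx = pd.date_range("2010-01-01 13:00", periods=60, freq="1min")
df = pd.DataFrame({"value": np.arange(1.0, 61.0)}, index=idx)
df = df.sample(20, random_state=33).sort_index()

def value_diff(x):
    # x is a NumPy array because raw=True is passed below
    return (x[-1] - x[0]) / x[0] * 100

window = pd.Timedelta("10min")
diff = df["value"].rolling(window="10min").apply(value_diff, raw=True)

# Rows earlier than (first timestamp + 10 minutes) cannot have a full
# window behind them, so replace their results with NaN.
warm_up_end = df.index[0] + window
diff_masked = diff.where(diff.index >= warm_up_end)
```

Unlike truncate(), where() keeps the result aligned with the original index, which can be convenient when assigning it back as a column.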