How to let Pandas calculate only last M points for rolling mean?

Question

Say I have a giant DataFrame df with N rows, where N could be 1 billion.

If I do

df.rolling(window=lookback).mean()

I will get the rolling mean (or any rolling operation) for every row (of course, the rows at the beginning may be all NaN, depending on lookback).

That works, but it is very slow because N is so large.

I only need the results for the last M rows, where M << N, so I would like to run the rolling operation on just those rows to save time.

How can I achieve this? I don't want to write my own rolling function. Is there a way in Pandas or NumPy to tell it to do the operation only M times and then stop?

Answer 1

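IIUC, you can slice first and then apply the rolling:

import numpy as np
import pandas as pd

df = pd.DataFrame({'col': np.arange(1000)})
M = 10  # number of trailing rows to compute
N = 5   # rolling window size (lookback in the question, not the row count)

# the last M results depend only on the last M+N-1 rows
out = df.iloc[-M-N+1:].rolling(N).mean().iloc[N-1:]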

To be compared with:

df.rolling(N).mean().iloc[-M:]
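
If you need this in several places, the slicing can be wrapped in a small helper. This is only a sketch: the name rolling_mean_tail and its parameters are my own and not part of pandas. It ends with .iloc[-m:] rather than .iloc[N-1:] so that it should also match the full computation when the frame has fewer than m + window - 1 rows.

```python
import numpy as np
import pandas as pd


def rolling_mean_tail(df, window, m):
    """Rolling mean for only the last `m` rows (hypothetical helper, not a pandas API)."""
    if window < 1 or m < 1:
        raise ValueError("window and m must be positive integers")
    # The last m results depend only on the last m + window - 1 rows,
    # so everything before that can be skipped.
    tail = df.iloc[-(m + window - 1):]
    return tail.rolling(window).mean().iloc[-m:]


df = pd.DataFrame({"col": np.arange(1000)})
fast = rolling_mean_tail(df, window=5, m=10)

# Sanity check against the full (slow) computation
full = df.rolling(5).mean().iloc[-10:]
pd.testing.assert_frame_equal(fast, full)
```

The same slice-first idea should carry over to other rolling aggregations (sum, std, ...) by swapping out .mean().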

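Example output (identical for both expressions, with M = 10 and N = 5):

       col
990  988.0
991  989.0
992  990.0
993  991.0
994  992.0
995  993.0
996  994.0
997  995.0
998  996.0
999  997.0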
