
How to let Pandas calculate only last M points for rolling mean?

Say I have a giant DataFrame df with N rows, where N could be as large as 1 billion.

If I do

df.rolling(window=lookback).mean()

I will get the rolling mean (or any other rolling operation) for every row (of course, the first rows may be NaN, depending on lookback).

This works, but it is very slow because N is so large.

To save time, I only want to compute the rolling operation for the last M rows, since those are the only results I need, and M << N.

How can I achieve this? I don't want to write my own rolling function. Is there a way in Pandas or NumPy to tell it to perform the operation only M times and then stop?

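IIUC, you can slice first and then apply the rolling. Note that in the example below N is the window size (lookback in the question), not the total number of rows: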

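import numpy as np
import pandas as pd

df = pd.DataFrame({'col': np.arange(1000)})
M = 10  # number of trailing rows we want results for
N = 5   # rolling window size

# M results need the last M+N-1 input rows; drop the N-1 warm-up NaN rows afterwards
out = df.iloc[-M-N+1:].rolling(N).mean().iloc[N-1:]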

This gives the same result as computing the rolling mean over the whole DataFrame and keeping only the last M rows:

df.rolling(N).mean().iloc[-M:]
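
To check that the two approaches agree, something like the following should work. This is only a sketch on the same toy data; the names fast and full are introduced here for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': np.arange(1000)})
M = 10  # number of trailing rows we want results for
N = 5   # rolling window size

# Sliced version: only touches the last M+N-1 rows
fast = df.iloc[-M-N+1:].rolling(N).mean().iloc[N-1:]
# Reference version: rolls over every row, then keeps the last M
full = df.rolling(N).mean().iloc[-M:]

# Raises AssertionError if values or index differ
# (floats are compared with a tolerance by default)
pd.testing.assert_frame_equal(fast, full)
print(fast)
```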

example output:

       col
990  988.0
991  989.0
992  990.0
993  991.0
994  992.0
995  993.0
996  994.0
997  995.0
998  996.0
999  997.0
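
If this is needed in more than one place, the slicing can be wrapped in a small helper. The function name rolling_tail and its parameters are hypothetical, and the sketch assumes a fixed integer window (not a time offset such as "5min") and the default min_periods:

```python
import numpy as np
import pandas as pd

def rolling_tail(df, window, m, func="mean"):
    """Apply a fixed-size rolling aggregation, returning only the last m rows.

    Hypothetical helper: assumes an integer window and default min_periods.
    """
    if window < 1 or m < 1:
        raise ValueError("window and m must be positive integers")
    # m output rows depend only on the last m + window - 1 input rows
    tail = df.iloc[-(m + window - 1):]
    rolled = tail.rolling(window).agg(func)
    # Keep the last m rows; if df is shorter than m + window - 1,
    # the leading NaN rows match what the full computation would give
    return rolled.iloc[-m:]

df = pd.DataFrame({"col": np.arange(1000)})
out = rolling_tail(df, window=5, m=10)
```

Taking .iloc[-m:] at the end, rather than .iloc[window-1:], is a deliberate choice so that the helper still returns the same rows as the full computation when the DataFrame has fewer than m + window - 1 rows.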
