Say I have a giant DataFrame df with N rows, where N could be 1 billion.
If I do
df.rolling(window=lookback).mean()
I get the rolling mean (or any rolling operation) for every row (of course, the first rows may be NaN depending on lookback). That works, but it is very slow because N is so big.
To save time, I only need the rolling results for the last M rows, where M << N.
How can I achieve this? I don't want to write my own rolling function; is there a way in Pandas or NumPy to tell it to perform the operation only M times and then stop?
IIUC, you can slice first, then apply the rolling:
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': np.arange(1000)})
M = 10  # number of trailing rows we need results for
N = 5   # rolling window size (note: not the N from the question)

# keep N-1 extra rows of history so the first of the last M rows
# still sees a full window, then drop those warm-up rows
out = df.iloc[-M-N+1:].rolling(N).mean().iloc[N-1:]
To be compared with the naive full computation:
df.rolling(N).mean().iloc[-M:]
Example output:
col
990 988.0
991 989.0
992 990.0
993 991.0
994 992.0
995 993.0
996 994.0
997 995.0
998 996.0
999 997.0
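As a sanity check, here is a small self-contained sketch (the window is named W here, purely to avoid clashing with the question's use of N for the total row count) confirming that rolling over the slice gives exactly the same last-M results as rolling over the whole frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': np.arange(1000)})
M = 10   # number of trailing rows we need results for
W = 5    # rolling window size

# Slice W-1 extra rows of history so the first of the last M rows
# still has a full window, then drop the warm-up rows.
fast = df.iloc[-M - W + 1:].rolling(W).mean().iloc[W - 1:]

# Reference: roll over the entire frame, keep only the last M rows.
slow = df.rolling(W).mean().iloc[-M:]

assert fast.equals(slow)
assert len(fast) == M
```

Since rolling only ever looks backward within its window, only W-1 rows of extra history are needed; the cost drops from O(N) to O(M + W).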