
Pandas rolling sum that wraps around to include the last rows' values

I have this DataFrame:

import pandas as pd

hour = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23]
visitor = [4,6,2,4,3,7,5,7,8,3,2,8,3,6,4,5,1,8,9,4,2,3,4,1]
df = pd.DataFrame({"Hour": hour, "Total_Visitor": visitor})
print(df)

I applied a rolling sum with a window of 6:

df_roll = df.rolling(6, min_periods=6).sum()
print(df_roll)

The first 5 rows are NaN. The problem is that I want the total number of visitors from 9pm to 3am, so I need to sum the visitors starting at hour 21 and wrapping around to hour 0 through hour 3.

How can I do that automatically with rolling?

I think you need to prepend the last N rows, apply rolling, and then keep only the last len(df) rows:

N = 6
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df_roll = pd.concat([df.iloc[-N:], df]).rolling(N).sum().iloc[-len(df):]
print(df_roll)
     Hour  Total_Visitor
0   105.0           18.0
1    87.0           20.0
2    69.0           20.0
3    51.0           21.0
4    33.0           20.0
5    15.0           26.0
6    21.0           27.0
7    27.0           28.0
8    33.0           34.0
9    39.0           33.0
10   45.0           32.0
11   51.0           33.0
12   57.0           31.0
13   63.0           30.0
14   69.0           26.0
15   75.0           28.0
16   81.0           27.0
17   87.0           27.0
18   93.0           33.0
19   99.0           31.0
20  105.0           29.0
21  111.0           27.0
22  117.0           30.0
23  123.0           23.0
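
Only the last N-1 rows are strictly needed for the wrap-around, and the 9pm to 3am total can then be read straight off the result. A minimal sketch, assuming pandas with `pd.concat` available; the helper name `cyclic_rolling_sum` is hypothetical and not part of the original answer:

```python
import pandas as pd

hour = list(range(24))
visitor = [4, 6, 2, 4, 3, 7, 5, 7, 8, 3, 2, 8, 3, 6, 4, 5, 1, 8, 9, 4, 2, 3, 4, 1]
df = pd.DataFrame({"Hour": hour, "Total_Visitor": visitor})


def cyclic_rolling_sum(s, window):
    # Hypothetical helper: prepend the last window-1 values so early windows wrap around
    padded = pd.concat([s.iloc[len(s) - (window - 1):], s])
    # Drop the padding rows again; their own sums are incomplete (NaN)
    return padded.rolling(window).sum().iloc[window - 1:]


wrapped = cyclic_rolling_sum(df["Total_Visitor"], 6)
# Row 2 is the window ending at hour 2, i.e. hours 21, 22, 23, 0, 1, 2
print(wrapped.loc[2])
```

The printed value, 20.0, matches row 2 of the Total_Visitor column in the table above. Applying the helper to a single column also avoids summing the Hour column, which has no useful meaning here.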

Compare with the original solution:

df_roll = df.rolling(6, min_periods=6).sum()
print(df_roll)
     Hour  Total_Visitor
0     NaN            NaN
1     NaN            NaN
2     NaN            NaN
3     NaN            NaN
4     NaN            NaN
5    15.0           26.0
6    21.0           27.0
7    27.0           28.0
8    33.0           34.0
9    39.0           33.0
10   45.0           32.0
11   51.0           33.0
12   57.0           31.0
13   63.0           30.0
14   69.0           26.0
15   75.0           28.0
16   81.0           27.0
17   87.0           27.0
18   93.0           33.0
19   99.0           31.0
20  105.0           29.0
21  111.0           27.0
22  117.0           30.0
23  123.0           23.0

A NumPy alternative using strides is more complicated, but faster for a large Series:

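import numpy as np
import pandas as pd

def rolling_window(a, window):
    # Build a 2D view (no copy) in which each row is one length-`window` slice of `a`
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

# fv is not defined in this answer; this Series is inferred from the output printed below
fv = pd.Series([1, 2, 1, 3, 1])
N = 3
# Prepend the last N-1 values so the first windows wrap around
x = np.concatenate([fv.to_numpy()[-N+1:], fv.to_numpy()])
cv = pd.Series(rolling_window(x, N).sum(axis=1), index=fv.index)
print(cv)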
0    5
1    4
2    4
3    6
4    5
dtype: int64
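
On NumPy 1.20 or later, `np.lib.stride_tricks.sliding_window_view` builds the same windows without computing strides by hand. This is a sketch rather than the answer's own method, and the input values are an assumption chosen to reproduce the output printed above:

```python
import numpy as np
import pandas as pd

# Assumed input: chosen so the result reproduces the output printed above
fv = pd.Series([1, 2, 1, 3, 1])
N = 3

values = fv.to_numpy()
# Prepend the last N-1 values so the first windows wrap around to the end
padded = np.concatenate([values[len(values) - (N - 1):], values])
# Each row of `windows` is one length-N window; the returned view is read-only
windows = np.lib.stride_tricks.sliding_window_view(padded, N)
cv = pd.Series(windows.sum(axis=1), index=fv.index)
print(cv)
```

Because the view is read-only and its shape is checked by NumPy, this variant avoids the out-of-bounds risk that comes with passing a wrong shape or stride to `as_strided`.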

Although you mentioned a Series, see if this helps:

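import pandas as pd


def cyclic_roll(s, n):
    # Append the first n-1 values so windows can wrap past the end
    s = pd.concat([s, s.iloc[:n-1]])
    result = s.rolling(n).sum()
    # Move the wrapped sums to the front so row i holds the window ending at i
    return pd.concat([result.iloc[-n+1:], result.iloc[n-1:-n+1]])


fv = pd.DataFrame([1, 2, 3, 4, 5])
cv = fv.apply(cyclic_roll, n=3)
cv.reset_index(inplace=True, drop=True)
print(cv)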
Output:

      0
0  10.0
1   8.0
2   6.0
3   9.0
4  12.0
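
For a quick cross-check on small inputs, the same cyclic sum can be written as a sum of circularly shifted copies of the values. This is a sketch, not part of the original answer, and the function name `cyclic_sum_roll` is hypothetical:

```python
import numpy as np
import pandas as pd


def cyclic_sum_roll(s, n):
    # Hypothetical helper: add n circularly shifted copies of the values.
    # np.roll(a, k)[i] == a[i - k], so row i sums a[i], a[i-1], ..., a[i-n+1]
    a = s.to_numpy()
    total = sum(np.roll(a, k) for k in range(n))
    return pd.Series(total, index=s.index)


fv = pd.Series([1, 2, 3, 4, 5])
print(cyclic_sum_roll(fv, 3).tolist())
```

This prints the values 10, 8, 6, 9, 12, the same sums as the output above. It does work proportional to n times the length of the Series, so for long windows the padded rolling approach is likely the better choice.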
