
rolling mean with a moving window

My DataFrame has a daily price column and a window size column:

import pandas as pd

df = pd.DataFrame(columns=['price', 'window'],
                  data=[[100, 1], [120, 2], [115, 2], [116, 2], [100, 4]])

df

        price   window
0        100    1
1        120    2
2        115    2
3        116    2
4        100    4

I would like to compute the rolling mean of price for each row, using that row's value in the window column as the window size.

The result would be this:

df
    price   window  rolling_mean_price
0   100        1    100.00
1   120        2    110.00
2   115        2    117.50
3   116        2    115.50
4   100        4    112.75

I can't find any elegant way to do it with apply, and I refuse to loop over each row of my DataFrame...

The best solutions, in terms of raw speed and complexity, are based on ideas from the summed-area table. The problem can be considered a one-dimensional table: once the cumulative sum of price is known, the sum over any window is the difference of two cumulative sums. Below you can find several approaches, ranked from best to worst.
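
To make the summed-area idea concrete, here is a minimal pure-Python sketch of the one-dimensional case. The names prefix and window_sum are illustrative only and do not come from any library. It also assumes that a window reaching past row 0 is simply truncated. The point is that after one cumulative pass, any window sum costs a single subtraction.

```python
from itertools import accumulate

prices = [100, 120, 115, 116, 100]

# prefix[k] is the sum of the first k prices, so prefix[0] == 0
prefix = [0] + list(accumulate(prices))

def window_sum(end, width):
    """Sum of the `width` prices ending at row `end` (inclusive)."""
    start = max(0, end - width + 1)  # assumption: truncate windows that run past row 0
    return prefix[end + 1] - prefix[start]

print(window_sum(4, 4) / 4)  # mean of rows 1..4 -> 112.75
```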

Numpy + Linear complexity

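import numpy as np

size = len(df['price'])
# price[k] holds the sum of the first k prices, with price[0] = 0
price = np.zeros(size + 1)
price[1:] = df['price'].values.cumsum()

# start index of each row's window, clipped so it never goes before row 0
window = np.clip(np.arange(size) - (df['window'].values - 1), 0, None)
df['rolling_mean_price'] = (price[1:] - price[window]) / df['window'].values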

print(df)

Output

   price  window  rolling_mean_price
0    100       1              100.00
1    120       2              110.00
2    115       2              117.50
3    116       2              115.50
4    100       4              112.75
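
If this is needed in more than one place, the same computation can be wrapped in a function. The sketch below uses a hypothetical helper name, variable_window_mean, and makes one assumption that differs from the snippet above: when a window reaches past the first row, it divides by the number of rows actually available rather than by the nominal window size. It also assumes every window is at least 1. For the example data every window fits, so both give the same result.

```python
import numpy as np

def variable_window_mean(values, windows):
    """Trailing mean of `values` where row i uses a window of windows[i] rows.

    Hypothetical helper, not a pandas or NumPy API. Windows that would start
    before row 0 are truncated and averaged over the rows that exist.
    Assumes every window size is >= 1.
    """
    values = np.asarray(values, dtype=float)
    windows = np.asarray(windows, dtype=int)
    n = len(values)
    # prefix[k] = sum of the first k values, prefix[0] = 0
    prefix = np.concatenate(([0.0], values.cumsum()))
    end = np.arange(n) + 1                   # one past the last row of each window
    start = np.clip(end - windows, 0, None)  # first row of each window
    return (prefix[end] - prefix[start]) / (end - start)

result = variable_window_mean([100, 120, 115, 116, 100], [1, 2, 2, 2, 4])
print(result)
```

With the DataFrame from the question, this would be called as df['rolling_mean_price'] = variable_window_mean(df['price'], df['window']).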

Loopy + Linear complexity

price = df['price'].values.cumsum()
# subtract the cumulative sum just before the window, unless the window starts at row 0
df['rolling_mean_price'] = [(price[i] - (price[i - w] if i >= w else 0)) / w for i, w in enumerate(df['window'])]

Loopy + Quadratic complexity

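price = df['price'].values
# max(0, ...) stops a window that would start before row 0 from wrapping around to the end of the array
df['rolling_mean_price'] = [price[max(0, i - (w - 1)):i + 1].mean() for i, w in enumerate(df['window'])]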
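
One way to gain some confidence in the cumulative-sum version is to compare it with the brute-force slice on random data. This is only a sketch: the data is made up, and it assumes every window fits inside the table (w <= i + 1), so the question of how to treat truncated windows does not arise.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
price = rng.integers(50, 150, size=n)
# each window is between 1 and i + 1 rows, so it never reaches past row 0
window = np.array([rng.integers(1, i + 2) for i in range(n)])

# linear: one cumulative sum, then one subtraction per row
csum = price.cumsum()
fast = np.array([(csum[i] - (csum[i - w] if i >= w else 0)) / w
                 for i, w in enumerate(window)])

# quadratic: re-sum every window from scratch
slow = np.array([price[i - w + 1:i + 1].mean() for i, w in enumerate(window)])

print(np.allclose(fast, slow))
```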

I would not recommend using pandas.DataFrame.apply() for this (reasons described here), but if you insist on it, here is one solution:

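# int() guards against the row being upcast to float; .iloc[row.name] assumes the default 0..n-1 index
df['rolling_mean_price'] = df.apply(
    lambda row: df['price'].rolling(int(row['window'])).mean().iloc[row.name], axis=1)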

The output looks like this:

>>> print(df)
   price  window  rolling_mean_price
0    100       1              100.00
1    120       2              110.00
2    115       2              117.50
3    116       2              115.50
4    100       4              112.75

