
Rolling mean in Pandas with a fixed time window (instead of a fixed number of observations)

I have a DataFrame with two columns and a three-level index. The columns are Price and Volume, and the index levels are Trader, Stock, and Day.

I would like to compute the rolling mean of Price and Volume over the last 50 days for each Trader - Stock combination in my data.

This is what I have come up with so far:

test=test.set_index(['date','trader', 'stock'])

test=test.unstack().unstack()

test=test.resample("1D")

test=test.fillna(0)

test[[col+'_norm' for col in test.columns]]=test.apply(lambda x: pd.rolling_mean(x,50,50))

test.stack().stack().reset_index().set_index(['trader', 'stock','date']).sort_index().head()

That is, I unstack my dataset twice so that only the time axis is left in the index. After resampling to daily frequency, 50 observations correspond to 50 days, so I can compute a 50-day rolling mean of my variables.

The problem is that I don't know how to create the right names for my rolling-mean variables:

test[[col+'_norm' for col in test.columns]]

TypeError: can only concatenate tuple (not "str") to tuple

Any ideas what is wrong here? Is my algorithm actually correct to get these rolling means? Many thanks!

The result of pd.rolling_mean (with modified column names) can be concatenated with the original DataFrame:

means = pd.rolling_mean(test, 50, 50)
means.columns = [('{}_norm'.format(col[0]),)+col[1:] for col in means.columns]
test = pd.concat([test, means], axis=1)
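
The TypeError in the question comes from the column labels: after unstack().unstack() each label is a tuple such as ('Price', 0, 0), and a tuple cannot be concatenated with a str. As a rough sketch for newer pandas versions, where pd.rolling_mean is no longer available and DataFrame.rolling(...).mean() is used instead, the same renaming could look like the following. The frame `wide`, its values, and the window of 3 are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical wide frame: one column per (variable, trader, stock) tuple,
# similar in shape to what unstack().unstack() produces.
cols = pd.MultiIndex.from_tuples(
    [("Price", 0, 0), ("Price", 0, 1)], names=[None, "trader", "stock"])
wide = pd.DataFrame(
    np.arange(12, dtype=float).reshape(6, 2),
    index=pd.date_range("2000-01-01", periods=6, name="date"),
    columns=cols)

# Small window for illustration only; the question uses 50.
window = 3
means = wide.rolling(window, min_periods=window).mean()

# Each label is a tuple, so col + '_norm' would raise TypeError.
# Rename only the first element and keep the remaining levels unchanged.
means.columns = pd.MultiIndex.from_tuples(
    [("{}_norm".format(col[0]),) + col[1:] for col in means.columns],
    names=wide.columns.names)

result = pd.concat([wide, means], axis=1)
print(result)
```

The first window - 1 rows of each *_norm column are NaN because min_periods equals the window length.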

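# Complete example with random integer data
import numpy as np
import pandas as pd

N = 10
# Random trader/stock ids and one value column, indexed by daily dates
test = pd.DataFrame(np.random.randint(4, size=(N, 3)),
                    columns=['trader', 'stock', 'foo'],
                    index=pd.date_range('2000-1-1', periods=N))
test.index.names = ['date']
# The index becomes (date, trader, stock)
test = test.set_index(['trader', 'stock'], append=True)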

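# Move trader and stock into the columns so only the date stays in the index
test = test.unstack().unstack()

# Daily frequency; newer pandas needs an explicit aggregation,
# e.g. test.resample("1D").mean()
test = test.resample("1D")

test = test.fillna(0)

# 50-row window with min_periods=50; in newer pandas this is
# test.rolling(50, min_periods=50).mean()
means = pd.rolling_mean(test, 50, 50)
# Column labels are tuples, so rename only the first element
means.columns = [('{}_norm'.format(col[0]),)+col[1:] for col in means.columns]
test = pd.concat([test, means], axis=1)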

test = test.stack().stack()
test = test.reorder_levels(['trader', 'stock', 'date'])
test = test.sort_index()
print(test.head())

yields

                         foo  foo_norm
trader stock date                     
0      0     2000-01-01    0       NaN
             2000-01-02    0       NaN
             2000-01-03    0       NaN
             2000-01-04    0       NaN
             2000-01-05    0       NaN
...
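
As a possible alternative for newer pandas versions, a time-based window can be applied per group without unstacking or resampling. This is a different technique from the one above: it averages only the rows actually observed in the last 50 calendar days instead of zero-filling missing days, so the numbers will generally differ from the zero-filled approach. The data and the names `df` and `rolled` below are hypothetical.

```python
import pandas as pd

# Hypothetical long-format data with irregular dates per trader/stock pair.
df = pd.DataFrame({
    "trader": ["A", "A", "A", "B", "B"],
    "stock": ["X", "X", "X", "X", "X"],
    "date": pd.to_datetime(["2000-01-01", "2000-01-20", "2000-03-15",
                            "2000-01-01", "2000-01-02"]),
    "Price": [10.0, 20.0, 30.0, 1.0, 3.0],
    "Volume": [100.0, 200.0, 300.0, 10.0, 30.0],
})

# A time-based window needs a sorted DatetimeIndex within each group.
df = df.sort_values("date").set_index("date")

# "50D" means: all rows of the group whose date lies in the 50 days
# ending at the current row (min_periods defaults to 1 for offset windows).
rolled = (df.groupby(["trader", "stock"])[["Price", "Volume"]]
            .rolling("50D")
            .mean())
print(rolled)
```

The result is indexed by (trader, stock, date), which matches the index layout described in the question.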
