Rolling mean in Pandas with fixed time window (instead of fixed nb. of observations)

I have a DataFrame with two columns and a three-level index. The columns are Price and Volume, and the index levels are Trader, Stock, and day.

I would like to compute the rolling mean of Price and Volume over the last 50 days for each Trader - Stock combination in my data.

This is what I have come up with so far.

test = test.set_index(['date', 'trader', 'stock'])
test = test.unstack().unstack()
test = test.resample("1D")
test = test.fillna(0)
test[[col+'_norm' for col in test.columns]] = test.apply(lambda x: pd.rolling_mean(x, 50, 50))
test.stack().stack().reset_index().set_index(['trader', 'stock', 'date']).sort_index().head()

That is, I unstack my dataset twice so that only the time axis is left. After resampling the data to daily frequency, 50 observations correspond to 50 days, so I can compute a 50-day rolling mean of my variables.

The problem is that I don't know how to create the right names for my rolling mean variables:

test[[col+'_norm' for col in test.columns]]

TypeError: can only concatenate tuple (not "str") to tuple

Any ideas what is wrong here? Is my algorithm actually correct for getting these rolling means? Many thanks!

The result of pd.rolling_mean (with modified column names) can be concatenated with the original DataFrame:

means = pd.rolling_mean(test, 50, 50)
means.columns = [('{}_norm'.format(col[0]),)+col[1:] for col in means.columns]
test = pd.concat([test, means], axis=1)
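
To see why the rename above is written with tuples: after unstacking, the columns form a MultiIndex, so each column label is a tuple and `col + '_norm'` fails with the TypeError from the question. The following is a minimal sketch on a made-up toy frame (the names `wide`, `cols`, and the 3-row window are illustrative assumptions, not from the original data). It also assumes a more recent pandas, where the top-level `pd.rolling_mean` is no longer available and `DataFrame.rolling(...).mean()` is used in its place.

```python
import numpy as np
import pandas as pd

# Toy frame with two-level columns, mimicking the result of unstack().
# All names and values here are illustrative assumptions.
cols = pd.MultiIndex.from_product([['Price', 'Volume'], ['A', 'B']],
                                  names=[None, 'stock'])
idx = pd.date_range('2000-01-01', periods=5, name='date')
wide = pd.DataFrame(np.arange(20, dtype=float).reshape(5, 4),
                    index=idx, columns=cols)

# Each column label is a tuple such as ('Price', 'A'),
# which is why `col + '_norm'` raises a TypeError.
first_label = wide.columns[0]

# Rolling mean over 3 rows, requiring all 3 observations
# (the counterpart of pd.rolling_mean(wide, 3, 3)).
means = wide.rolling(window=3, min_periods=3).mean()

# Rename only the first level of each label and keep the remaining levels.
means.columns = pd.MultiIndex.from_tuples(
    [('{}_norm'.format(col[0]),) + col[1:] for col in means.columns],
    names=wide.columns.names)

result = pd.concat([wide, means], axis=1)
print(result)
```

The key step is building a new tuple, `('Price_norm',) + ('A',)`, instead of appending a string to the existing one.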

import numpy as np
import pandas as pd

N = 10
test = pd.DataFrame(np.random.randint(4, size=(N, 3)),
                    columns=['trader', 'stock', 'foo'],
                    index=pd.date_range('2000-1-1', periods=N))
test.index.names = ['date']
test = test.set_index(['trader', 'stock'], append=True)

test = test.unstack().unstack()

test = test.resample("1D")

test = test.fillna(0)

means = pd.rolling_mean(test, 50, 50)
means.columns = [('{}_norm'.format(col[0]),)+col[1:] for col in means.columns]
test = pd.concat([test, means], axis=1)

test = test.stack().stack()
test = test.reorder_levels(['trader', 'stock', 'date'])
test = test.sort_index()
print(test.head())

yields

                         foo  foo_norm
trader stock date                     
0      0     2000-01-01    0       NaN
             2000-01-02    0       NaN
             2000-01-03    0       NaN
             2000-01-04    0       NaN
             2000-01-05    0       NaN
...
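
For the time-window part of the question, more recent pandas versions accept an offset string such as `'50D'` as the rolling window, which avoids the unstack/resample round trip. The sketch below is one possible approach, not the method used above; the sample data and the names `df`, `rolled`, and `out` are hypothetical. Note two differences from the code above: missing days are not filled with 0, so only observed rows enter the mean, and `min_periods` defaults to 1 for offset windows, so early rows are not NaN.

```python
import pandas as pd

# Hypothetical long-format data: one row per (trader, stock, date).
df = pd.DataFrame({
    'trader': ['t1'] * 4 + ['t2'] * 2,
    'stock':  ['A'] * 4 + ['B'] * 2,
    'date': pd.to_datetime(['2000-01-01', '2000-01-02', '2000-03-01',
                            '2000-03-02', '2000-01-01', '2000-01-10']),
    'Price':  [10.0, 20.0, 30.0, 50.0, 1.0, 3.0],
    'Volume': [100.0, 200.0, 300.0, 500.0, 10.0, 30.0],
})

# Dates must be sorted within each group for a time-based window.
df = df.sort_values(['trader', 'stock', 'date']).set_index('date')

# 50-calendar-day rolling mean per (trader, stock) group.
rolled = (df.groupby(['trader', 'stock'])[['Price', 'Volume']]
            .rolling('50D')
            .mean()
            .add_suffix('_norm'))

# Attach the rolling means to the original columns.
out = (df.set_index(['trader', 'stock'], append=True)
         .reorder_levels(['trader', 'stock', 'date'])
         .join(rolled))
print(out)
```

Here the window for each row covers the 50 days ending at that row's date, so the two January rows of trader t1 no longer contribute to the March means.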
