Pandas rolling mean with a fixed time window (instead of a fixed number of observations)

Question

I have a dataframe with two columns and a three-level index. The columns are Price and Volume, and the index levels are Trader, Stock and day.

I would like to compute the rolling mean of Price and Volume over the last 50 days for each Trader-Stock combination in my data.

This is what I have come up with so far.

test = test.set_index(['date', 'trader', 'stock'])

test = test.unstack().unstack()

test = test.resample("1D")

test = test.fillna(0)

test[[col+'_norm' for col in test.columns]] = test.apply(lambda x: pd.rolling_mean(x, 50, 50))

test.stack().stack().reset_index().set_index(['trader', 'stock', 'date']).sort_index().head()

That is, I unstack my dataset twice so that only the time axis is left. After resampling the data to daily frequency, 50 observations correspond to 50 days, so I can compute a 50-day rolling mean of my variables.

The problem is that I don't know how to create the right names for my rolling mean variables:

test[[col+'_norm' for col in test.columns]]

TypeError: can only concatenate tuple (not "str") to tuple

Any ideas what is wrong here? Is my algorithm actually correct for getting these rolling means? Many thanks!

Answer 1

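After the two unstack calls the columns form a MultiIndex, so each column label is a tuple and col+'_norm' raises the TypeError. Rename only the first element of each tuple instead; the result of pd.rolling_mean (with modified column names) can then be concatenated with the original DataFrame: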

means = pd.rolling_mean(test, 50, 50)
means.columns = [('{}_norm'.format(col[0]),)+col[1:] for col in means.columns]
test = pd.concat([test, means], axis=1)
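
Note that pd.rolling_mean was deprecated in pandas 0.18 and removed in 0.23. Below is a minimal sketch of the same three steps on a newer pandas, assuming DataFrame.rolling as the replacement. The toy frame, its column names and its values are made up for illustration, and the window is 3 rows instead of 50 to keep it small:

```python
import numpy as np
import pandas as pd

# Hypothetical wide frame: dates on the index, (variable, stock, trader) tuples as columns.
dates = pd.date_range("2000-01-01", periods=5, name="date")
columns = pd.MultiIndex.from_product(
    [["Price", "Volume"], ["AAA"], ["t1", "t2"]],
    names=["variable", "stock", "trader"],
)
wide = pd.DataFrame(
    np.arange(20, dtype=float).reshape(5, 4), index=dates, columns=columns
)

# Row-count window of 3 (50 in the question); min_periods equals the window size,
# mirroring pd.rolling_mean(test, 50, 50).
means = wide.rolling(window=3, min_periods=3).mean()

# Each column label is a tuple, so only its first element gets the suffix.
means.columns = pd.MultiIndex.from_tuples(
    [(f"{col[0]}_norm",) + col[1:] for col in means.columns],
    names=wide.columns.names,
)

result = pd.concat([wide, means], axis=1)
```

The first two rows of every _norm column are NaN because fewer than min_periods observations are available there.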

For example:

import numpy as np
import pandas as pd

N = 10
test = pd.DataFrame(np.random.randint(4, size=(N, 3)),
                    columns=['trader', 'stock', 'foo'],
                    index=pd.date_range('2000-1-1', periods=N))
test.index.names = ['date']
test = test.set_index(['trader', 'stock'], append=True)

test = test.unstack().unstack()

test = test.resample("1D")

test = test.fillna(0)

means = pd.rolling_mean(test, 50, 50)
means.columns = [('{}_norm'.format(col[0]),)+col[1:] for col in means.columns]
test = pd.concat([test, means], axis=1)

test = test.stack().stack()
test = test.reorder_levels(['trader', 'stock', 'date'])
test = test.sort_index()
print(test.head())

yields

                         foo  foo_norm
trader stock date                     
0      0     2000-01-01    0       NaN
             2000-01-02    0       NaN
             2000-01-03    0       NaN
             2000-01-04    0       NaN
             2000-01-05    0       NaN
...
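
On newer pandas versions the unstack/resample step may not be needed at all, because rolling accepts a time offset as the window. A rough sketch, assuming a long-format frame with hypothetical traders, stocks and prices; the "50D" window covers the trailing 50 calendar days for each row, however many observations fall inside it:

```python
import pandas as pd

# Hypothetical long-format data with irregular dates per trader/stock pair.
df = pd.DataFrame(
    {
        "trader": ["t1", "t1", "t1", "t2", "t2"],
        "stock": ["AAA", "AAA", "AAA", "AAA", "AAA"],
        "date": pd.to_datetime(
            ["2000-01-01", "2000-01-10", "2000-03-15", "2000-01-01", "2000-01-02"]
        ),
        "Price": [10.0, 20.0, 30.0, 1.0, 3.0],
        "Volume": [100.0, 200.0, 300.0, 10.0, 30.0],
    }
)

# Time-based windows need the dates to be sorted within each group.
df = df.sort_values(["trader", "stock", "date"])

rolled = (
    df.set_index("date")
    .groupby(["trader", "stock"])[["Price", "Volume"]]
    .rolling("50D")   # trailing 50-day window, evaluated per trader/stock group
    .mean()
    .add_suffix("_norm")
)
```

This is not numerically identical to the approach above: missing days are not filled with 0 here, so each mean is taken over the observed rows only, and an offset window defaults to min_periods=1 instead of requiring 50 observations.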

Answer 1: accepted, 2016-01-24 23:45:19
