Pandas rolling mean with a fixed time window (instead of a fixed number of observations)

Question

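I have a DataFrame with two columns and a 3-level index. The columns are price and volume, and the index is trader-stock-day.

For each trader-stock combination in my data, I would like to compute the rolling mean of price and volume over the past 50 days.

This is what I have come up with so far.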

test = test.set_index(['date', 'trader', 'stock'])
test = test.unstack().unstack()
test = test.resample("1D")
test = test.fillna(0)
test[[col + '_norm' for col in test.columns]] = test.apply(lambda x: pd.rolling_mean(x, 50, 50))
test.stack().stack().reset_index().set_index(['trader', 'stock', 'date']).sort_index().head()

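That is, I unstack the dataset twice so that only the time axis remains. I can then compute the 50-day rolling mean of my variables, because after resampling the data to daily frequency, 50 observations correspond to 50 days.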
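The reasoning above (50 rows equal 50 days once there is one row per calendar day) can be checked on a toy series. This is only a minimal sketch: it assumes a recent pandas in which `Series.rolling` accepts an offset string such as `'50D'`, rather than the `pd.rolling_mean` function used in this post, and the series `s` is made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up irregular observations (one every 3 days); not the question's data.
idx = pd.date_range('2000-01-01', periods=40, freq='3D')
s = pd.Series(np.arange(40, dtype=float), index=idx)

# One row per calendar day; days without an observation are filled with 0,
# mirroring the resample + fillna(0) steps in the question.
daily = s.resample('1D').mean().fillna(0)

# Window defined by a number of rows (the last 50 observations).
by_count = daily.rolling(window=50, min_periods=1).mean()

# Window defined by elapsed time (the last 50 calendar days).
by_time = daily.rolling('50D').mean()

same = np.allclose(by_count, by_time)
print(same)
```

Note that filling missing days with 0 pulls the mean toward zero. Whether that is wanted depends on whether a day without a trade should count as a zero observation.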

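The problem is that I don't know how to create the right names for the rolling-mean variables:

test[[col + '_norm' for col in test.columns]]

TypeError: can only concatenate tuple (not "str") to tuple

Any idea what is wrong here? Is my algorithm actually correct for obtaining these rolling means? Many thanks!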

Answer 1

You can concatenate the result of pd.rolling_mean (with modified column names) with the original DataFrame:

means = pd.rolling_mean(test, 50, 50)
means.columns = [('{}_norm'.format(col[0]),)+col[1:] for col in means.columns]
test = pd.concat([test, means], axis=1)
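The renaming step works because, after unstacking, every column label is a tuple (one element per column level), which is also why `col + '_norm'` raised the TypeError in the question. A minimal sketch with a hypothetical two-level column index (the labels `'price'`, `'volume'`, `'A'` and `'B'` are invented for illustration):

```python
import pandas as pd

# Hypothetical two-level columns standing in for the unstacked frame.
cols = pd.MultiIndex.from_tuples([('price', 'A'), ('price', 'B'), ('volume', 'A')])
df = pd.DataFrame([[1.0, 2.0, 3.0]], columns=cols)

# Iterating over MultiIndex columns yields tuples, so tuple + str fails.
try:
    [col + '_norm' for col in df.columns]
    raised = False
except TypeError:
    raised = True

# Rebuild each tuple instead: rename the first element, keep the rest.
renamed = [('{}_norm'.format(col[0]),) + col[1:] for col in df.columns]

print(raised)
print(renamed)
```

Only the first level (the variable name) gets the suffix, so the remaining levels still line up with the original columns when the two frames are concatenated.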

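import numpy as np
import pandas as pd

# Build a small random frame indexed by (date, trader, stock)
N = 10
test = pd.DataFrame(np.random.randint(4, size=(N, 3)),
                    columns=['trader', 'stock', 'foo'],
                    index=pd.date_range('2000-1-1', periods=N))
test.index.names = ['date']
test = test.set_index(['trader', 'stock'], append=True)

# Move stock and trader into the columns so only the date axis remains
test = test.unstack().unstack()

# Daily frequency (in older pandas, resample without an aggregation defaults to the mean)
test = test.resample("1D")
test = test.fillna(0)

# 50-row rolling mean; rename the first column level to '<name>_norm'
means = pd.rolling_mean(test, 50, 50)
means.columns = [('{}_norm'.format(col[0]),)+col[1:] for col in means.columns]
test = pd.concat([test, means], axis=1)

# Back to long format, indexed by (trader, stock, date)
test = test.stack().stack()
test = test.reorder_levels(['trader', 'stock', 'date'])
test = test.sort_index()
print(test.head())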

yields

                         foo  foo_norm
trader stock date                     
0      0     2000-01-01    0       NaN
             2000-01-02    0       NaN
             2000-01-03    0       NaN
             2000-01-04    0       NaN
             2000-01-05    0       NaN
...
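`pd.rolling_mean` was deprecated in pandas 0.18 and later removed, and `resample` now returns an object that needs an explicit aggregation. On a recent pandas, the same steps might look like the sketch below. It is an adaptation, not part of the original answer: the fixed seed, the names `wide` and `result`, and the explicit `names=` on the rebuilt columns are additions made here.

```python
import numpy as np
import pandas as pd

N = 10
rng = np.random.RandomState(0)  # fixed seed, added only for reproducibility
test = pd.DataFrame(rng.randint(4, size=(N, 3)),
                    columns=['trader', 'stock', 'foo'],
                    index=pd.date_range('2000-1-1', periods=N, name='date'))
test = test.set_index(['trader', 'stock'], append=True)

# Wide format: only the date axis remains in the index.
wide = test.unstack().unstack()

# resample needs an explicit aggregation in current pandas.
wide = wide.resample('1D').mean().fillna(0)

# Replacement for pd.rolling_mean(wide, 50, 50).
means = wide.rolling(window=50, min_periods=50).mean()

# Rename the first column level, keeping the level names so that
# the concatenated frame can still be stacked and reordered by name.
means.columns = pd.MultiIndex.from_tuples(
    [('{}_norm'.format(col[0]),) + col[1:] for col in means.columns],
    names=wide.columns.names)

result = pd.concat([wide, means], axis=1)
result = result.stack().stack()
result = result.reorder_levels(['trader', 'stock', 'date']).sort_index()
print(result.head())
```

As in the output above, `foo_norm` is NaN throughout, because the 10 sample days are fewer than the 50 observations required by `min_periods=50`.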

Answer 1
1 Accepted 2016-01-24 23:45:19