
pandas dataframe, add one column as moving average of another column for each group

I have a dataframe df like the one below:

import numpy as np
import pandas as pd

dates = pd.date_range('2000-01-01', '2001-01-01')
df1 = pd.DataFrame({'date': dates, 'value': np.random.normal(size=len(dates)), 'market': 'GOLD'})
df2 = pd.DataFrame({'date': dates, 'value': np.random.normal(size=len(dates)), 'market': 'SILVER'})
df = pd.concat([df1, df2])
# DataFrame.sort was removed from pandas; sort_values is the replacement
df = df.sort_values('date')

          date  market     value
0   2000-01-01    GOLD -1.361360
0   2000-01-01  SILVER  0.255830
1   2000-01-02  SILVER  0.196953
1   2000-01-02    GOLD  1.422454
2   2000-01-03    GOLD -0.827672
...

I want to add another column holding the 10-day moving average of value, computed separately for each market.

Is there a simple df.groupby('market').??? that does this? Or do I have to pivot the table into wide form, smooth each column, and then melt it back?

You can use groupby/rolling/mean:

result = (df.set_index('date')
            .groupby('market')['value']
            .rolling(10).mean()
            .unstack('market'))

which yields

market          GOLD    SILVER
date                          
2000-01-01       NaN       NaN
2000-01-02       NaN       NaN
2000-01-03       NaN       NaN
2000-01-04       NaN       NaN
2000-01-05       NaN       NaN
2000-01-06       NaN       NaN
2000-01-07       NaN       NaN
2000-01-08       NaN       NaN
2000-01-09       NaN       NaN
2000-01-10  0.310077  0.582063
2000-01-11  0.312008  0.752218
2000-01-12  0.151159  0.877230
2000-01-13  0.213611  0.742156
2000-01-14  0.440113  0.614720
2000-01-15  0.551360  0.649967
...
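
If the goal is a new column on the long-format frame rather than a wide table, one possible variant is groupby/transform. This is a minimal sketch, not part of the answer above: the small deterministic frame, the ignore_index=True choice, and the column name value10d are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Small deterministic frame in the same long format as the question.
dates = pd.date_range('2000-01-01', periods=15)
df = pd.concat(
    [
        pd.DataFrame({'date': dates, 'value': np.arange(15.0), 'market': 'GOLD'}),
        pd.DataFrame({'date': dates, 'value': np.arange(15.0) * 2, 'market': 'SILVER'}),
    ],
    ignore_index=True,  # assumption: unique row labels, so assignment aligns row by row
).sort_values('date')

# transform returns a Series indexed like df, so it can be assigned directly.
# rolling(10) is a window of 10 rows per market, not 10 calendar days.
df['value10d'] = (
    df.groupby('market')['value']
      .transform(lambda s: s.rolling(10).mean())
)
```

Because the rows are sorted by date before grouping, each market's window runs over consecutive dates; the first nine rows of each market stay NaN.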

This builds on @unutbu's answer and adds the result back to the original dataframe as a new column.

result = df.set_index('date').groupby('market')['value'].rolling(10).mean()

Now, provided df is sorted by market and then by date, the result should be in sync with it and we can simply assign the values:

df.sort_values(['market','date'], inplace = True)
df['value10d_1'] = result.values

However, if you are paranoid like me, a merge should put your mind at ease:

df = pd.merge(df, result.reset_index().rename(columns = {'value':'value10d_2'}), on = ['market','date'])

df['value10d_1'] - df['value10d_2'] # all 0, except NaN in the first 9 rows of each market
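
Since the first nine rows of each market are NaN, the subtraction cannot be exactly zero everywhere. A NaN-aware comparison may be a more convenient check. The following is a sketch on a small made-up frame; the validate='one_to_one' argument and the variable name same are additions for illustration, not part of the answer.

```python
import numpy as np
import pandas as pd

# Small deterministic stand-in for the question's frame (assumed data).
dates = pd.date_range('2000-01-01', periods=15)
df = pd.concat(
    [pd.DataFrame({'date': dates, 'value': np.arange(15.0), 'market': m})
     for m in ('GOLD', 'SILVER')],
    ignore_index=True,
)

result = df.set_index('date').groupby('market')['value'].rolling(10).mean()

# Route 1: positional assignment, valid only once df is sorted like result.
df = df.sort_values(['market', 'date'])
df['value10d_1'] = result.values

# Route 2: key-based merge; validate raises if (market, date) is not unique.
df = pd.merge(
    df,
    result.reset_index().rename(columns={'value': 'value10d_2'}),
    on=['market', 'date'],
    validate='one_to_one',
)

# equal_nan=True treats the leading NaN rows of both columns as matching.
same = np.allclose(df['value10d_1'], df['value10d_2'], equal_nan=True)
```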

