
Creating a rolling mean of an annual cycle in pandas, python

I am trying to use pandas to create a rolling mean, but of an annual cycle (so that the rolling mean for 31st December would take into account values from January, and the rolling means for January would use values from December). Does anyone know if there is a built-in or other elegant way to do this?

The only way I've come up with so far is to create the annual cycle, repeat it over three leap years (2004, 2008 and 2012, because the annual cycle includes 29th February), take the rolling mean (or standard deviation, etc.) and then crop out the middle year. There must be a better solution! Here's my attempt:

import pandas as pd
import numpy as np
import calendar

data = np.random.rand(366)
df_annual_cycle = pd.DataFrame(
    columns=['annual_cycle'],
    index=pd.date_range('2004-01-01','2004-12-31').strftime('%m-%d'),
    data=data
)

df_annual_cycle.head()

#        annual_cycle
# 01-01      0.863838
# 01-02      0.234168
# 01-03      0.368678
# 01-04      0.066332
# 01-05      0.493080

# give three copies of the cycle dummy dates in three leap years (2004, 2008, 2012)
df1 = df_annual_cycle.copy()
df1.index = pd.to_datetime(['04-' + x for x in df1.index], format='%y-%m-%d')
df2 = df_annual_cycle.copy()
df2.index = pd.to_datetime(['08-' + x for x in df2.index], format='%y-%m-%d')
df3 = df_annual_cycle.copy()
df3.index = pd.to_datetime(['12-' + x for x in df3.index], format='%y-%m-%d')

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df_for_rolling = pd.concat([df1, df2, df3])
# trailing 65-day window (pass center=True to centre it on each day instead)
df_rolling = df_for_rolling.rolling(65).mean()
# crop the middle year and restore the original month-day labels
df_annual_cycle_rolling = df_rolling.loc['2008-01-01':'2008-12-31']
df_annual_cycle_rolling.index = df_annual_cycle.index
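The tile-and-crop idea in the attempt above can be written more compactly by tiling positionally instead of inventing dates. This is a sketch under stated assumptions, not a built-in pandas feature: `cyclic_rolling` is a hypothetical helper name, the series is assumed to hold exactly one cycle in calendar order, the window is assumed to be odd and no longer than one cycle, and `center=True` is used so that 31st December sees January values, as the question asks. It also puts the otherwise unused `calendar` import to work, confirming that the stand-in years are leap years.

```python
import calendar

import numpy as np
import pandas as pd


def cyclic_rolling(series: pd.Series, window: int, func: str = "mean") -> pd.Series:
    """Hypothetical helper: centred rolling statistic that wraps around one cycle.

    Assumes `series` is one full cycle in calendar order and `window` is odd
    and not longer than the cycle.
    """
    n = len(series)
    # Three copies end to end, so the middle copy has a full window on both sides.
    tiled = pd.concat([series, series, series], ignore_index=True)
    # e.g. func="mean" calls tiled.rolling(window, center=True).mean()
    rolled = getattr(tiled.rolling(window, center=True), func)()
    # Keep only the middle copy and put the original month-day labels back.
    return pd.Series(rolled.iloc[n:2 * n].to_numpy(), index=series.index, name=series.name)


# The stand-in years used in the question all contain 29 February.
assert all(calendar.isleap(year) for year in (2004, 2008, 2012))

data = np.random.rand(366)
df_annual_cycle = pd.DataFrame(
    columns=['annual_cycle'],
    index=pd.date_range('2004-01-01', '2004-12-31').strftime('%m-%d'),
    data=data,
)
df_annual_cycle['rolling_mean'] = cyclic_rolling(df_annual_cycle['annual_cycle'], 65)
df_annual_cycle['rolling_std'] = cyclic_rolling(df_annual_cycle['annual_cycle'], 65, 'std')
```

Because the crop is done with `iloc` on integer positions, the copies never need real dates, and any other `Rolling` method name (`'median'`, `'max'`, ...) can be passed as `func`.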

We can use pandas.DataFrame.rolling(). Details and other rolling methods can be found here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rolling.html
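A small illustration of how `rolling()` aligns its window, using a made-up five-value Series. It shows the default trailing window, `center=True`, and `min_periods`; none of these options wraps around the ends of the data, which is the behaviour the question is after.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Default: the window ends at each label (trailing). min_periods defaults to
# the window size, so the first window - 1 results are NaN.
trailing = s.rolling(3).mean()
# → [NaN, NaN, 2.0, 3.0, 4.0]

# center=True: the window is centred on each label, so both ends are NaN.
centred = s.rolling(3, center=True).mean()
# → [NaN, 2.0, 3.0, 4.0, NaN]

# min_periods=1 fills the ends with partial-window means, but still does not
# wrap: the first value only averages rows 0 and 1.
partial = s.rolling(3, center=True, min_periods=1).mean()
# → [1.5, 2.0, 3.0, 4.0, 4.5]
```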

Let's assume we have a dataframe like so:

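import numpy as np
import pandas as pd

# six 61-day blocks (6 * 61 = 366 days) with increasing amplitude
data = np.concatenate([
    1*np.random.rand(366//6),
    2*np.random.rand(366//6),
    3*np.random.rand(366//6),
    4*np.random.rand(366//6),
    5*np.random.rand(366//6),
    6*np.random.rand(366//6)
])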

df_annual_cycle = pd.DataFrame(
    columns=['annual_cycle'],
    index=pd.date_range('2004-01-01','2004-12-31').strftime('%m-%d'),
    data=data,
)

We can do:

# reset the index to integers:
df_annual_cycle = df_annual_cycle.reset_index()

# rename index column to date:
df_annual_cycle = df_annual_cycle.rename(columns={'index':'date'})

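# calculate the rolling mean (win_type='triang' requires SciPy to be installed);
# the first 31 rows are NaN, because this window does not wrap around
# from December into January:
df_annual_cycle['rolling_mean'] = df_annual_cycle['annual_cycle'].rolling(32, win_type='triang').mean()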
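The triangular rolling mean above does not wrap around, so it leaves the first rows empty and 31st December never sees January. If the wrap-around from the question is needed, one option (swapped in here, not part of the pandas call above) is a circular convolution in NumPy: pad the cycle with `np.pad(..., mode='wrap')` and convolve it with a hand-built triangular weight vector. `circular_triangular_mean` and `half_width` are hypothetical names, and the weights are an assumption that is similar to, but not guaranteed identical with, SciPy's `triang`. An odd-length window keeps each day exactly in the middle, which is why `half_width=16` gives 31 weights rather than 32.

```python
import numpy as np
import pandas as pd


def circular_triangular_mean(values, half_width):
    """Hypothetical helper: centred, triangle-weighted mean that wraps around.

    Assumes `values` is one full cycle in calendar order. Weights are
    1, 2, ..., half_width, ..., 2, 1 (odd length 2 * half_width - 1).
    """
    ramp = np.arange(1, half_width + 1, dtype=float)
    weights = np.concatenate([ramp, ramp[-2::-1]])
    weights /= weights.sum()
    # Wrap the cycle: December values are prepended, January values appended.
    padded = np.pad(np.asarray(values, dtype=float), half_width - 1, mode="wrap")
    # 'valid' returns exactly len(values) points, each centred on its own day;
    # the weights are symmetric, so convolution equals correlation here.
    return np.convolve(padded, weights, mode="valid")


data = np.concatenate([k * np.random.rand(366 // 6) for k in range(1, 7)])
df_annual_cycle = pd.DataFrame(
    {'annual_cycle': data},
    index=pd.date_range('2004-01-01', '2004-12-31').strftime('%m-%d'),
)
df_annual_cycle['rolling_mean'] = circular_triangular_mean(df_annual_cycle['annual_cycle'], 16)
```

For a single spike on 1st January, the smoothed values on both 1st January and 31st December are non-zero, which is the wrap-around behaviour the question asks for.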

# plot results
df_annual_cycle.plot(x='date', y=['annual_cycle', 'rolling_mean'], style=['o', '-'])

The result looks like this:

[Figure: annual_cycle values plotted as points, with rolling_mean overlaid as a line]
