
Time weighted moving average in pandas

I would like to compute a time-weighted moving average in pandas, with weights proportional to how recent each observation is.
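"Proportional to how recent" can be made precise in more than one way; what follows is one interpretation, not something the question spells out. For a swimmer with places x_i recorded at times t_i, pick a reference time t_0 earlier than every race and weight each place by the time elapsed since t_0 (linear weights). The common alternative is exponential decay with a half-life h, measured back from the current time T. In both cases the sums run only over races where the swimmer has a recorded place.

```latex
\bar{x} = \frac{\sum_i w_i \, x_i}{\sum_i w_i},
\qquad
w_i^{\mathrm{lin}} = t_i - t_0,
\qquad
w_i^{\mathrm{exp}} = 2^{-(T - t_i)/h}
```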

Here is some sample data that I have.

import numpy as np
import pandas as pd

dates = ['01/01/2021','02/01/2021','03/01/2021','04/01/2021','05/01/2021','06/01/2021']
swimmer1_place = ['1','1','4','3',np.nan,np.nan]
swimmer2_place = [np.nan,'3','1',np.nan,'4','2']
swimmer3_place = ['2','2','3',np.nan,'3','1']

df = pd.DataFrame({'date': dates,
                   'swimmer_1_place': swimmer1_place,
                   'swimmer_2_place': swimmer2_place,
                   'swimmer_3_place': swimmer3_place})
df['date'] = pd.to_datetime(df['date'])
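The sample date strings are ambiguous, and any time weighting depends on the real gaps between races. By default pd.to_datetime reads them month-first (the first of each month, January to June), while dayfirst=True reads the same strings as 1 to 6 January. A quick check of both readings:

```python
import pandas as pd

dates = ['01/01/2021', '02/01/2021', '03/01/2021',
         '04/01/2021', '05/01/2021', '06/01/2021']

# Default: month-first, i.e. the first day of January..June
month_first = pd.to_datetime(pd.Series(dates))
# dayfirst=True: the same strings become 1..6 January
day_first = pd.to_datetime(pd.Series(dates), dayfirst=True)

print(month_first.dt.month.tolist())  # [1, 2, 3, 4, 5, 6]
print(day_first.dt.day.tolist())      # [1, 2, 3, 4, 5, 6]
```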


What would be the best way to do this? I have tried the built-in pandas EWM method with limited success, because it does not account for the varying time intervals between races, which differ from swimmer to swimmer.
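For comparison, pandas 1.1 and later accept a times argument in DataFrame.ewm. When it is given, halflife is a duration, and an observation's weight halves for every halflife of calendar time between it and the current row, so uneven gaps are respected. This is exponential rather than linear weighting. The following is a minimal sketch, assuming places are converted to numbers first; the 30-day half-life is an arbitrary illustrative choice.

```python
import numpy as np
import pandas as pd

dates = ['01/01/2021', '02/01/2021', '03/01/2021',
         '04/01/2021', '05/01/2021', '06/01/2021']
df = pd.DataFrame({
    'date': pd.to_datetime(dates),
    'swimmer_1_place': ['1', '1', '4', '3', np.nan, np.nan],
    'swimmer_2_place': [np.nan, '3', '1', np.nan, '4', '2'],
    'swimmer_3_place': ['2', '2', '3', np.nan, '3', '1'],
})

place_cols = df.columns.drop('date')
# ewm needs numbers, but the sample stores places as strings
places = df[place_cols].apply(pd.to_numeric)

# times=... makes the decay depend on elapsed calendar time between rows;
# halflife='30 days' is a hypothetical choice, tune it to your data
ewma = places.ewm(halflife='30 days', times=df['date']).mean()
print(ewma.round(3))
```

With the default ignore_na=False, the decay across rows where a swimmer has no result still uses the full elapsed time, which is what a time-based weighting needs.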

First, convert the column values from strings to integers:

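def convert(x):
    # Keep missing values as NaN; turn place strings such as '1' into ints
    if pd.isna(x):
        return np.nan
    return int(x)

swimmer1_place = [convert(a) for a in swimmer1_place]
swimmer2_place = [convert(a) for a in swimmer2_place]
swimmer3_place = [convert(a) for a in swimmer3_place]

# Rebuild df so it holds the numeric values instead of the original strings
df = pd.DataFrame({'date': pd.to_datetime(dates),
                   'swimmer_1_place': swimmer1_place,
                   'swimmer_2_place': swimmer2_place,
                   'swimmer_3_place': swimmer3_place})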
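The same conversion can also be done directly on the DataFrame with pd.to_numeric, which avoids rebuilding it from the lists. A minimal sketch on the sample data:

```python
import numpy as np
import pandas as pd

dates = ['01/01/2021', '02/01/2021', '03/01/2021',
         '04/01/2021', '05/01/2021', '06/01/2021']
df = pd.DataFrame({
    'date': pd.to_datetime(dates),
    'swimmer_1_place': ['1', '1', '4', '3', np.nan, np.nan],
    'swimmer_2_place': [np.nan, '3', '1', np.nan, '4', '2'],
    'swimmer_3_place': ['2', '2', '3', np.nan, '3', '1'],
})

place_cols = df.columns.drop('date')
# pd.to_numeric parses '4' as 4 and keeps NaN as NaN; since every column
# has a gap, each ends up float64 (an int64 column cannot hold NaN)
df[place_cols] = df[place_cols].apply(pd.to_numeric)
print(df.dtypes)
```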

Then you can compute a time-weighted average like this (there are certainly other ways to do it):

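# Hypothetical reference date (the original answer does not define base_time):
# one month before the first race, so every weight is positive
base_time = pd.Timestamp('2020-12-01')

for column in df.columns:
    if column != "date":
        mask = df[column].notna()
        # Time elapsed since base_time; later races get larger values
        Array = (df['date'] - base_time)[mask].values
        SUM = np.sum(Array)
        AVG_array = Array / SUM  # normalized weights that sum to 1
        column_vals = df[column][mask].values
        result = AVG_array * column_vals
        # Each cell now holds weight * place; the column sum is the weighted average
        df.loc[mask, column] = result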
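A variant that leaves df untouched and returns one average per swimmer may be more convenient. This sketch assumes the places are already numeric and takes a reference date base_time earlier than every race, so each weight is the time elapsed since base_time and later races count more. The function name linear_time_weighted_average and the 2020-12-01 reference date are hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd

def linear_time_weighted_average(df, base_time, date_col='date'):
    """Hypothetical helper: average each place column with weights
    proportional to (date - base_time), skipping missing places.

    base_time must be earlier than every date so all weights are positive.
    """
    result = {}
    for column in df.columns.drop(date_col):
        mask = df[column].notna()
        # Days elapsed since base_time for the races this swimmer entered
        weights = (df.loc[mask, date_col] - base_time).dt.total_seconds() / 86400
        result[column] = np.average(df.loc[mask, column], weights=weights)
    return pd.Series(result)

dates = ['01/01/2021', '02/01/2021', '03/01/2021',
         '04/01/2021', '05/01/2021', '06/01/2021']
df = pd.DataFrame({
    'date': pd.to_datetime(dates),
    'swimmer_1_place': [1, 1, 4, 3, np.nan, np.nan],
    'swimmer_2_place': [np.nan, 3, 1, np.nan, 4, 2],
    'swimmer_3_place': [2, 2, 3, np.nan, 3, 1],
})

out = linear_time_weighted_average(df, pd.Timestamp('2020-12-01'))
print(out)
```

For swimmer 1, the weights are 31, 62, 90 and 121 days, so the result is (1*31 + 1*62 + 4*90 + 3*121) / 304 = 816 / 304, about 2.684.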

