I would like to compute a time-weighted moving average in pandas that weights observations in proportion to how recent they are.
Here is some sample data that I have.
import numpy as np
import pandas as pd

dates = ['01/01/2021','02/01/2021','03/01/2021','04/01/2021','05/01/2021','06/01/2021']
swimmer1_place = ['1','1','4','3',np.nan,np.nan]
swimmer2_place = [np.nan,'3','1',np.nan,'4','2']
swimmer3_place = ['2','2','3',np.nan,'3','1']
df = pd.DataFrame({'date':dates,'swimmer_1_place':swimmer1_place,'swimmer_2_place':swimmer2_place,'swimmer_3_place':swimmer3_place})
df['date'] = pd.to_datetime(df['date'])
What would be the best way to go about this? I have tried the built-in pandas EWM method with limited success, because it doesn't account for the varying time intervals between observations for the different swimmers.
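For context, a minimal sketch of why the row-based EWM falls short: `Series.ewm(span=...)` weights by row position only, so the same values produce the same result no matter how far apart the dates are (the values below are swimmer 1's places from the sample data):

```python
import pandas as pd

values = pd.Series([1.0, 1.0, 4.0, 3.0])

# Row-based EWM: the decay is applied per row, so the result is identical
# whether these rows are one day or one month apart.
row_based = values.ewm(span=3).mean()
print(row_based.iloc[-1])
```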
After converting your column values to numbers:

def convert(x):
    if x is np.nan:
        return np.nan
    else:
        return int(x)

swimmer1_place = [convert(a) for a in swimmer1_place]
swimmer2_place = [convert(a) for a in swimmer2_place]
swimmer3_place = [convert(a) for a in swimmer3_place]
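As a side note, the same conversion can be done directly on the DataFrame columns with `pd.to_numeric`; a sketch using one column of the sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'swimmer_1_place': ['1', '1', '4', '3', np.nan, np.nan]})

# errors='coerce' turns anything unparseable into NaN
df['swimmer_1_place'] = pd.to_numeric(df['swimmer_1_place'], errors='coerce')
print(df['swimmer_1_place'].tolist())
```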
You can get a recency-weighted average like the following (there are, of course, other possible ways to do this):
base_time = df['date'].max()
for column in df.columns:
    if column != "date":
        mask = df[column].notna()
        # Days since each observation; smaller means more recent
        age = (base_time - df.loc[mask, 'date']).dt.days.values
        # Invert so that more recent observations receive larger weights,
        # then normalize the weights to sum to 1
        weights = 1.0 / (age + 1)
        weights = weights / weights.sum()
        # Each cell now holds its weighted contribution; summing the
        # column gives the time-weighted average for that swimmer
        df.loc[mask, column] = weights * pd.to_numeric(df.loc[mask, column]).values
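Alternatively, pandas 1.1+ lets `ewm` account for irregular spacing directly: pass the timestamps via the `times` argument together with a timedelta-style `halflife`. A sketch on the sample data (the 30-day halflife is an arbitrary choice, not something from the question):

```python
import numpy as np
import pandas as pd

dates = ['01/01/2021', '02/01/2021', '03/01/2021',
         '04/01/2021', '05/01/2021', '06/01/2021']
df = pd.DataFrame({
    'date': pd.to_datetime(dates),
    'swimmer_1_place': pd.to_numeric(['1', '1', '4', '3', np.nan, np.nan]),
})

# An observation's weight halves every 30 days, so recent races count more
tw_avg = df['swimmer_1_place'].ewm(halflife='30 days', times=df['date']).mean()
print(tw_avg)
```

Because the decay is computed from the actual timestamps, swimmers with gaps in their race history are handled without assuming evenly spaced observations.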