Moving average based on dates and not rows
    CargoTons  DateOrigin  DateDestination  Origin           Destination
0   72875.0    2020-01-01  2020-01-08       Snohvit          Dragon
1   77126.0    2020-01-01  2020-01-16       Cameron (Liqu.)  Grain
2   0          2020-01-02
3   67500.0    2020-01-03  2020-01-18       Sabine Pass      South Hook
4   93843.0    2020-01-04  2020-01-23       Ras Laffan       South Hook
5   76239.0    2020-01-05  2020-01-14       Yamal            Grain
6   71749.0    2020-01-05  2020-01-23       Sabine Pass      Dragon
7   75353.0    2020-01-06  2020-01-22       Sabine Pass      South Hook
8   71749.0    2020-01-07  2020-01-21       Sabine Pass      South Hook
9   0          2020-01-08
10  96925.0    2020-01-09  2020-01-25       Ras Laffan       South Hook
11  65013.0    2020-01-10  2020-01-22       Snohvit          Grain
12  76505.0    2020-01-10  2020-01-19       Yamal            Dragon
13  0          2020-01-11
14  0          2020-01-12
15  0          2020-01-13
16  0          2020-01-14
17  0          2020-01-15
Above is a snapshot of the data available.
I would like to have a moving average column which gives the MA based on dates and not rows, i.e. days where I have multiple entries for the same date should just have one value as the MA.
I tried using pandas' rolling(), but this looks back over rows rather than dates.
We don't know what your window is for the moving average, so I selected 2, which will leave the first day's MA value as NaN.
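To see why a window of 2 leaves the first value as NaN, here is a small sketch on made-up daily totals (the numbers are illustrative, not taken from the question's data). The `min_periods=1` variant is an optional alternative, not something the question asked for.

```python
import pandas as pd

# Hypothetical daily totals, for illustration only
daily = pd.Series([10.0, 20.0, 0.0, 30.0])

# Window of 2: the first entry has no predecessor, so it is NaN
ma = daily.rolling(2).mean()

# With min_periods=1 the first entry averages over what is available
ma_partial = daily.rolling(2, min_periods=1).mean()

print(ma.tolist())          # [nan, 15.0, 10.0, 15.0]
print(ma_partial.tolist())  # [10.0, 15.0, 10.0, 15.0]
```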
The basic logic is to group by date, sum the cargo tons, and take the MA of those daily totals over a 2-day period. Then use a left join to bring that back into the original dataframe, so rows that share a date get the same MA value.
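import pandas as pd

# Daily totals, then a rolling mean over 2 rows: one MA value per date
daily_ma = (df.groupby('DateOrigin')['CargoTons'].sum()
              .rolling(2).mean()
              .reset_index(name='Cargo MA'))

# Left join so every original row receives its date's MA
result = pd.merge(df, daily_ma, on='DateOrigin', how='left')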
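For a self-contained check, the sketch below rebuilds the first four rows of the snapshot (origin and destination columns omitted) and applies the same groupby, rolling and merge steps. It also shows a time-based window, `rolling('2D')`, as an alternative. That part is an assumption about what might be wanted if some dates were missing entirely, since `rolling(2)` counts rows of the grouped series rather than calendar days. Note that a time-based window defaults to `min_periods=1`, so its first value is not NaN.

```python
import pandas as pd

# First rows of the snapshot; other columns omitted for brevity
df = pd.DataFrame({
    'CargoTons': [72875.0, 77126.0, 0.0, 67500.0],
    'DateOrigin': pd.to_datetime(
        ['2020-01-01', '2020-01-01', '2020-01-02', '2020-01-03']),
})

# One total per date, used by both variants below
daily = df.groupby('DateOrigin')['CargoTons'].sum()

# Variant from the answer: window of 2 rows of the per-date totals
ma_rows = daily.rolling(2).mean().reset_index(name='Cargo MA')

# Alternative (assumption): window of 2 calendar days on the date index
ma_days = daily.rolling('2D').mean().reset_index(name='Cargo MA 2D')

result = df.merge(ma_rows, on='DateOrigin', how='left')
result = result.merge(ma_days, on='DateOrigin', how='left')
print(result)
```

Both rows dated 2020-01-01 end up with the same value in each MA column, which is the "one value per date" behaviour the question asks for.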