[英]Pandas rolling mean with offset by (not continuously available) date
given the following example table给定以下示例表
Index![]() |
Date![]() |
Weekday![]() |
Value![]() |
---|---|---|---|
1 ![]() |
05/12/2022 ![]() |
2 ![]() |
10 ![]() |
2 ![]() |
06/12/2022 ![]() |
3 ![]() |
20 ![]() |
3 ![]() |
07/12/2022 ![]() |
4 ![]() |
40 ![]() |
4 ![]() |
09/12/2022 ![]() |
6 ![]() |
10 ![]() |
5 ![]() |
10/12/2022 ![]() |
7 ![]() |
60 ![]() |
6 ![]() |
11/12/2022 ![]() |
1 ![]() |
30 ![]() |
7 ![]() |
12/12/2022 ![]() |
2 ![]() |
40 ![]() |
8 ![]() |
13/12/2022 ![]() |
3 ![]() |
50 ![]() |
9 ![]() |
14/12/2022 ![]() |
4 ![]() |
60 ![]() |
10 ![]() |
16/12/2022 ![]() |
6 ![]() |
20 ![]() |
11 ![]() |
17/12/2022 ![]() |
7 ![]() |
50 ![]() |
12 ![]() |
18/12/2022 ![]() |
1 ![]() |
10 ![]() |
13 ![]() |
20/12/2022 ![]() |
3 ![]() |
20 ![]() |
14 ![]() |
21/12/2022 ![]() |
4 ![]() |
10 ![]() |
15 ![]() |
22/12/2022 ![]() |
5 ![]() |
40 ![]() |
I want to calculate a rolling average of the last three observations (at least) a week ago.我想计算(至少)一周前最后三个观察值的滚动平均值。 I cannot use.shift as some dates are randomly missing, and.shift would therefore not produce a reliable output.
我不能使用 .shift,因为某些日期随机丢失,因此 .shift 不会产生可靠的输出。
Desired output example for last three rows in the example dataset:示例数据集中最后三行的所需输出示例:
Index 13: Avg of indices 8, 7, 6 = (30+40+50) / 3 = 40
Index 14: Avg of indices 9, 8, 7 = (40+50+60) / 3 = 50
Index 15: Avg of indices 9, 8, 7 = (40+50+60) / 3 = 50
What would be a working solution for this?对此有什么可行的解决方案? Thanks!
谢谢!
Thanks!谢谢!
I apologize for this ugly code.我为这个丑陋的代码道歉。 But it seems to work:
但它似乎有效:
df = df.set_index("Index")
df['Date'] = df['Date'].astype("datetime64")
for id in df.index:
dfs = df.loc[:id]
mean = dfs["Value"][dfs['Date'] <= dfs.iloc[-1]['Date'] - pd.Timedelta(1, "W")].tail(3).mean()
print(id, mean)
Result:结果:
1 nan
2 10.0
3 15.0
4 23.333333333333332
5 23.333333333333332
6 36.666666666666664
7 33.333333333333336
8 33.333333333333336
9 33.333333333333336
10 33.333333333333336
11 33.333333333333336
12 33.333333333333336
13 40.0
14 50.0
15 50.0
MOSTLY inspired from @Aidis you could, make his solution an apply:大部分灵感来自@Aidis,您可以将他的解决方案应用:
df['mean']=df.apply(lambda y: df["Value"][df['Date'] <= y['Date'] - pd.Timedelta(1, "W")].tail(3).mean(), axis=1)
or spliting the data at each call which may run faster if you have lots of data (to be tested):或在每次调用时拆分数据,如果您有大量数据(待测试),这可能会运行得更快:
df['mean']=df.apply(lambda y: df.loc[:y.name, "Value"][ df.loc[:y.name,'Date'] <= y['Date'] - pd.Timedelta(1, "W")].tail(3).mean(), axis=1)
which returns:返回:
Index Date Weekday Value mean
0 1 2022-12-05 2 10 NaN
1 2 2022-12-06 3 20 NaN
2 3 2022-12-07 4 40 NaN
3 4 2022-12-09 6 10 NaN
4 5 2022-12-10 7 60 NaN
5 6 2022-12-11 1 30 NaN
6 7 2022-12-12 2 40 10.000000
7 8 2022-12-13 3 50 15.000000
8 9 2022-12-14 4 60 23.333333
9 10 2022-12-16 6 20 23.333333
10 11 2022-12-17 7 50 36.666667
11 12 2022-12-18 1 10 33.333333
12 13 2022-12-20 3 20 40.000000
13 14 2022-12-21 4 10 50.000000
14 15 2022-12-22 5 40 50.000000
hello i hope that help you,您好,希望对您有所帮助,
a=df.groupby("Weekday")['Value'].mean() # calculate mean for evry week
last3=df.iloc[-3:]#get last three rows in dataset
mean=pd.DataFrame()
mean['mean']=a
result=last3.merge(mean, left_on='Weekday', right_on='Weekday')#megre the tow result on one dataframe
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.