Pandas rolling mean with offset by (not continuously available) dates

Question

Given the following example table:

Index	Date	Weekday	Value
1	05/12/2022	2	10
2	06/12/2022	3	20
3	07/12/2022	4	40
4	09/12/2022	6	10
5	10/12/2022	7	60
6	11/12/2022	1	30
7	12/12/2022	2	40
8	13/12/2022	3	50
9	14/12/2022	4	60
10	16/12/2022	6	20
11	17/12/2022	7	50
12	18/12/2022	1	10
13	20/12/2022	3	20
14	21/12/2022	4	10
15	22/12/2022	5	40

I want to calculate a rolling average of the last three observations dated at least one week before the current row. I cannot use .shift because some dates are randomly missing, so .shift would not produce a reliable output.
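To illustrate the concern, here is a minimal sketch, assuming the first nine rows of the example table and day-first dates. The column names shifted_date and gap_days are made up for the illustration. Because .shift moves values by a number of rows rather than by calendar days, a missing date (08/12/2022 here) changes how far back the shifted row actually lies:

```python
import pandas as pd

# First nine rows of the example table (08/12/2022 is missing)
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["05/12/2022", "06/12/2022", "07/12/2022", "09/12/2022", "10/12/2022",
         "11/12/2022", "12/12/2022", "13/12/2022", "14/12/2022"],
        format="%d/%m/%Y"),
    "Value": [10, 20, 40, 10, 60, 30, 40, 50, 60],
})

# shift(7) looks 7 rows back, not 7 calendar days back
df["shifted_date"] = df["Date"].shift(7)
df["gap_days"] = (df["Date"] - df["shifted_date"]).dt.days
print(df[["Date", "shifted_date", "gap_days"]].tail(2))
```

For the last two rows the shifted row lies 8 days back instead of 7, and the gap would vary further with more missing dates.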

Desired output example for the last three rows in the example dataset:

Index 13: Avg of indices 8, 7, 6 = (30+40+50) / 3 = 40

Index 14: Avg of indices 9, 8, 7 = (40+50+60) / 3 = 50

Index 15: Avg of indices 9, 8, 7 = (40+50+60) / 3 = 50

What would be a working solution for this? Thanks!

Answer 1

I apologize for this ugly code, but it seems to work:

import pandas as pd

df = df.set_index("Index")
# Dates are day-first (dd/mm/yyyy), so parse them with an explicit format
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")
for idx in df.index:
    dfs = df.loc[:idx]  # rows up to and including the current one
    cutoff = dfs.iloc[-1]["Date"] - pd.Timedelta(1, "W")
    # mean of the last three values dated at least one week earlier
    mean = dfs["Value"][dfs["Date"] <= cutoff].tail(3).mean()
    print(idx, mean)

Result:

1 nan
2 nan
3 nan
4 nan
5 nan
6 nan
7 10.0
8 15.0
9 23.333333333333332
10 23.333333333333332
11 36.666666666666664
12 33.333333333333336
13 40.0
14 50.0
15 50.0

Answer 2

Mostly inspired by @Aidis: you could turn his solution into an apply:

df['mean']=df.apply(lambda y:  df["Value"][df['Date'] <= y['Date'] - pd.Timedelta(1, "W")].tail(3).mean(), axis=1)

or slice the data at each call, which may run faster if you have lots of data (to be tested):

df['mean']=df.apply(lambda y:  df.loc[:y.name, "Value"][ df.loc[:y.name,'Date'] <= y['Date'] - pd.Timedelta(1, "W")].tail(3).mean(), axis=1)

which returns:

    Index       Date  Weekday  Value       mean
0       1 2022-12-05        2     10        NaN
1       2 2022-12-06        3     20        NaN
2       3 2022-12-07        4     40        NaN
3       4 2022-12-09        6     10        NaN
4       5 2022-12-10        7     60        NaN
5       6 2022-12-11        1     30        NaN
6       7 2022-12-12        2     40  10.000000
7       8 2022-12-13        3     50  15.000000
8       9 2022-12-14        4     60  23.333333
9      10 2022-12-16        6     20  23.333333
10     11 2022-12-17        7     50  36.666667
11     12 2022-12-18        1     10  33.333333
12     13 2022-12-20        3     20  40.000000
13     14 2022-12-21        4     10  50.000000
14     15 2022-12-22        5     40  50.000000
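For anyone who wants to reproduce this, here is a self-contained sketch that rebuilds the example table (assuming day-first dates) and runs the apply from above. It also includes an alternative that is not from the answers: a binary search with np.searchsorted, which assumes the frame is sorted by Date. The names date_arr, val_arr, ends and fast are only illustrative, and whether this is actually faster on large data is untested here.

```python
import numpy as np
import pandas as pd

dates = ["05/12/2022", "06/12/2022", "07/12/2022", "09/12/2022", "10/12/2022",
         "11/12/2022", "12/12/2022", "13/12/2022", "14/12/2022", "16/12/2022",
         "17/12/2022", "18/12/2022", "20/12/2022", "21/12/2022", "22/12/2022"]
values = [10, 20, 40, 10, 60, 30, 40, 50, 60, 20, 50, 10, 20, 10, 40]
df = pd.DataFrame({
    "Index": range(1, 16),
    "Date": pd.to_datetime(dates, format="%d/%m/%Y"),
    "Value": values,
})

# The apply-based version from this answer
df["mean"] = df.apply(
    lambda y: df["Value"][df["Date"] <= y["Date"] - pd.Timedelta(1, "W")].tail(3).mean(),
    axis=1,
)

# Alternative sketch (assumes df is sorted by Date): binary search per row
date_arr = df["Date"].to_numpy()
val_arr = df["Value"].to_numpy(dtype=float)
# ends[i] = number of rows dated at least one week before row i
ends = np.searchsorted(date_arr, date_arr - np.timedelta64(7, "D"), side="right")
# average the (up to) three values just before each end position
fast = [val_arr[max(e - 3, 0):e].mean() if e > 0 else np.nan for e in ends]

print(df.tail(3))
```

On this example both versions agree, and the last three rows come out as 40, 50 and 50, matching the desired output in the question.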

Answer 3

Hello, I hope this helps you:

a = df.groupby("Weekday")["Value"].mean()  # mean Value for each weekday
last3 = df.iloc[-3:]  # last three rows of the dataset
mean = pd.DataFrame()
mean["mean"] = a
result = last3.merge(mean, left_on="Weekday", right_on="Weekday")  # merge the two results into one DataFrame

Answer 1: score 1, posted 2022-12-23 15:59:14

Answer 2: score 1, posted 2022-12-23 16:42:31

Answer 3: score 0, posted 2022-12-23 16:15:01
