[英]rolling 30 days in pandas
I have a dataset:我有一个数据集:
import pandas as pd
df = pd.DataFrame({
'ID': ['27459', '27459', '27459', '27459', '27459', '27459', '27459', '48002', '48002', '48002'],
'Invoice_Date': ['2020-06-26', '2020-06-29', '2020-06-30', '2020-07-14', '2020-07-25',
'2020-07-30', '2020-08-02', '2020-05-13', '2020-06-20', '2020-06-28'],
'Difference_Date': [0,3,1,14,11,5,3,0,38,8],
})
df
I need to add another column that is the average of rolling 30 days period.我需要添加另一列,即滚动 30 天的平均值。 I tried using
rolling
but it gives me error window must be an integer
.我尝试使用
rolling
,但它给了我错误window must be an integer
。 Since this is customer-based data, it need to be groupby ID
as well.由于这是基于客户的数据,因此也需要 groupby
ID
。
My expected output is:我预期的 output 是:
ID Invoice_Date Difference_Date Average
0 27459 2020-06-26 0 0.00
1 27459 2020-06-29 3 1.50
2 27459 2020-06-30 1 1.33
3 27459 2020-07-14 14 4.50
4 27459 2020-07-25 11 5.80
5 27459 2020-07-30 5 10.00
6 27459 2020-08-02 3 8.25
7 48002 2020-05-13 0 0.00
8 48002 2020-06-20 38 38.00
9 48002 2020-06-28 8 23.00
Is there any efficient workaround for calculating average of rolling 30 days?是否有任何有效的解决方法来计算滚动 30 天的平均值?
This is because pandas needs a DatetimeIndex to do df.rolling('30D')
:这是因为 pandas 需要DatetimeIndex来执行
df.rolling('30D')
:
import pandas as pd
df = pd.DataFrame({
'ID': ['27459', '27459', '27459', '27459', '27459', '27459', '27459', '48002', '48002', '48002'],
'Invoice_Date': ['2020-06-26', '2020-06-29', '2020-06-30', '2020-07-14', '2020-07-25',
'2020-07-30', '2020-08-02', '2020-05-13', '2020-06-20', '2020-06-28'],
'Difference_Date': [0,3,1,14,11,5,3,0,38,8],
})
df.index = pd.DatetimeIndex(df['Invoice_Date'])
df = df.sort_index()
df.rolling('30D')
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.