Create new average column based on previous row in pandas
I have a dataset as below:
import pandas as pd
df = pd.DataFrame({
    'ID': ['27459', '27459', '27459', '27459', '27459', '27459', '27459', '48002', '48002', '48002'],
    'Invoice_Date': ['2020-06-26', '2020-06-29', '2020-06-30', '2020-07-14', '2020-07-25',
                     '2020-07-30', '2020-08-02', '2020-05-13', '2020-06-20', '2020-06-28'],
    'Delay': [2, -2, 0, 1, 2, 9, 12, 29, 0, 1],
    'Difference_Date': [0, 3, 1, 14, 11, 5, 3, 0, 38, 8],
})
I need to create two new columns containing the averages of Delay and Difference_Date over the 30 days up to and including the previous row's date. A customer's first row has no previous row, so it gets 0. The data is customer-based, so it needs to be sorted and grouped by ID.
My expected output is:
ID Invoice_Date Delay Difference_Date Avg_Delay Avg_Difference_Date
27459 2020-06-26 2 0 0.00 0.000000
27459 2020-06-29 -2 3 2.00 0.000000
27459 2020-06-30 0 1 0.00 1.500000
27459 2020-07-14 1 14 0.00 1.333333
27459 2020-07-25 2 11 0.25 4.500000
27459 2020-07-30 9 5 0.60 5.800000
27459 2020-08-02 12 3 4.00 10.000000
48002 2020-05-13 29 0 0.00 0.000000
48002 2020-06-20 0 38 29.00 0.000000
48002 2020-06-28 1 8 0.00 38.000000
You need to use a rolling approach with a time-based window of 30 days ("30D"), then shift the result by one row so that each row gets the mean of the 30-day window ending at the previous row's date (the current row itself is not included):
df['Invoice_Date'] = pd.to_datetime(df['Invoice_Date'])
# Time-based rolling windows need the dates sorted within each customer
df = df.sort_values(['ID', 'Invoice_Date']).set_index('Invoice_Date')
# 30-day mean up to each row, shifted by one row *within* each ID so that a
# customer's first row does not pick up the previous customer's last value
df[['Avg_Delay', 'Avg_Difference_Date']] = (
    df.groupby('ID')[['Delay', 'Difference_Date']]
    .transform(lambda x: x.rolling('30D').mean().shift())
    .fillna(0)
)
# Rearrange columns to exactly match the expected output:
df = df.reset_index()[['ID', 'Invoice_Date', 'Delay', 'Difference_Date',
                       'Avg_Delay', 'Avg_Difference_Date']]
Output:
ID Invoice_Date Delay Difference_Date Avg_Delay Avg_Difference_Date
0 27459 2020-06-26 2 0 0.00 0.000000
1 27459 2020-06-29 -2 3 2.00 0.000000
2 27459 2020-06-30 0 1 0.00 1.500000
3 27459 2020-07-14 1 14 0.00 1.333333
4 27459 2020-07-25 2 11 0.25 4.500000
5 27459 2020-07-30 9 5 0.60 5.800000
6 27459 2020-08-02 12 3 4.00 10.000000
7 48002 2020-05-13 29 0 0.00 0.000000
8 48002 2020-06-20 0 38 29.00 0.000000
9 48002 2020-06-28 1 8 0.00 38.000000
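A detail that is easy to get wrong with this pattern is where shift() is applied. Calling it on the combined result after the groupby shifts across customer boundaries, so the first row of one ID inherits the last rolling mean of the previous ID; calling it inside the transformed function keeps the shift per group. The minimal sketch below contrasts the two on a small hypothetical frame (the IDs 'A' and 'B' and their values are made up for illustration).

```python
import pandas as pd

# Hypothetical data: two customers, dates as a DatetimeIndex, each group date-sorted
idx = pd.to_datetime(['2020-01-01', '2020-01-05', '2020-01-02', '2020-01-10'])
df = pd.DataFrame({'ID': ['A', 'A', 'B', 'B'], 'Delay': [1, 3, 10, 20]}, index=idx)
grouped = df.groupby('ID')['Delay']

# shift() on the combined result: B's first row receives A's last 30-day mean
leaky = grouped.transform(lambda s: s.rolling('30D').mean()).shift().fillna(0)

# shift() inside the function: each customer's first row becomes NaN, then 0
per_group = grouped.transform(lambda s: s.rolling('30D').mean().shift()).fillna(0)

print(pd.DataFrame({'ID': df['ID'], 'leaky': leaky, 'per_group': per_group}))
```

Only the first row of customer B differs: the leaky version reports 2.0 (customer A's last 30-day mean) instead of 0.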
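Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.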