
Create new average column based on previous row in pandas

I have a dataset as below:

import pandas as pd 

df = pd.DataFrame({
        'ID':  ['27459', '27459', '27459', '27459', '27459', '27459', '27459', '48002', '48002', '48002'],
        'Invoice_Date': ['2020-06-26', '2020-06-29', '2020-06-30', '2020-07-14', '2020-07-25', 
                         '2020-07-30', '2020-08-02', '2020-05-13', '2020-06-20', '2020-06-28'],
        'Delay': [2,-2,0,1,2,9,12,29,0,1],
        'Difference_Date': [0,3,1,14,11,5,3,0,38,8],
        })

I need to create two new columns holding the average of Delay and Difference_Date over the 30-day window ending at the previous row's Invoice_Date (so the current row itself is not included). The data is per customer, so it needs to be sorted and grouped by ID.

My expected output is:


    ID  Invoice_Date    Delay   Difference_Date  Avg_Delay   Avg_Difference_Date
27459   2020-06-26       2      0                0.00        0.000000
27459   2020-06-29      -2      3                2.00        0.000000
27459   2020-06-30       0      1                0.00        1.500000
27459   2020-07-14       1      14               0.00        1.333333
27459   2020-07-25       2      11               0.25        4.500000
27459   2020-07-30       9      5                0.60        5.800000
27459   2020-08-02       12     3                4.00        10.000000
48002   2020-05-13       29     0                0.00        0.000000
48002   2020-06-20       0      38               29.00       0.000000
48002   2020-06-28       1      8                0.00        38.000000
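
As a sanity check on how the window is read here, the Avg_Delay values for ID 27459 can be reproduced by hand. This is only a sketch under one assumption: each row averages the rows whose date falls in the 30 days up to and including the previous row's date, with 0 for the first row. The helper `mean_before` is hypothetical and written just for this check.

```python
import pandas as pd

# Customer 27459 only: invoice dates and delays from the question.
dates = pd.to_datetime(['2020-06-26', '2020-06-29', '2020-06-30', '2020-07-14',
                        '2020-07-25', '2020-07-30', '2020-08-02'])
delay = pd.Series([2, -2, 0, 1, 2, 9, 12], index=dates)

def mean_before(series, pos, days=30):
    """Mean over the `days`-day window ending at the row before `pos`.

    Hypothetical helper for illustration; returns 0.0 for the first row.
    """
    if pos == 0:
        return 0.0
    end = series.index[pos - 1]             # previous row's date
    start = end - pd.Timedelta(days=days)   # window is (start, end]
    in_window = (series.index > start) & (series.index <= end)
    return float(series[in_window].mean())

avg_delay = [mean_before(delay, i) for i in range(len(delay))]
print([round(v, 2) for v in avg_delay])
# [0.0, 2.0, 0.0, 0.0, 0.25, 0.6, 4.0]
```

The last value shows why the window ends at the previous row rather than the current one: for 2020-08-02 the window ends at 2020-07-30, so 2020-06-30 falls just outside it and only the delays 1, 2 and 9 are averaged.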

You need a rolling approach with a 30-day window ("30D"), then a shift within each ID so that every row sees only the window ending at the previous row (not including the row itself). The shift has to happen inside the groupby; applied afterwards, it carries the last average of one ID into the first row of the next:

df['Invoice_Date'] = pd.to_datetime(df['Invoice_Date'])
# rolling('30D') needs a sorted DatetimeIndex within each ID
df = df.sort_values(['ID', 'Invoice_Date']).set_index('Invoice_Date')

df[['Avg_Delay', 'Avg_Difference_Date']] = (
    df.groupby('ID')[['Delay', 'Difference_Date']]
    .transform(lambda x: x.rolling('30D').mean().shift())
    .fillna(0)
)

# Rearrange columns to match the expected output exactly:
df = df.reset_index().iloc[:, [1,0] + list(range(2, df.shape[1]+1))]

Output:

      ID Invoice_Date  Delay  Difference_Date  Avg_Delay  Avg_Difference_Date
0  27459   2020-06-26      2                0       0.00             0.000000
1  27459   2020-06-29     -2                3       2.00             0.000000
2  27459   2020-06-30      0                1       0.00             1.500000
3  27459   2020-07-14      1               14       0.00             1.333333
4  27459   2020-07-25      2               11       0.25             4.500000
5  27459   2020-07-30      9                5       0.60             5.800000
6  27459   2020-08-02     12                3       4.00            10.000000
7  48002   2020-05-13     29                0       0.00             0.000000
8  48002   2020-06-20      0               38      29.00             0.000000
9  48002   2020-06-28      1                8       0.00            38.000000
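
Where `.shift()` is placed matters. The sketch below uses a cut-down version of the question's data (two rows per ID, Delay only) to compare shifting the whole column after the groupby with shifting inside each group. The names `global_shift` and `group_shift` are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['27459', '27459', '48002', '48002'],
    'Invoice_Date': pd.to_datetime(['2020-07-30', '2020-08-02',
                                    '2020-05-13', '2020-06-20']),
    'Delay': [9, 12, 29, 0],
}).set_index('Invoice_Date')

# 30-day rolling mean per ID, not yet shifted.
rolled = df.groupby('ID')['Delay'].transform(lambda x: x.rolling('30D').mean())

# Shift after the groupby: the whole column moves down one row, so the
# first row of ID 48002 picks up the last average of ID 27459.
global_shift = rolled.shift().fillna(0)

# Shift inside the groupby: each ID starts from its own empty history.
group_shift = (
    df.groupby('ID')['Delay']
    .transform(lambda x: x.rolling('30D').mean().shift())
    .fillna(0)
)

print(pd.DataFrame({'ID': df['ID'], 'global_shift': global_shift,
                    'group_shift': group_shift}))
```

With the global shift, the first row of ID 48002 gets 10.5, the last rolling mean of ID 27459. With the per-group shift it gets 0, which is what the expected output asks for on the first invoice of each customer.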

