Create new average column based on previous row in pandas

I have a dataset as below:

import pandas as pd 

df = pd.DataFrame({
        'ID':  ['27459', '27459', '27459', '27459', '27459', '27459', '27459', '48002', '48002', '48002'],
        'Invoice_Date': ['2020-06-26', '2020-06-29', '2020-06-30', '2020-07-14', '2020-07-25', 
                         '2020-07-30', '2020-08-02', '2020-05-13', '2020-06-20', '2020-06-28'],
        'Delay': [2,-2,0,1,2,9,12,29,0,1],
        'Difference_Date': [0,3,1,14,11,5,3,0,38,8],
        })

I need to create two new columns holding the averages of Delay and Difference_Date over the 30 days up to the previous row's Invoice_Date. The data is customer-based, so it needs to be sorted and grouped by ID.

My expected output is:


    ID  Invoice_Date    Delay   Difference_Date  Avg_Delay   Avg_Difference_Date
27459   2020-06-26       2      0                0.00        0.000000
27459   2020-06-29      -2      3                2.00        0.000000
27459   2020-06-30       0      1                0.00        1.500000
27459   2020-07-14       1      14               0.00        1.333333
27459   2020-07-25       2      11               0.25        4.500000
27459   2020-07-30       9      5                0.60        5.800000
27459   2020-08-02       12     3                4.00        10.000000
48002   2020-05-13       29     0                0.00        0.000000
48002   2020-06-20       0      38               29.00       0.000000
48002   2020-06-28       1      8                0.00        38.000000
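
To make the window definition concrete, here is a small sketch that recomputes the expected averages for the 2020-08-02 row of ID 27459 by hand. It assumes the window ends at the previous invoice (2020-07-30), reaches back 30 days, and excludes the left boundary, which is how the expected values above appear to have been derived. The variable names are illustrative only.

```python
import pandas as pd

# Invoices of customer 27459 that precede the 2020-08-02 row.
prev = pd.DataFrame({
    'Invoice_Date': pd.to_datetime(['2020-06-26', '2020-06-29', '2020-06-30',
                                    '2020-07-14', '2020-07-25', '2020-07-30']),
    'Delay': [2, -2, 0, 1, 2, 9],
    'Difference_Date': [0, 3, 1, 14, 11, 5],
})

# Assumed window: (previous invoice date - 30 days, previous invoice date].
window_end = prev['Invoice_Date'].iloc[-1]           # 2020-07-30
window_start = window_end - pd.Timedelta(days=30)    # 2020-06-30, excluded

in_window = (prev['Invoice_Date'] > window_start) & (prev['Invoice_Date'] <= window_end)
avg_delay = prev.loc[in_window, 'Delay'].mean()            # (1 + 2 + 9) / 3
avg_diff = prev.loc[in_window, 'Difference_Date'].mean()   # (14 + 11 + 5) / 3
print(avg_delay, avg_diff)  # 4.0 10.0
```

Only the invoices from 2020-07-14, 2020-07-25 and 2020-07-30 fall inside the window, which gives the 4.00 and 10.000000 shown in the last row for that customer.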

You need a rolling approach: compute the rolling mean over a 30-day window ("30D"), then shift the result by one row so that each row only sees earlier invoices (not the row itself):

df['Invoice_Date'] = pd.to_datetime(df['Invoice_Date'])
df = df.set_index('Invoice_Date')

df[['Avg_Delay', 'Avg_Difference_Date']] = (
    df.groupby('ID')[['Delay', 'Difference_Date']]
    # shift inside each group so values never carry over from one ID to the next
    .transform(lambda x: x.rolling('30D').mean().shift())
    .fillna(0)
)

# Restore Invoice_Date as a column and put ID first to match the expected layout:
df = df.reset_index()[['ID', 'Invoice_Date', 'Delay', 'Difference_Date',
                       'Avg_Delay', 'Avg_Difference_Date']]

Output:

      ID Invoice_Date  Delay  Difference_Date  Avg_Delay  Avg_Difference_Date
0  27459   2020-06-26      2                0       0.00             0.000000
1  27459   2020-06-29     -2                3       2.00             0.000000
2  27459   2020-06-30      0                1       0.00             1.500000
3  27459   2020-07-14      1               14       0.00             1.333333
4  27459   2020-07-25      2               11       0.25             4.500000
5  27459   2020-07-30      9                5       0.60             5.800000
6  27459   2020-08-02     12                3       4.00            10.000000
7  48002   2020-05-13     29                0       0.00             0.000000
8  48002   2020-06-20      0               38      29.00             0.000000
9  48002   2020-06-28      1                8       0.00            38.000000
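
If the rows are not already ordered by date within each ID, a time-based rolling window will generally refuse to run, since it expects a monotonic datetime index. Below is a minimal sketch of a variant that sorts first and applies the shift separately per ID, so the first invoice of a customer never picks up a value from the previous customer. The helper name `add_prev_window_means` and its parameters are hypothetical, not part of pandas.

```python
import pandas as pd

def add_prev_window_means(df, cols=('Delay', 'Difference_Date'), window='30D'):
    """Hypothetical helper: per-ID rolling means over `window`, shifted by one row."""
    out = df.copy()
    out['Invoice_Date'] = pd.to_datetime(out['Invoice_Date'])
    # Time-based rolling needs dates in increasing order within each ID.
    out = out.sort_values(['ID', 'Invoice_Date']).reset_index(drop=True)
    for col in cols:
        parts = []
        for _, g in out.groupby('ID'):
            # Rolling mean on a date index, then shift within this ID only.
            s = g.set_index('Invoice_Date')[col].rolling(window).mean().shift()
            s.index = g.index  # back to the row labels of `out`
            parts.append(s)
        # The first row of each ID has no earlier invoice, so it becomes 0.
        out['Avg_' + col] = pd.concat(parts).fillna(0)
    return out

df = pd.DataFrame({
    'ID': ['27459'] * 7 + ['48002'] * 3,
    'Invoice_Date': ['2020-06-26', '2020-06-29', '2020-06-30', '2020-07-14', '2020-07-25',
                     '2020-07-30', '2020-08-02', '2020-05-13', '2020-06-20', '2020-06-28'],
    'Delay': [2, -2, 0, 1, 2, 9, 12, 29, 0, 1],
    'Difference_Date': [0, 3, 1, 14, 11, 5, 3, 0, 38, 8],
})

# Shuffle the rows to show that the helper does not rely on the input order.
shuffled = df.sample(frac=1, random_state=0)
result = add_prev_window_means(shuffled)
print(result)
```

The explicit loop over groups is slower than a single `transform` call on large data, but it keeps each step visible and also tolerates repeated dates across different IDs.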

