How to apply a custom rolling function to pandas groupby?

I would like to calculate daily sales from average sales using the following function:

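import numpy as np

def derive_daily_sales(avg_sales_series, period, first_day_sales):
    """
    Derive the daily sales from the previous_avg_sales start date to the current_avg_sales end date.
    For the detailed formula, please refer to README.md.

    @avg_sales_series: an array of average sales (e.g. 2020-08-04 to 2020-08-06)
    @period: the averaging period in days (e.g. 30 days, 90 days)
    @first_day_sales: the sales on the first day of previous_avg_sales
    """
    # Convert to a NumPy array so [-1] and [0] are positional even when a
    # label-indexed pandas Series is passed (e.g. by rolling().apply with raw=False)
    values = np.asarray(avg_sales_series)

    # Newest average minus oldest average, scaled by the period, plus the first day's sales
    x_n1 = values[-1] * period - values[0] * period + first_day_sales

    return x_n1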
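The README referenced in the docstring is not included here. Assuming avg_30_day_sales is a trailing mean over the last P days (P = period) and x_k is the sales on day k, the formula in derive_daily_sales follows from the definition of that mean:

$$A_n = \frac{1}{P}\sum_{k=n-P+1}^{n} x_k$$

$$P A_n - P A_{n-1} = x_n - x_{n-P} \quad\Longrightarrow\quad x_n = P\,(A_n - A_{n-1}) + x_{n-P}$$

With a window of two consecutive averages, avg_sales_series[-1] is A_n, avg_sales_series[0] is A_{n-1}, period is P, and first_day_sales plays the role of x_{n-P}, the first day of the previous averaging window. Under this reading x_{n-P} moves forward one day with each new row, so reusing one constant first_day_sales for every window is a simplification.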

The avg_sales_series is supposed to be a pandas Series.

The dataframe looks like this:

date, customer_id, avg_30_day_sales
12/08/2020, 1, 30
13/08/2020, 1, 40
14/08/2020, 1, 40
12/08/2020, 2, 20
13/08/2020, 2, 40
14/08/2020, 2, 30

I would like to first group by customer_id and sort by date. Then I want to take a rolling window of size 2 and apply the custom function derive_daily_sales, assuming period = 30 and first_day_sales equal to the first avg_30_day_sales.

I tried:

df_sales_grouped = df_sales.sort_values('date').groupby(['customer_id','date'])

df_daily_sales['daily_sales'] = df_sales_grouped['avg_30_day_sales'].rolling(2).apply(derive_daily_sales, axis=1, period=30, first_day_sales= df_sales['avg_30_day_sales'][0])

You should not group by date, since that is the column you want to roll over. The grouping should be:

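# group_keys=False keeps the original row index, so the result can be assigned back to df_sales
df_sales_grouped = df_sales.sort_values('date').groupby('customer_id', group_keys=False)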
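Why grouping by date breaks the window can be checked directly. A minimal sketch on the question's sample data, using a plain rolling sum as a stand-in for derive_daily_sales:

```python
import pandas as pd

# Sample data from the question (dates kept as dd/mm/yyyy strings)
df_sales = pd.DataFrame({
    "date": ["12/08/2020", "13/08/2020", "14/08/2020"] * 2,
    "customer_id": [1, 1, 1, 2, 2, 2],
    "avg_30_day_sales": [30, 40, 40, 20, 40, 30],
})

# Grouping by (customer_id, date): every group holds exactly one row,
# so a window of size 2 can never be filled
by_customer_and_date = df_sales.groupby(["customer_id", "date"])
print(by_customer_and_date.size().tolist())  # [1, 1, 1, 1, 1, 1]

# Grouping by customer_id alone: each group holds that customer's full history
by_customer = df_sales.groupby("customer_id")
print(by_customer.size().tolist())  # [3, 3]

# A plain rolling sum per customer shows the windows are now populated
rolled = by_customer["avg_30_day_sales"].transform(lambda s: s.rolling(2).sum())
print(rolled.tolist())  # [nan, 70.0, 80.0, nan, 60.0, 70.0]
```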

Next, what you actually want is to apply a rolling window to each group in the dataframe. That means using apply twice: once on the grouped dataframe and once on each rolling window. This can be done as follows:

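rolling_arguments = {'period': 30, 'first_day_sales': df_sales['avg_30_day_sales'].iloc[0]}
# raw=True passes each window to derive_daily_sales as a NumPy array, so [-1] and [0] are positional
df_sales['daily_sales'] = df_sales_grouped['avg_30_day_sales'].apply(
    lambda g: g.rolling(2).apply(derive_daily_sales, raw=True, kwargs=rolling_arguments))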
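A self-contained sketch of the answer's approach, runnable against the sample data. Beyond the original it assumes three things: the date strings are parsed with pd.to_datetime(..., format="%d/%m/%Y") so that sorting is chronological rather than lexicographic, raw=True is passed so each window reaches derive_daily_sales as a NumPy array, and group_keys=False is set so that on recent pandas versions the result of apply keeps the original row index and aligns with df_sales. Like the answer, it uses the frame's first avg_30_day_sales (30) as first_day_sales for every customer.

```python
import numpy as np
import pandas as pd


def derive_daily_sales(avg_sales_series, period, first_day_sales):
    """Daily sales implied by two consecutive rolling averages."""
    # np.asarray makes positional indexing work for both ndarray and Series windows
    values = np.asarray(avg_sales_series)
    return values[-1] * period - values[0] * period + first_day_sales


df_sales = pd.DataFrame({
    "date": ["12/08/2020", "13/08/2020", "14/08/2020"] * 2,
    "customer_id": [1, 1, 1, 2, 2, 2],
    "avg_30_day_sales": [30, 40, 40, 20, 40, 30],
})
# Parse dd/mm/yyyy so that sort_values orders the rows chronologically
df_sales["date"] = pd.to_datetime(df_sales["date"], format="%d/%m/%Y")

rolling_arguments = {"period": 30, "first_day_sales": df_sales["avg_30_day_sales"].iloc[0]}

# group_keys=False: do not prepend customer_id to the index of apply's result
df_sales_grouped = df_sales.sort_values("date").groupby("customer_id", group_keys=False)
df_sales["daily_sales"] = df_sales_grouped["avg_30_day_sales"].apply(
    lambda g: g.rolling(2).apply(derive_daily_sales, raw=True, kwargs=rolling_arguments)
)
print(df_sales)
```

The daily_sales column should match the result table in the answer.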

For the given input data, the result is:

      date  customer_id  avg_30_day_sales  daily_sales
12/08/2020            1                30          NaN
13/08/2020            1                40        330.0
14/08/2020            1                40         30.0
12/08/2020            2                20          NaN
13/08/2020            2                40        630.0
14/08/2020            2                30       -270.0
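Because the window size is fixed at 2, avg_sales_series[-1] - avg_sales_series[0] in each window is just the difference between consecutive rows of a customer, so the same column can likely be computed without a Python callback via groupby().diff(). This is an alternative sketch, not what the answer used; it only holds for a window of 2, and first_day_sales is again the single global value 30.

```python
import pandas as pd

df_sales = pd.DataFrame({
    "date": pd.to_datetime(["12/08/2020", "13/08/2020", "14/08/2020"] * 2, format="%d/%m/%Y"),
    "customer_id": [1, 1, 1, 2, 2, 2],
    "avg_30_day_sales": [30, 40, 40, 20, 40, 30],
})

period = 30
first_day_sales = df_sales["avg_30_day_sales"].iloc[0]  # same global value the answer uses

# With a window of 2, "last minus first" is the day-over-day difference,
# which groupby().diff() computes per customer (NaN on each customer's first row)
df_sales = df_sales.sort_values(["customer_id", "date"])
df_sales["daily_sales"] = (
    df_sales.groupby("customer_id")["avg_30_day_sales"].diff() * period + first_day_sales
)
print(df_sales)
```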

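Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.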

 
© 2020-2024 STACKOOM.COM