How to apply a custom rolling function to pandas groupby?
I would like to calculate the daily sales from average sales using the following function:
def derive_daily_sales(avg_sales_series, period, first_day_sales):
    """
    Derive the daily sales from previous_avg_sales start date to current_avg_sales end date.
    For the detailed formula, please refer to README.md.
    @avg_sales_series: an array of avg sales (e.g. 2020-08-04 to 2020-08-06)
    @period: the averaging period in days (e.g. 30 days, 90 days)
    @first_day_sales: the sales at the first day of previous_avg_sales
    """
    x_n1 = avg_sales_series[-1]*period - avg_sales_series[0]*period + first_day_sales
    return x_n1
The avg_sales_series is supposed to be a pandas Series.
The dataframe looks like the following:
date, customer_id, avg_30_day_sales
12/08/2020, 1, 30
13/08/2020, 1, 40
14/08/2020, 1, 40
12/08/2020, 2, 20
13/08/2020, 2, 40
14/08/2020, 2, 30
I would like to first group by customer_id and sort by date. Then, get a rolling window of size 2 and apply the custom function derive_daily_sales, assuming that period = 30 and first_day_sales is equal to the first avg_30_day_sales.
I tried:
df_sales_grouped = df_sales.sort_values('date').groupby(['customer_id','date'])]
df_daily_sales['daily_sales'] = df_sales_grouped['avg_30_day_sales'].rolling(2).apply(derive_daily_sales, axis=1, period=30, first_day_sales= df_sales['avg_30_day_sales'][0])
You should not group by the date since you want to roll over that column, so the grouping should be:
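# group_keys=False keeps the original row index on the result of apply (needed on pandas 2.x)
df_sales_grouped = df_sales.sort_values('date').groupby('customer_id', group_keys=False)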
Next, what you actually want to do is apply a rolling window on each group in the dataframe. So you need to use apply twice, once on the grouped dataframe and once on each rolling window. This can be done as follows:
rolling_arguments = {'period': 30, 'first_day_sales': df_sales['avg_30_day_sales'][0]}
# raw=True passes each window as a NumPy array, so [0] and [-1] index by position
df_sales['daily_sales'] = df_sales_grouped['avg_30_day_sales'].apply(
    lambda g: g.rolling(2).apply(derive_daily_sales, raw=True, kwargs=rolling_arguments))
For the given input data, the result is:
date customer_id avg_30_day_sales daily_sales
12/08/2020 1 30 NaN
13/08/2020 1 40 330.0
14/08/2020 1 40 30.0
12/08/2020 2 20 NaN
13/08/2020 2 40 630.0
14/08/2020 2 30 -270.0
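For reference, the steps above can be put together into a self-contained sketch that rebuilds the sample data and reproduces the table. It rests on a few assumptions that are not stated in the question: the dates are day/month/year strings (parsed here so that sorting is chronological rather than alphabetical), raw=True is passed so the function receives a NumPy array, and group_keys=False is used so the result keeps the original row index.

```python
import pandas as pd


def derive_daily_sales(avg_sales_series, period, first_day_sales):
    # With raw=True the window is a NumPy array, so [0] and [-1] are positional.
    return avg_sales_series[-1] * period - avg_sales_series[0] * period + first_day_sales


# Sample data copied from the question.
df_sales = pd.DataFrame({
    "date": ["12/08/2020", "13/08/2020", "14/08/2020"] * 2,
    "customer_id": [1, 1, 1, 2, 2, 2],
    "avg_30_day_sales": [30, 40, 40, 20, 40, 30],
})

# Assumption: dates are day/month/year; parse them before sorting.
df_sales["date"] = pd.to_datetime(df_sales["date"], format="%d/%m/%Y")
df_sales = df_sales.sort_values(["customer_id", "date"]).reset_index(drop=True)

# As in the answer, first_day_sales is a single value taken from the first row.
rolling_arguments = {
    "period": 30,
    "first_day_sales": df_sales["avg_30_day_sales"].iloc[0],
}

# Outer apply runs once per customer, inner apply once per rolling window.
grouped = df_sales.groupby("customer_id", group_keys=False)["avg_30_day_sales"]
df_sales["daily_sales"] = grouped.apply(
    lambda g: g.rolling(2).apply(derive_daily_sales, raw=True, kwargs=rolling_arguments)
)

print(df_sales)
```

Note that first_day_sales here is one value shared by all customers, which is what produces the numbers in the table. If it should instead be each customer's own first avg_30_day_sales, it would have to be computed inside the lambda (for example from g.iloc[0]) rather than passed in once.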