I'm trying to add a new DataFrame column named 'sales_30d_lag' that holds, for each user_id, the total sales over the 30 days leading up to that user's last purchase date. I know how to compute a 30-day lag from a single date (see my code below), but that doesn't solve the problem because it uses one fixed date for every user.
user_id | purchase_date | product | sales |
---|---|---|---|
1 | 1/1/21 | A | 1 |
2 | 1/1/21 | A | 1 |
max_date = pd.to_datetime(df['purchase_date']).max()
# pd.to_timedelta(30) without a unit means 30 nanoseconds, so state days explicitly
df['30d_lag'] = max_date - pd.to_timedelta(30, unit='D')
I also tried a different approach, but that doesn't seem to work either. Any ideas on how to get this column?
end_date = pd.to_datetime(df['purchase_date']).max()
start_date = end_date - pd.to_timedelta(30, unit='D')
# between() expects (lower, upper); this still uses one global date for all users
df_30d_lag = df[df['purchase_date'].between(start_date, end_date)].groupby('user_id').agg({'sales':'sum'}).rename(columns={'sales':'sales_30d_lag'}).reset_index()
You could use a combination of the isin and pd.date_range functions.
Here's an example:
end_date = pd.to_datetime(df['datetime_col']).max()
start_date = end_date - pd.to_timedelta(30, unit='D')
# date_range needs start <= end; isin matches exact timestamps,
# so this assumes datetime_col holds dates with no time component
df_30d = df[df['datetime_col'].isin(pd.date_range(start_date, end_date, freq='D'))]
# Once the filtering is done, you can use your normal groupby
df_30d.groupby('user_id').agg({'sales':'sum'})
NOTE: For this to work, datetime_col must have a datetime dtype. If it doesn't already, convert it first with pd.to_datetime.
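The filter above uses a single date for the whole frame. If the window should instead end at each user's own last purchase, as the question asks, one possible approach is to broadcast the per-user maximum date with groupby(...).transform('max') and compare row by row. This is only a sketch: the sample rows are made up for illustration, and it assumes the window is inclusive and spans the 30 days up to and including the last purchase.

```python
import pandas as pd

# Hypothetical sample data in the shape of the question's table
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "purchase_date": ["1/1/21", "2/15/21", "3/1/21", "1/1/21", "1/20/21"],
    "product": ["A", "B", "A", "A", "B"],
    "sales": [1, 2, 3, 1, 5],
})
df["purchase_date"] = pd.to_datetime(df["purchase_date"], format="%m/%d/%y")

# Last purchase date per user, repeated on every row of that user
last_purchase = df.groupby("user_id")["purchase_date"].transform("max")

# Assumption: the window is the 30 days up to and including the last purchase
in_window = df["purchase_date"] >= last_purchase - pd.Timedelta(days=30)

# Sum sales inside each user's window, then map the totals back onto all rows
sales_30d = df.loc[in_window].groupby("user_id")["sales"].sum()
df["sales_30d_lag"] = df["user_id"].map(sales_30d)
```

Every row of a user ends up with the same sales_30d_lag value. If you only want one row per user, sales_30d.rename('sales_30d_lag').reset_index() gives that directly.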