Sum of sales for the last 30 days per user with Python

I'm trying to add a new df column named 'sales_30d_lag' that holds, per user_id, the aggregated sales for the 30 days leading up to that user's last purchase date. I know how to get a 30-day lag (see my code below), but that won't resolve the issue since it is based on a single fixed date.

user_id purchase_date product sales
1 1/1/21 A 1
2 1/1/21 A 1

max_date = pd.to_datetime(df['purchase_date']).max()
df['30d_lag'] = max_date - pd.to_timedelta(30, unit='D')

I have also used a different approach but that doesn't seem to work either. Any ideas how to get this column?

start_date = pd.to_datetime(df['max_date'])
end_date = start_date - pd.to_timedelta(30)
df_30d_lag = df[df['purchase_date'].between(start_date, end_date)].groupby('user_id').agg({'sales':'sum'}).rename(columns={'sales':'sales_30d_lag'}).reset_index()

You could use a combination of the isin and pd.date_range functions.

Here's an example:

import pandas as pd

end_date = df['datetime_col'].max()
start_date = end_date - pd.to_timedelta(30, unit='D')

# isin matches exact timestamps, so this assumes datetime_col holds dates with no time component
df_30d = df[df['datetime_col'].isin(pd.date_range(start_date, end_date, freq='D'))]

# Once the filtering is complete you can use your normal groupby function
df_30d.groupby('user_id').agg({'sales':'sum'})

NOTE: For this to work, datetime_col must have a datetime dtype. If it does not already, convert it first with pd.to_datetime.
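
The filter above uses one window for the whole frame, while the question asks for a window anchored at each user's own last purchase. One way that might be done is sketched below. It is only a sketch under assumptions: the sample rows are made up for illustration, the dates are assumed to be in month/day/year order, and "last 30 days" is taken to include both ends of the window.

```python
import pandas as pd

# Hypothetical sample data in the shape shown in the question
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "purchase_date": ["1/1/21", "2/15/21", "3/1/21", "1/1/21", "1/20/21"],
    "product": ["A", "B", "A", "A", "B"],
    "sales": [1, 2, 3, 1, 6],
})
# Assumption: dates are month/day/two-digit-year
df["purchase_date"] = pd.to_datetime(df["purchase_date"], format="%m/%d/%y")

# Each user's most recent purchase date, broadcast back to every row
last_purchase = df.groupby("user_id")["purchase_date"].transform("max")

# True for rows inside that user's own 30-day window (both bounds included)
in_window = df["purchase_date"].between(
    last_purchase - pd.Timedelta(days=30), last_purchase
)

# Zero out sales outside the window, then sum per user and attach to every row
df["sales_30d_lag"] = (
    df["sales"].where(in_window, 0).groupby(df["user_id"]).transform("sum")
)

print(df)
```

Because transform returns a result aligned to the original index, every row for a user carries that user's total, so no merge is needed. If you would rather have one row per user, df[in_window].groupby('user_id')['sales'].sum() should give the same totals.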
