A more efficient way to do this?
I have sales records imported from a spreadsheet. I start by importing that list into a DataFrame. I then need to get the average orders per customer by month and year. The spreadsheet does not contain counts, just order and customer IDs, so I have to count each ID, drop duplicates, and then reset the index. The final DataFrame is exported back into a spreadsheet and a SQL database.
The code below works and I get the desired output, but it seems like it could be more efficient. I am new to pandas and Python, so I'm sure I could do this better.
df_customers = df.filter(
    ['Month', 'Year', 'Order_Date', 'Customer_ID', 'Customer_Name', 'Patient_ID', 'Order_ID'], axis=1)
df_order_count = df.filter(['Month', 'Year'], axis=1)
df_order_count['Order_cnt'] = df_customers.groupby(['Month', 'Year'])['Order_ID'].transform('nunique')
df_order_count['Customer_cnt'] = df_customers.groupby(['Month', 'Year'])['Customer_ID'].transform('nunique')
df_order_count['Avg'] = (df_order_count['Order_cnt'] / df_order_count['Customer_cnt']).astype(float).round(decimals=2)
df_order_count = df_order_count.drop_duplicates().reset_index(drop=True)
Try this: create the groupby object once and reuse it for both counts.
g = df.groupby(['Month', 'Year'])
df_order_count['Avg'] = g['Order_ID'].transform('nunique')/g['Customer_ID'].transform('nunique')
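Since `transform` returns one value per original row, the result above still has to be de-duplicated afterwards. As a possible alternative, here is a minimal sketch that aggregates straight to one row per month and year with `groupby(...).agg(...)`, so no `drop_duplicates` or `reset_index` step is needed. The sample data is made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the imported spreadsheet.
df = pd.DataFrame({
    "Month": [1, 1, 1, 2, 2],
    "Year": [2020, 2020, 2020, 2020, 2020],
    "Customer_ID": ["A", "A", "B", "A", "C"],
    "Order_ID": [101, 102, 103, 104, 105],
})

# One row per (Month, Year): count distinct orders and distinct customers.
df_order_count = (
    df.groupby(["Month", "Year"], as_index=False)
      .agg(Order_cnt=("Order_ID", "nunique"),
           Customer_cnt=("Customer_ID", "nunique"))
)

# Average orders per customer, rounded to two decimals.
df_order_count["Avg"] = (
    df_order_count["Order_cnt"] / df_order_count["Customer_cnt"]
).round(2)

print(df_order_count)
```

With `as_index=False` the grouping keys stay as ordinary columns, so the frame can be written out with `to_excel` or `to_sql` as it is.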