
More efficient way to group by and count values in a Pandas dataframe

A more efficient way to do this?

I have sales records imported from a spreadsheet. I start by loading that list into a dataframe. I then need the average number of orders per customer by month and year. The spreadsheet does not contain counts, just order and customer IDs, so I have to count each ID, drop duplicates, and reset the index. The final dataframe is exported back into a spreadsheet and a SQL database.

The code below works, and I get the desired output, but it seems like it could be more efficient. I am new to pandas and Python, so I'm sure I could do this better.

# Keep only the columns needed for the counts
df_customers = df.filter(
    ['Month', 'Year', 'Order_Date', 'Customer_ID', 'Customer_Name', 'Patient_ID', 'Order_ID'], axis=1)
df_order_count = df.filter(
    ['Month', 'Year'], axis=1)

# Broadcast the per-(Month, Year) unique counts back onto every row
df_order_count['Order_cnt'] = df_customers.groupby(['Month', 'Year'])['Order_ID'].transform('nunique')
df_order_count['Customer_cnt'] = df_customers.groupby(['Month', 'Year'])['Customer_ID'].transform('nunique')
# Average orders per customer (column name fixed: 'Customer_cnt', not 'Costumer_cnt')
df_order_count['Avg'] = (df_order_count['Order_cnt'] / df_order_count['Customer_cnt']).astype(float).round(decimals=2)
# Collapse the repeated rows to one row per (Month, Year)
df_order_count = df_order_count.drop_duplicates().reset_index(drop=True)

Try this:

# Build the groupby once and reuse it for both unique counts
g = df.groupby(['Month', 'Year'])
df_order_count['Avg'] = g['Order_ID'].transform('nunique') / g['Customer_ID'].transform('nunique')
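
Because transform returns one value per original row, the result still needs the drop_duplicates step from the question. As a possible alternative, here is a minimal sketch that aggregates straight to one row per (Month, Year) using named aggregation. The sample data is made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    "Month": [1, 1, 1, 2, 2],
    "Year": [2023, 2023, 2023, 2023, 2023],
    "Customer_ID": ["C1", "C1", "C2", "C1", "C3"],
    "Order_ID": ["O1", "O2", "O3", "O4", "O5"],
})

# Aggregate directly to one row per (Month, Year); no drop_duplicates needed.
df_order_count = (
    df.groupby(["Month", "Year"], as_index=False)
      .agg(Order_cnt=("Order_ID", "nunique"),
           Customer_cnt=("Customer_ID", "nunique"))
)

# Average number of orders per customer, rounded to two decimals.
df_order_count["Avg"] = (
    df_order_count["Order_cnt"] / df_order_count["Customer_cnt"]
).round(2)

print(df_order_count)
```

With as_index=False the group keys stay as ordinary columns, so no reset_index is needed before exporting.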


 