简体   繁体   中英

Most efficient way to determine average # ordered per item per day in Python

I am new to Python and have a large dataset (over 55,000 rows to be exact) in a dataframe with columns UPC, quantity_picked, and date. I am trying to determine the most efficient way to calculate the average quantity picked per UPC per day. Is there a way that this can be done more efficiently than using nested loops?

I have tried the .mean approach but of course it only provides the average quantity picked per UPC per order. I have also tried implementing .groupby().

practice_df_by_UPC = practice_df.groupby('UPC')
print(practice_df_by_UPC)

I receive no error but no output either.

I am wanting an output that will allow me to see the average number of items picked by UPC per day. Is there a way to do this for dataframes that I'm not finding that is more efficient, or is a set of nested loops the way to go?

After switching out groupby with set_index and doing some extra troubleshooting, the following code resolved the issue and accomplished the goal of determining the average quantity for a specified UPC and date (example UPC and date plugged in):

#creating multi-index based on date and UPC
practice_df.set_index(['date_expected', 'UPC'], inplace=True)

print('Mean:')
print(practice_df.loc['2019-05-15', '0000000004011'].mean())

The inplace=True addition got rid of the remaining error. While printing the dataframe it is now multi indexed organized first by date, then by UPC.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM