Most efficient way to determine average # ordered per item per day in Python

Question

I am new to Python and have a large dataset (over 55,000 rows to be exact) in a dataframe with columns UPC, quantity_picked, and date. I am trying to determine the most efficient way to calculate the average quantity picked per UPC per day. Is there a way that this can be done more efficiently than using nested loops?

I have tried the .mean approach but of course it only provides the average quantity picked per UPC per order. I have also tried implementing .groupby().

practice_df_by_UPC = practice_df.groupby('UPC')
print(practice_df_by_UPC)

I receive no error but no output either.

I am wanting an output that will allow me to see the average number of items picked by UPC per day. Is there a way to do this for dataframes that I'm not finding that is more efficient, or is a set of nested loops the way to go?

Answer 1

After switching out groupby with set_index and doing some extra troubleshooting, the following code resolved the issue and accomplished the goal of determining the average quantity for a specified UPC and date (example UPC and date plugged in):

#creating multi-index based on date and UPC
practice_df.set_index(['date_expected', 'UPC'], inplace=True)

print('Mean:')
print(practice_df.loc['2019-05-15', '0000000004011'].mean())

The inplace=True addition got rid of the remaining error. While printing the dataframe it is now multi indexed organized first by date, then by UPC.

Most efficient way to determine average # ordered per item per day in Python

Question

1 answers

solution1
0 ACCPTED 2019-06-19 20:04:39

Most efficient way to determine average # ordered per item per day in Python

Question

1 answers

solution1 0 ACCPTED 2019-06-19 20:04:39

solution1
0 ACCPTED 2019-06-19 20:04:39