Pandas script is taking a long time to run

This code is meant to find the average promotion value for a given month within a two-year period. In total, the data set has about 11,000 rows that need to be scanned. The code has been running for 5 minutes and still hasn't produced any results. I'm still very much a novice in my coding career, so any tips on how to optimize the code for faster completion times would be appreciated!

import pandas as pd
df = pd.read_csv(r'C:\Users\james.rush\df_LG.csv')
df.head()
Promos = []
Avg_Promo = []
Dates = []

#This function is used to determine the Average Promotion during any given month/year
def Promo_Avg(Date):
    for x in df['Date']: #For all dates in dataframe
        Promo_Value = df.loc[df['Date'] == Date, 'Promo'] #Locate the corresponding promo given the provided date
        Promos.append(Promo_Value) #Add that Promo to the list of Promos for that month, will need list length later
    Average_Promotion = sum(Promos)/len(Promos) #Average Promotion during the given month
    if Average_Promotion not in Avg_Promo: #Prevents Duplicates
        Avg_Promo.append(Average_Promotion)
    if Date not in Dates: #If the Current Date being Checked is not in list, add to list. This will prevent Duplicates
        Dates.append(Date)

Function_Dates = [
    'January2020',
    'Febuary2020',
    'March2020'
                 ]
for x in Function_Dates:
    Promo_Avg(x)

Your for loop iterates over df['Date'], but the body never uses the loop variable x: every iteration runs the same df.loc lookup for Date, so the loop only repeats identical work.
Each of those lookups scans the whole Date column, so with about 11,000 rows you end up doing roughly 11,000 x 11,000 = 121,000,000 comparisons per call, which is probably why it is slow.
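To make the redundancy concrete, here is a minimal sketch on a made-up four-row dataframe (the values are invented for illustration; only the column names Date and Promo come from the question). It shows that every pass through the loop produces the same result as a single lookup:

```python
import pandas as pd

# Toy stand-in for the real CSV; these values are made up.
df = pd.DataFrame({
    "Date": ["January2020", "January2020", "Febuary2020", "March2020"],
    "Promo": [10.0, 20.0, 5.0, 8.0],
})

def lookups_in_loop(date):
    """Mimic the original loop: one full-column lookup per row."""
    results = []
    for x in df["Date"]:  # x is never used, as in the question
        results.append(df.loc[df["Date"] == date, "Promo"])
    return results

repeated = lookups_in_loop("January2020")
single = df.loc[df["Date"] == "January2020", "Promo"]

# Every iteration returned exactly what one lookup returns.
all_same = all(r.equals(single) for r in repeated)
print(len(repeated), all_same)
```

On the real data the loop would build about 11,000 identical copies of the same selection, so dropping the loop loses nothing.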

Some more details about your question:
Credits to @DarrylG's comment.

You are trying to

find the average promotion value in a given month in a two-year period

This breaks down into three parts:
find, project, and average.

def promo_avg(date):
    # Mean of the Promo column over the rows whose Date equals date
    return df[df.Date==date].Promo.mean()

This seems to do the job. Let's look at the three parts in detail:
Find:
df[df.Date==date] selects the rows of df whose Date column equals date.

Project:
By project, I mean projection in the relational algebra sense: restricting your data to specific columns, in your case the Promo column. The Find step returns a dataframe, so you can project it simply by appending .Promo, which gives df[df.Date==date].Promo.

Average:
After the projection you have a pandas Series, which comes with its own methods, including an averaging function. df[df.Date==date].Promo.mean() should do the trick.
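The three steps can be sketched one at a time on a small made-up dataframe (the values are invented; the column names Date and Promo are taken from the question). The last lines use groupby, which is not part of the answer above but is a common pandas way to get the average for every month in one pass:

```python
import pandas as pd

# Toy stand-in for the real CSV; these values are made up.
df = pd.DataFrame({
    "Date": ["January2020", "January2020", "Febuary2020", "March2020"],
    "Promo": [10.0, 20.0, 5.0, 8.0],
})

date = "January2020"

found = df[df["Date"] == date]  # Find: rows for that month (a DataFrame)
projected = found["Promo"]      # Project: keep only the Promo column (a Series)
average = projected.mean()      # Average: mean of that column

print(average)

# Alternative, assuming you want all months at once:
# group the rows by Date and average Promo within each group.
per_month = df.groupby("Date")["Promo"].mean()
print(per_month)
```

Bracket indexing (df["Date"]) is equivalent to attribute access (df.Date) here; it is used in this sketch only because it also works for column names that contain spaces or clash with dataframe methods.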

I hope it was clear and useful :)
