Looking for a way to speed up pandas dataframe search

I'm trying to calculate the age of a given product by looking up its release date in the dataframe (the release date is the earliest date on which the product appears) and subtracting it from the current date in that row. However, the search for the release date is taking a very long time (2 hours and counting as I write this). Note: the dataframe has over 300k rows.

I'm using the .loc method in pandas, which seems to be the source of the problem.

import datetime
from datetime import timedelta

#Age Calculation
def item_age(release,current):
    age = (current - release) / timedelta(days=365.2425)
    age="%.3f" % age
    return age
#Get the release date of a given item 
def getItem_releaseDate(sales_data,index):
    date=sales_data.loc[(sales_data.item_id==index),'date']
    release=[]
    for i in date:
        release.append(datetime.datetime.strptime(i,'%d.%m.%Y'))
    mini=min(release)
    return mini
#Appending age to item
def getItem_age(sales_data):
    sales=sales_data
    sales['age']=0.0
    for index,row in sales.iterrows():
        current=datetime.datetime.strptime(row['date'],'%d.%m.%Y')
        release=getItem_releaseDate(sales_data,row["item_id"])
        # iterrows() yields copies, so write through the frame rather than row["age"]
        sales.at[index,'age']=float(item_age(release,current))
    return sales

Try the following (I'm not sure it works, because I don't have data to test it with):

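import pandas as pd

#Appending age to item
def getItem_age(sales_data):
    # Parse the 'dd.mm.YYYY' strings once for the whole column
    current = pd.to_datetime(sales_data['date'], format='%d.%m.%Y')
    release = getItem_releaseDate(sales_data, current)
    sales_data['age'] = item_age(release, current)
    return sales_data

#Age Calculation
def item_age(release,current):
    # Age in years as a float rounded to 3 decimals (not a formatted string)
    age = (current - release) / pd.Timedelta(days=365.2425)
    return age.round(3)

#Get the release date of every row's item
def getItem_releaseDate(sales_data,dates):
    # Earliest date per item_id, broadcast back to one value per row
    return dates.groupby(sales_data['item_id']).transform('min')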

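The problem with your code is that it loops over the dataset inefficiently: for each of the 300k rows, .loc scans the whole dataframe again to find that item's release date, and every date string is re-parsed each time. You can usually get rid of such loops by using "Vectorization with NumPy". You can check Optimizing Pandas for more information.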
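
To make the idea concrete, here is a minimal sketch of one vectorized variant: build a per-item lookup table of release dates once, then map it onto the rows. The sample data and the names `dates`, `release_by_item` and `release` are made up for illustration; only the `item_id` and `date` columns and the `dd.mm.YYYY` format come from the question.

```python
import pandas as pd

# Made-up sample data in the question's 'dd.mm.YYYY' format
sales = pd.DataFrame({
    "item_id": [1, 2, 1, 2, 1],
    "date": ["01.01.2013", "15.03.2013", "01.01.2014", "15.03.2013", "02.01.2013"],
})

# Parse the date strings once for the whole column
dates = pd.to_datetime(sales["date"], format="%d.%m.%Y")

# One pass over the data: earliest date per item_id (a small lookup table)
release_by_item = dates.groupby(sales["item_id"]).min()

# Look up each row's release date, then subtract whole columns at once
release = sales["item_id"].map(release_by_item)
sales["age"] = ((dates - release) / pd.Timedelta(days=365.2425)).round(3)

print(sales)
```

The groupby runs once over all rows, so the work grows roughly linearly with the number of rows instead of rescanning the frame for every row. Note that `age` is stored as a float here rather than a formatted string.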
