I'm trying to calculate the age of a given product by looking up its release date in the dataframe (the release date is the earliest date on which the product appears) and subtracting it from the date of the current row. However, the search for the release date is taking far too long (it has been running for 2 hours at the time of writing). Note: the dataframe has over 300k rows.
I'm using the .loc method in pandas, which seems to be the source of the problem.
import datetime
from datetime import timedelta

#Age Calculation
def item_age(release,current):
    age = (current - release) / timedelta(days=365.2425)
    age="%.3f" % age
    return age

#Get the release date of a given item
def getItem_releaseDate(sales_data,index):
    date=sales_data.loc[(sales_data.item_id==index),'date']
    release=[]
    for i in date:
        release.append(datetime.datetime.strptime(i,'%d.%m.%Y'))
    mini=min(release)
    return mini

#Appending age to item
def getItem_age(sales_data):
    sales=sales_data
    sales['age']=0.0
    for index,row in sales.iterrows():
        current=datetime.datetime.strptime(row['date'],'%d.%m.%Y')
        release=getItem_releaseDate(sales_data,row["item_id"])
        # write back with .at; assigning to row only changes a copy
        sales.at[index,'age']=float(item_age(release,current))
    return sales
Try the following (I have not been able to test it, because I don't have your data):
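import pandas as pd

#Age Calculation
def item_age(release,current):
    # release and current are datetime Series; the result is the age in years
    age = (current - release) / pd.Timedelta(days=365.2425)
    return age.round(3)

#Get the release date of every item
def getItem_releaseDate(sales_data,dates):
    # earliest date per item_id, repeated on every row of that item
    return dates.groupby(sales_data['item_id']).transform('min')

#Appending age to item
def getItem_age(sales_data):
    # parse all the date strings once instead of once per row
    dates = pd.to_datetime(sales_data['date'], format='%d.%m.%Y')
    sales_data['age'] = item_age(getItem_releaseDate(sales_data,dates), dates)
    return sales_data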
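The problem with your code is that you are looping over your dataset in an inefficient way: for every row, iterrows calls getItem_releaseDate, which scans the whole dataframe with .loc and re-parses the matching date strings. With 300k rows, that is 300k full scans. You can usually get rid of this kind of loop by using vectorization with NumPy and pandas. You can check "Optimizing Pandas" for more information.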
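To make the vectorization point concrete, here is a minimal sketch on a made-up four-row dataframe. The function name add_age_column and the sample data are hypothetical, and it assumes the columns are called item_id and date, with dates stored as 'dd.mm.YYYY' strings as in the question. It builds a small per-item table of release dates in one pass and then looks each row up in it, instead of filtering the whole dataframe once per row.

```python
import pandas as pd

def add_age_column(sales_data):
    """Return a copy of sales_data with an 'age' column, in years."""
    out = sales_data.copy()
    # Parse every date string once, in a single vectorized call
    dates = pd.to_datetime(out["date"], format="%d.%m.%Y")
    # One pass over the data: earliest date for each item_id
    release_by_item = dates.groupby(out["item_id"]).min()
    # Look up each row's release date in the small per-item table
    release = out["item_id"].map(release_by_item)
    # Timedelta Series divided by a Timedelta gives a float Series
    out["age"] = ((dates - release) / pd.Timedelta(days=365.2425)).round(3)
    return out

# Hypothetical sample data, only for illustration
sample = pd.DataFrame({
    "item_id": [1, 1, 2, 2],
    "date": ["01.01.2013", "01.01.2014", "15.06.2013", "15.06.2013"],
})
result = add_age_column(sample)
print(result)
```

In this sample, the second row of item 1 is 365 days after that item's first appearance, so its age comes out just under one year; the other rows are at their release date and get an age of 0.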