Improving pandas performance with apply method
I'm using pandas for high-performance calculations. The function below takes 7.24 s per loop (1 loop, best of 5) for 50,000 rows.
I have to scale it to 1 million rows.
How can I vectorise the function and apply it to all rows so that overall performance improves?
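from datetime import datetime

def weightedFlowAmt(startDate, endDate, tradeDate, tradeAmt):
    # Parse the "YYYY-MM-DD" strings into datetime objects
    startInDays = datetime.strptime(startDate, "%Y-%m-%d")
    endInDays = datetime.strptime(endDate, "%Y-%m-%d")
    tradeInDays = datetime.strptime(tradeDate, "%Y-%m-%d")
    # Absolute day counts: trade date to end, and start to end
    differenceTradeAndEnd = abs((endInDays - tradeInDays).days)
    differenceStartAndEnd = abs((endInDays - startInDays).days)
    # Weight the trade amount by the fraction of the period remaining
    weighted_FlowAmt = (tradeAmt * differenceTradeAndEnd) / differenceStartAndEnd
    return weighted_FlowAmt

# Row-wise apply: calls the Python function once per row (slow)
mutatedCashFlow['flow'] = mutatedCashFlow.apply(
    lambda row: weightedFlowAmt(row['startDate'], row['EndDate'],
                                row['tradeDate'], row['tradeAmount']),
    axis=1)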
I think you can remove apply and use vectorized functions:
mutatedCashFlow['startDate'] = pd.to_datetime(mutatedCashFlow['startDate'])
mutatedCashFlow['EndDate'] = pd.to_datetime(mutatedCashFlow['EndDate'])
mutatedCashFlow['tradeDate'] = pd.to_datetime(mutatedCashFlow['tradeDate'])
diffTradeAndEnd=((mutatedCashFlow['EndDate']-mutatedCashFlow['tradeDate']).dt.days).abs()
diffStartAndEnd=((mutatedCashFlow['EndDate']-mutatedCashFlow['startDate']).dt.days).abs()
mutatedCashFlow['flow'] = (mutatedCashFlow['tradeAmount']*diffTradeAndEnd)/diffStartAndEnd
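To make the arithmetic concrete, here is a minimal worked sketch of the same vectorized steps on a two-row frame. The sample dates and amounts are invented for illustration and are not from the question; only the column names follow the code above.

```python
import pandas as pd

# Hypothetical sample data (assumed values, chosen so the result is easy to check by hand)
df = pd.DataFrame({
    "startDate": ["2020-01-01", "2020-01-01"],
    "EndDate": ["2020-01-31", "2020-01-11"],
    "tradeDate": ["2020-01-16", "2020-01-06"],
    "tradeAmount": [300, 100],
})

# Convert the string columns to datetime64 once, for the whole column
for col in ["startDate", "EndDate", "tradeDate"]:
    df[col] = pd.to_datetime(df[col])

# Whole-column day differences: 15 and 5 days from trade to end,
# 30 and 10 days from start to end
diff_trade_end = (df["EndDate"] - df["tradeDate"]).dt.days.abs()
diff_start_end = (df["EndDate"] - df["startDate"]).dt.days.abs()

# Row 0: 300 * 15 / 30 = 150.0; row 1: 100 * 5 / 10 = 50.0
df["flow"] = df["tradeAmount"] * diff_trade_end / diff_start_end
print(df["flow"].tolist())
```

Note that a row whose start and end dates coincide would divide by zero here, just as in the original function, so such rows may need separate handling.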
Alternative:
mutatedCashFlow['startDate'] = pd.to_datetime(mutatedCashFlow['startDate'])
mutatedCashFlow['EndDate'] = pd.to_datetime(mutatedCashFlow['EndDate'])
mutatedCashFlow['tradeDate'] = pd.to_datetime(mutatedCashFlow['tradeDate'])
diffTradeAndEnd=mutatedCashFlow['EndDate'].sub(mutatedCashFlow['tradeDate']).dt.days.abs()
diffStartAndEnd=mutatedCashFlow['EndDate'].sub(mutatedCashFlow['startDate']).dt.days.abs()
mutatedCashFlow['flow'] = (mutatedCashFlow['tradeAmount'].mul(diffTradeAndEnd)
                           .div(diffStartAndEnd))
print(mutatedCashFlow)
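Before switching a large job over, it may be worth checking that the vectorized version reproduces the row-wise results. The sketch below does that on synthetic data. The helper names `weighted_flow_rowwise` and `weighted_flow_vectorized`, the random data, and the sample size are assumptions made for this illustration and are not part of the question.

```python
from datetime import datetime

import numpy as np
import pandas as pd


def weighted_flow_rowwise(start, end, trade, amt):
    # Same logic as the question's function, one row at a time
    s = datetime.strptime(start, "%Y-%m-%d")
    e = datetime.strptime(end, "%Y-%m-%d")
    t = datetime.strptime(trade, "%Y-%m-%d")
    return amt * abs((e - t).days) / abs((e - s).days)


def weighted_flow_vectorized(df):
    # Same logic on whole columns, using the .sub/.mul/.div form
    start = pd.to_datetime(df["startDate"])
    end = pd.to_datetime(df["EndDate"])
    trade = pd.to_datetime(df["tradeDate"])
    diff_trade_end = end.sub(trade).dt.days.abs()
    diff_start_end = end.sub(start).dt.days.abs()
    return df["tradeAmount"].mul(diff_trade_end).div(diff_start_end)


# Synthetic data: end is always at least one day after start,
# so the denominator is never zero
rng = np.random.default_rng(0)
n = 1000
start = pd.Timestamp("2020-01-01") + pd.to_timedelta(rng.integers(0, 365, n), unit="D")
end = start + pd.to_timedelta(rng.integers(1, 365, n), unit="D")
trade = start + pd.to_timedelta(rng.integers(0, 365, n), unit="D")

df = pd.DataFrame({
    "startDate": start.strftime("%Y-%m-%d"),
    "EndDate": end.strftime("%Y-%m-%d"),
    "tradeDate": trade.strftime("%Y-%m-%d"),
    "tradeAmount": rng.uniform(1, 1000, n),
})

rowwise = df.apply(
    lambda row: weighted_flow_rowwise(row["startDate"], row["EndDate"],
                                      row["tradeDate"], row["tradeAmount"]),
    axis=1)
vectorized = weighted_flow_vectorized(df)

# The two approaches should agree up to floating-point tolerance
print(np.allclose(rowwise, vectorized))
```

Wrapping each call in `%timeit` in IPython or Jupyter would show the speed difference on your own data. No timings are quoted here because they depend on the machine and pandas version.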