How do I optimise a Python loop with parallelization/multiprocessing?
I have to loop N times to calculate formulas and add the results to a dataframe. My code works and takes a few seconds to process each item. However, it can only handle one item at a time because I'm running the array through a for loop.
I tried to update the code by adding the numba library to optimise it:
def calculationResults(myconfig, df_results, isvalid, dimension, ....othersparams):
    for month in nb.prange(0, myconfig.len_production):
        calculationbymonth(month, df_results, ....othersparams)
    return df_results
But it is still processing one item at a time. Any ideas?
You can parallelize the work across processes with a function similar to the one below, which splits the dataframe into chunks and applies a function to each chunk in a separate worker.
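import numpy as np
import pandas as pd
from multiprocessing import Pool

def parallelize_dataframe(df, func, n_cores=4):
    # Split the DataFrame into one chunk per worker process.
    df_split = np.array_split(df, n_cores)
    pool = Pool(n_cores)
    # Apply func to each chunk in parallel, then concatenate the results.
    df = pd.concat(pool.map(func, df_split))
    pool.close()
    pool.join()
    return df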
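The same pool-based idea can be applied directly to the month loop from the question. Note that `nb.prange` only runs in parallel inside a function compiled with `@nb.njit(parallel=True)`; in plain Python it behaves like `range`, and numba's nopython mode does not work with pandas DataFrames, which would explain why the loop stays sequential. The sketch below is only an illustration under assumptions: `calculation_by_month` is a hypothetical stand-in for the real formulas, each month is assumed to be independent of the others, and each worker returns a row instead of writing into a shared `df_results` (worker processes operate on copies, so in-place writes there would not reach the parent).

```python
from multiprocessing import Pool

import pandas as pd


def calculation_by_month(month):
    # Hypothetical stand-in for the per-month formulas; returns one result row.
    return {"month": month, "value": month ** 2}


def calculation_results(n_months, n_cores=4):
    # Each month is handled by a worker process; map() preserves input order.
    with Pool(n_cores) as pool:
        rows = pool.map(calculation_by_month, range(n_months))
    # Build the DataFrame once in the parent instead of appending inside the loop.
    return pd.DataFrame(rows)


if __name__ == "__main__":
    # The guard is needed on platforms that start workers with "spawn".
    print(calculation_results(12))
```

Whether this is faster depends on how heavy each month's calculation is: starting processes and pickling arguments has a cost, so it tends to pay off only when each item takes a noticeable amount of time, as the few seconds per item mentioned in the question would suggest.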