python multiprocessing - how to act on interim results
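I'm using pandas to calculate statistics on a large amount of data, but it ends up running for a couple of hours, and I get new data fairly often. I've already tried optimizing, but I want to make it faster, so I'm trying to make it use multiple processes. The problem I'm having is that I need to do some interim work with the results as they complete, and the examples I've seen for multiprocessing.Process and Pool all wait for everything to finish before working with the results.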
This is a heavily trimmed-down version of the code I'm using now. The piece I want to put into separate processes is generateAnalytics().
    for counter, symbol in enumerate(queuelist): # queuelist
        if needQueueLoad: # set by another thread that's monitoring for new data (in the form of a new file that arrives a couple times a day)
            log.info('Shutting down analyticsRunner thread')
            break
        dfDay = generateAnalytics(symbol) # slow running function (15s+)
        astore[analyticsTable(symbol)] = dfDay # astore is a pandas store (HDF5). analyticsTable() returns the name of the appropriate table, which gets overwritten
        dfLatest.loc[symbol] = dfDay.iloc[-1] # update with the latest results (dfLatest is the latest results for each symbol, which is loaded as a global at startup and periodically saved back to the store in another thread)
        log.info('Processed {}/{} securities in queue.'.format(counter+1, len(queuelist)))
        # do some stuff to update progress GUI
I can't figure out how to make those last few lines work with the results as they come in, and would appreciate any suggestions.
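I'm thinking of running it all in a Pool and having the processes add their results to a Queue (rather than returning them), then having a while loop in the main process pull results off the queue as they come in. Would that be a reasonable approach? Something like: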
    mpqueue = multiprocessing.Queue()
    pool = multiprocessing.Pool()
    pool.map(generateAnalytics, [queuelist, mpqueue])
    while not needQueueLoad: # set by another thread that's monitoring for new data (in the form of a new file that arrives a couple times a day)
        while not mpqueue.empty():
            dfDay = mpqueue.get()
            astore[analyticsTable(symbol)] = dfDay # astore is a pandas store (HDF5). analyticsTable() returns the name of the appropriate table, which gets overwritten
            dfLatest.loc[symbol] = dfDay.iloc[-1] # update with the latest results (dfLatest is the latest results for each symbol, which is loaded as a global at startup and periodically saved back to the store in another thread)
            log.info('Processed {}/{} securities in queue.'.format(counter+1, len(queuelist)))
            # do some stuff to update GUI that shows progress
        sleep(0.1)
        # do some bookkeeping to see if queue has finished
    pool.join()
Using a Queue seems like a reasonable way to do it, with two remarks.
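Since it looks from your code like you are using a GUI, checking for results is probably best done in a timeout function or idle function rather than in a while loop. Using a while loop to check for results would block the GUI's event loop.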
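As a rough illustration of that first remark, here is a minimal sketch of a non-blocking "drain" helper that a timeout or idle callback could call. The names drain_results, handle_result and max_items are hypothetical and not from the question; the question does not say which GUI toolkit is in use, so the scheduling itself is left out.

```python
import queue


def drain_results(result_queue, handle_result, max_items=50):
    """Handle results that are already waiting, without ever blocking.

    Works with queue.Queue and multiprocessing.Queue: both raise
    queue.Empty from get_nowait() when nothing is available.
    Returns the number of results handled in this call.
    """
    handled = 0
    while handled < max_items:  # cap the work done per callback (hypothetical limit)
        try:
            item = result_queue.get_nowait()
        except queue.Empty:
            break  # nothing more right now; return control to the event loop
        handle_result(item)
        handled += 1
    return handled
```

The callback registered with the toolkit's timer would call drain_results(mpqueue, ...) and then return, so the event loop keeps running between polls. With Tkinter, for example, such a callback could reschedule itself with root.after(100, callback); other toolkits have their own timer mechanisms.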
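If the worker processes need to return a lot of data to the main process via the Queue, this will add significant overhead. You might want to consider using shared memory or even an intermediate file.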
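The intermediate-file idea could look roughly like the sketch below: the worker writes its result to disk and sends only a small (symbol, path) tuple through the Queue, and the main process loads the file when it handles that message. The helper names save_result and load_result, the out_dir parameter and the .pkl naming are assumptions for illustration. The standard pickle module is used so the sketch stays self-contained; with pandas, DataFrame.to_pickle and pandas.read_pickle would play the same role.

```python
import os
import pickle


def save_result(symbol, result, out_dir):
    """Worker side: write the result to disk and return a small message."""
    path = os.path.join(out_dir, "{}.pkl".format(symbol))
    with open(path, "wb") as fh:
        pickle.dump(result, fh, protocol=pickle.HIGHEST_PROTOCOL)
    return symbol, path  # only this small tuple would go through the Queue


def load_result(message):
    """Main-process side: read the result back from the path in the message."""
    symbol, path = message
    with open(path, "rb") as fh:
        return symbol, pickle.load(fh)
```

Whether this beats sending the DataFrame through the Queue directly depends on the size of the results and the speed of the disk, so it is worth measuring both.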