python multiprocessing - how to act on interim results

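I'm using pandas to calculate statistics on a large amount of data, but it ends up running for hours, and I get new data frequently. I've already tried optimizing it, but I'd like to make it faster, so I'm trying to make it use multiple processes. The problem I'm running into is that I need to do some interim work as each result completes. The examples I've seen for multiprocessing.Process and multiprocessing.Pool all wait for everything to finish before processing the results.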

Here is a heavily stripped-down version of the code I'm using now. The piece I want to put in a separate process is generateAnalytics().

for counter, symbol in enumerate(queuelist):  # queuelist is the list of symbols to process
    if needQueueLoad:  # set by another thread that's monitoring for new data (in the form of a new file that arrives a couple times a day)
        log.info('Shutting down analyticsRunner thread')
        break
    dfDay = generateAnalytics(symbol)  # slow running function (15s+)
    astore[analyticsTable(symbol)] = dfDay  # astore is a pandas store (HDF5). analyticsTable() returns the name of the appropriate table, which gets overwritten
    dfLatest.loc[symbol] = dfDay.iloc[-1]  # update with the latest results (dfLatest is the latest results for each symbol, which is loaded as a global at startup and periodically saved back to the store in another thread)

    log.info('Processed {}/{} securities in queue.'.format(counter+1, len(queuelist)))
    # do some stuff to update progress GUI 
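For comparison, here is a minimal sketch of a different technique: `Pool.imap_unordered()`. It hands each result back to the parent process as soon as a worker finishes it. That lets the bookkeeping lines from the loop above stay in the main process without an explicit queue. The names `generate_analytics`, `run_analytics` and `should_stop` are hypothetical, and the dict results stand in for the real DataFrames.

```python
import multiprocessing
import time


def generate_analytics(symbol):
    """Hypothetical stand-in for the slow generateAnalytics(symbol).

    It returns (symbol, result) because imap_unordered yields results in
    completion order, so the parent needs to know which symbol finished.
    """
    time.sleep(0.01 * len(symbol))  # simulate work of varying length
    return symbol, {"symbol": symbol, "rows": len(symbol)}


def run_analytics(queuelist, should_stop=lambda: False, processes=None):
    latest = {}
    with multiprocessing.Pool(processes) as pool:
        for counter, (symbol, result) in enumerate(
                pool.imap_unordered(generate_analytics, queuelist), start=1):
            # This body runs in the parent process as each result arrives:
            # the place for the HDF5 write, the dfLatest update and progress.
            latest[symbol] = result
            print("Processed {}/{} securities in queue.".format(counter, len(queuelist)))
            if should_stop():  # hypothetical hook, e.g. lambda: needQueueLoad
                break  # leaving the with-block calls pool.terminate()
    return latest


if __name__ == "__main__":
    print(run_analytics(["AAPL", "MSFT", "GOOG"]))
```

In the original program this loop would run where the current one does, in the analyticsRunner thread, so it would not tie up the GUI thread.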

I can't figure out how to make the last few lines work with the results as they come in, and would appreciate some suggestions.

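I'm considering running it all in a Pool and having the processes add their results to a Queue (instead of returning them). The main process would then run a while loop that pulls results off the queue as they come in. Is that a reasonable approach? Something like: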

import multiprocessing
from functools import partial
from time import sleep

# A plain multiprocessing.Queue can't be passed to Pool workers as an argument; a Manager queue can
manager = multiprocessing.Manager()
mpqueue = manager.Queue()
pool = multiprocessing.Pool()
# map_async returns immediately (map would block until every task is done);
# generateAnalytics(symbol, mpqueue) is assumed to put (symbol, dfDay) on the queue instead of returning it
pool.map_async(partial(generateAnalytics, mpqueue=mpqueue), queuelist)
pool.close()  # no more tasks will be submitted

counter = 0
while not needQueueLoad and counter < len(queuelist):  # needQueueLoad is set by another thread that's monitoring for new data
    while not mpqueue.empty():
        symbol, dfDay = mpqueue.get()
        astore[analyticsTable(symbol)] = dfDay  # same as in the sequential version
        dfLatest.loc[symbol] = dfDay.iloc[-1]  # same as in the sequential version
        counter += 1
        log.info('Processed {}/{} securities in queue.'.format(counter, len(queuelist)))
        # do some stuff to update GUI that shows progress
    sleep(0.1)
if needQueueLoad:
    pool.terminate()  # new data arrived: abandon the remaining work
pool.join()
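A plain multiprocessing.Queue cannot be sent to Pool workers as a task argument. multiprocessing raises `RuntimeError: Queue objects should only be shared between processes through inheritance`. A `Manager().Queue()` proxy can be passed that way. Alternatively, the queue can be handed to each worker once through the Pool `initializer`, as in this minimal sketch. The names `generate_analytics`, `_init_worker` and `run` are hypothetical, and the length of the symbol stands in for the real DataFrame.

```python
import multiprocessing
import queue
import time

_result_queue = None  # set in each worker process by _init_worker()


def _init_worker(q):
    # Runs once in every worker when the pool starts it; handing the queue over
    # here counts as "inheritance", so a plain multiprocessing.Queue is allowed.
    global _result_queue
    _result_queue = q


def generate_analytics(symbol):
    """Hypothetical stand-in for generateAnalytics(): puts (symbol, result)
    on the shared queue instead of returning it."""
    time.sleep(0.01)  # simulate slow work
    _result_queue.put((symbol, len(symbol)))


def run(queuelist, processes=2):
    results = {}
    q = multiprocessing.Queue()
    pool = multiprocessing.Pool(processes, initializer=_init_worker, initargs=(q,))
    async_result = pool.map_async(generate_analytics, queuelist)  # does not block
    pool.close()  # no more tasks will be submitted
    received = 0
    while received < len(queuelist):
        try:
            symbol, value = q.get(timeout=0.1)
        except queue.Empty:
            # Nothing yet: a good spot for stop checks. Also surface worker errors
            # instead of waiting forever for results that will never arrive.
            if async_result.ready() and not async_result.successful():
                pool.terminate()
                async_result.get()  # re-raises the worker's exception
            continue
        results[symbol] = value  # act on each result as soon as it arrives
        received += 1
    pool.join()
    return results


if __name__ == "__main__":
    print(run(["AAPL", "MSFT", "GOOG"]))
```

Here `q.get(timeout=0.1)` replaces the `empty()`/`sleep()` polling. The `ready()`/`successful()` check keeps a crashed worker from leaving the loop waiting forever.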

Using a Queue seems like a reasonable approach, with two caveats.

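  1. Since your code suggests you're using a GUI, checking for results is probably best done in a timeout or idle callback rather than in a while loop. Checking for results in a while loop will block the GUI's event loop.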

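  2. If the worker processes need to send large amounts of data back to the main process through the queue, this adds considerable overhead. You may want to consider using shared memory, or even intermediate files.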
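Both caveats can be combined in one minimal sketch. It assumes pickle files in a temporary directory serve as the intermediate files; with pandas, `DataFrame.to_pickle()` and `pandas.read_pickle()` would be the natural equivalents. Each worker writes its result to disk and sends only the symbol and file path through a Manager queue. The main side drains the queue with `get_nowait()`, so it never blocks. The names `generate_analytics_to_file`, `drain_results` and `run_demo` are hypothetical, and the shared-memory option is not shown.

```python
import multiprocessing
import os
import pickle
import queue
import tempfile


def generate_analytics_to_file(symbol, q, out_dir):
    """Hypothetical worker: writes its (possibly large) result to an
    intermediate file and sends only (symbol, path) through the queue."""
    result = {"symbol": symbol, "values": list(range(1000))}  # stand-in for a big DataFrame
    path = os.path.join(out_dir, symbol + ".pkl")
    with open(path, "wb") as f:
        pickle.dump(result, f)
    q.put((symbol, path))


def drain_results(q, handle, max_items=None):
    """Handle whatever is already on the queue, then return without blocking.

    Intended to be called from a GUI timer or idle callback (for example one
    scheduled with Tkinter's widget.after()), so the event loop keeps running.
    max_items caps the work done per call to keep the callback short.
    """
    handled = 0
    while max_items is None or handled < max_items:
        try:
            symbol, path = q.get_nowait()
        except queue.Empty:
            break
        with open(path, "rb") as f:
            handle(symbol, pickle.load(f))
        os.remove(path)  # the intermediate file is no longer needed
        handled += 1
    return handled


def run_demo(symbols):
    received = {}

    def on_result(symbol, result):
        received[symbol] = result  # e.g. write to the HDF5 store, update dfLatest

    with tempfile.TemporaryDirectory() as out_dir, multiprocessing.Manager() as manager:
        q = manager.Queue()  # a Manager queue proxy can be passed as a task argument
        with multiprocessing.Pool(2) as pool:
            async_result = pool.starmap_async(
                generate_analytics_to_file, [(s, q, out_dir) for s in symbols])
            # In a GUI, this loop would be replaced by a repeating timer callback.
            while not async_result.ready():
                drain_results(q, on_result, max_items=10)
                async_result.wait(0.05)
            async_result.get()  # re-raises a worker exception, if any
            drain_results(q, on_result)  # pick up anything still queued
    return received


if __name__ == "__main__":
    print(sorted(run_demo(["AAPL", "MSFT", "GOOG"])))
```

Only short (symbol, path) tuples cross the process boundary through the queue. The bulky result travels through the filesystem instead.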


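Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.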

 
© 2020-2024 STACKOOM.COM