python multiprocessing - how to act on interim results

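I am using pandas to calculate statistics for a large amount of data, but it ends up running for hours, and I get new data frequently. I have already tried optimizing, but I would like to make it faster, so I am trying to make it use multiple processes. The problem I am having is that I need to do some interim work with the results as they complete, and the examples I have seen for multiprocessing.Process and multiprocessing.Pool all wait for everything to finish before working with the results.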

This is a heavily pared-down version of the code I am using now. The piece I want to move into separate processes is generateAnalytics().

for counter, symbol in enumerate(queuelist):  # queuelist
    if needQueueLoad:  # set by another thread that's monitoring for new data (in the form of a new file that arrives a couple times a day)
        log.info('Shutting down analyticsRunner thread')
        break
    dfDay = generateAnalytics(symbol)  # slow running function (15s+)
    astore[analyticsTable(symbol)] = dfDay  # astore is a pandas store (HDF5). analyticsTable() returns the name of the appropriate table, which gets overwritten
    dfLatest.loc[symbol] = dfDay.iloc[-1]  # update with the latest results (dfLatest is the latest results for each symbol, which is loaded as a global at startup and periodically saved back to the store in another thread)

    log.info('Processed {}/{} securities in queue.'.format(counter+1, len(queuelist)))
    # do some stuff to update progress GUI 

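I can't figure out how to make the last few lines work with the results while processing is still in progress, and would appreciate some suggestions.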

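I am considering running it all in a Pool and having the processes add their results to a Queue (instead of returning them), then running a while loop in the main process that pulls results off the queue as they come in. Would that be a reasonable approach? Something like: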

mpqueue = multiprocessing.Queue()
pool = multiprocessing.Pool()
pool.map(generateAnalytics, [queuelist, mpqueue])

while not needQueueLoad:  # set by another thread that's monitoring for new data (in the form of a new file that arrives a couple times a day)
    while not mpqueue.empty():
        dfDay = mpqueue.get()
        astore[analyticsTable(symbol)] = dfDay  # astore is a pandas store (HDF5). analyticsTable() returns the name of the appropriate table, which gets overwritten
        dfLatest.loc[symbol] = dfDay.iloc[-1]  # update with the latest results (dfLatest is the latest results for each symbol, which is loaded as a global at startup and periodically saved back to the store in another thread)    
        log.info('Processed {}/{} securities in queue.'.format(counter+1, len(queuelist)))
        # do some stuff to update GUI that shows progress            
    sleep(0.1)
    # do some bookkeeping to see if queue has finished
pool.join()

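Using a Queue seems like a reasonable approach, with two remarks.

  1. Since it looks from your code like you are using a GUI, checking for results should probably be done in a timeout function or idle function rather than in a while loop. Using a while loop to check for results will block the GUI's event loop.

  2. If the worker processes need to return a lot of data to the main process through the Queue, this will add significant overhead. You might want to consider using shared memory, or even an intermediate file.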
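
The two remarks above could be combined along the following lines. This is only a minimal sketch, not the asker's actual code: generate_analytics() is a hypothetical stand-in for the slow generateAnalytics() call, and drain_results() and run() are names made up for illustration. It assumes a multiprocessing.Manager().Queue() is used, because a plain multiprocessing.Queue cannot be passed as an argument to Pool workers, and it tags each result with its symbol so the parent knows which table to update.

```python
import multiprocessing
import queue
import time


def generate_analytics(symbol, results):
    """Hypothetical stand-in for the slow generateAnalytics() call."""
    value = len(symbol)  # placeholder for the real computation
    results.put((symbol, value))  # tag the result with its symbol


def drain_results(results, handle):
    """Handle every result that is ready right now, without blocking.

    Intended to be called from a GUI timeout or idle callback.
    Returns the number of results handled in this call.
    """
    handled = 0
    while True:
        try:
            symbol, value = results.get_nowait()
        except queue.Empty:
            return handled
        handle(symbol, value)
        handled += 1


def run(symbols):
    """Process all symbols in a pool, acting on results as they arrive."""
    latest = {}

    def handle(symbol, value):
        # In the real code this is where astore and dfLatest would be updated.
        latest[symbol] = value

    with multiprocessing.Manager() as manager:
        results = manager.Queue()  # a proxy that can be passed to pool workers
        with multiprocessing.Pool() as pool:
            pending = [
                pool.apply_async(generate_analytics, (symbol, results))
                for symbol in symbols
            ]
            # Without a GUI, poll in a loop; with a GUI, call drain_results()
            # from a timer callback instead of looping here.
            while not all(task.ready() for task in pending):
                drain_results(results, handle)
                time.sleep(0.05)
            drain_results(results, handle)  # pick up anything still queued
            for task in pending:
                task.get()  # re-raise any exception from a worker
    return latest
```

On platforms that start workers with spawn, run() would need to be called under an `if __name__ == "__main__":` guard. If no explicit queue is needed, Pool.imap_unordered is an alternative that yields each result as soon as its worker finishes. Neither variant removes the cost of pickling large DataFrames between processes, which is the overhead the second remark warns about.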
