python multiprocessing - how to act on interim results
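I'm using pandas to compute statistics over a large amount of data, but a full run takes several hours, and new data arrives frequently. I've already tried optimizing the code, but I want it to be faster still, so I'm trying to make it use multiple processes. The problem is that I need to do some interim work as each result completes, and the examples I've seen for multiprocessing.Process and Pool all wait for everything to finish before processing the results.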
Here is a heavily stripped-down version of the code I'm using now. The piece I want to move into separate processes is generateAnalytics().
for counter, symbol in enumerate(queuelist): # queuelist
    if needQueueLoad: # set by another thread that's monitoring for new data (in the form of a new file that arrives a couple times a day)
        log.info('Shutting down analyticsRunner thread')
        break
    dfDay = generateAnalytics(symbol) # slow running function (15s+)
    astore[analyticsTable(symbol)] = dfDay # astore is a pandas store (HDF5). analyticsTable() returns the name of the appropriate table, which gets overwritten
    dfLatest.loc[symbol] = dfDay.iloc[-1] # update with the latest results (dfLatest is the latest results for each symbol, which is loaded as a global at startup and periodically saved back to the store in another thread)
    log.info('Processed {}/{} securities in queue.'.format(counter+1, len(queuelist)))
    # do some stuff to update progress GUI
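I can't figure out how to make the last few lines work with the results as they come in, and would appreciate some suggestions.
I'm considering running it all in a Pool and having the processes add their results to a Queue (instead of returning them), then running a while loop in the main process that pulls results off the queue as they arrive. Is that a reasonable approach? Something like: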
mpqueue = multiprocessing.Queue()
pool = multiprocessing.Pool()
pool.map(generateAnalytics, [queuelist, mpqueue])
while not needQueueLoad: # set by another thread that's monitoring for new data (in the form of a new file that arrives a couple times a day)
    while not mpqueue.empty():
        dfDay = mpqueue.get()
        astore[analyticsTable(symbol)] = dfDay # astore is a pandas store (HDF5). analyticsTable() returns the name of the appropriate table, which gets overwritten
        dfLatest.loc[symbol] = dfDay.iloc[-1] # update with the latest results (dfLatest is the latest results for each symbol, which is loaded as a global at startup and periodically saved back to the store in another thread)
        log.info('Processed {}/{} securities in queue.'.format(counter+1, len(queuelist)))
        # do some stuff to update GUI that shows progress
    sleep(0.1)
    # do some bookkeeping to see if queue has finished
pool.join()
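One thing to watch in a sketch like the one above: `pool.map` blocks until every task has finished, and a plain `multiprocessing.Queue` cannot be passed to pool workers as a task argument. A possible alternative, shown here only as a minimal sketch, is `Pool.imap_unordered`, which yields each return value to the parent as soon as it is ready. The names `generate_analytics`, `process_as_completed`, `handle_result`, and `should_stop` are hypothetical stand-ins, not the original code, and the worker returns the symbol along with its result so the parent knows which table to update.

```python
import multiprocessing
import time


def generate_analytics(symbol):
    """Hypothetical stand-in for the slow generateAnalytics(symbol)."""
    time.sleep(0.05)  # simulate slow work
    # Return the symbol with the result so the parent can tell results apart.
    return symbol, {"last": len(symbol)}


def process_as_completed(pool, symbols, handle_result, should_stop=lambda: False):
    """Call handle_result(symbol, result) in the parent as each task finishes.

    `pool` is anything offering imap_unordered(), e.g. multiprocessing.Pool.
    `should_stop` plays the role of the needQueueLoad flag (an assumption).
    Returns the number of results handled.
    """
    done = 0
    for symbol, result in pool.imap_unordered(generate_analytics, symbols):
        handle_result(symbol, result)  # e.g. write to the store, update dfLatest
        done += 1
        print("Processed {}/{} securities in queue.".format(done, len(symbols)))
        if should_stop():
            break  # stop consuming; leaving the `with` block terminates the pool
    return done


if __name__ == "__main__":
    latest = {}
    with multiprocessing.Pool(processes=2) as pool:
        process_as_completed(pool, ["AAA", "BB", "C"], latest.__setitem__)
    print(latest)
```

Because all store and GUI updates happen in the parent's loop, the workers never touch the HDF5 store directly. Results arrive in completion order, not submission order.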
Using a Queue seems like a reasonable approach, with two remarks.
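Since it looks from the code as though you're using a GUI, checking for results is probably better done in a timeout function or an idle function rather than in a while loop. Using a while loop to check for results would block the GUI's event loop.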
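A rough sketch of the timeout-function idea: a small helper that handles whatever is in the queue right now and returns immediately, so the GUI's timer can call it repeatedly. The helper name `drain_results`, its `max_items` cap, and the tkinter scheduling shown in the comments are assumptions for illustration; the question does not say which GUI toolkit is in use.

```python
import queue


def drain_results(result_queue, handle_result, max_items=50):
    """Handle the results that are ready right now, then return.

    Works with any queue whose get_nowait() raises queue.Empty when there
    is nothing to read (queue.Queue and multiprocessing.Queue both do).
    max_items caps the work done per call so one call cannot stall the GUI.
    Returns the number of items handled.
    """
    handled = 0
    while handled < max_items:
        try:
            item = result_queue.get_nowait()
        except queue.Empty:
            break  # nothing more available; give control back to the caller
        handle_result(item)
        handled += 1
    return handled


# How a GUI might call it (toolkit is an assumption; tkinter's after()
# schedules a callback on the event loop):
#
#     def poll():
#         drain_results(mpqueue, update_store_and_gui)
#         root.after(100, poll)  # check again in 100 ms
#
#     root.after(100, poll)
```

Using `get_nowait()` with an exception handler avoids relying on `empty()`, which the multiprocessing documentation describes as not reliable.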
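If the worker processes need to return a lot of data to the main process through the queue, this will add significant overhead. You might want to consider using shared memory, or even an intermediate file.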
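The intermediate-file idea could look roughly like this: the worker writes its result to disk and sends back only the file path, and the main process loads the file when it is ready to use it. The helper names `save_result` and `load_result` and the one-pickle-file-per-symbol layout are assumptions for illustration. With pandas, `DataFrame.to_pickle` and `pandas.read_pickle` could fill the same role. Whether this is actually faster than sending the data through the queue depends on the data size and the disk, so it is worth measuring.

```python
import os
import pickle
import tempfile


def save_result(symbol, result, out_dir):
    """Worker side: write the result to a file and return only its path."""
    path = os.path.join(out_dir, "{}.pkl".format(symbol))
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(result, f, protocol=pickle.HIGHEST_PROTOCOL)
    # Rename into place so a reader does not pick up a half-written file.
    os.replace(tmp_path, path)
    return path


def load_result(path):
    """Main-process side: read a result back from the path a worker sent."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        path = save_result("AAA", {"rows": [1, 2, 3]}, out_dir)
        print(load_result(path))
```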