ThreadPoolExecutor and as_completed: does it wait for all futures to complete?

Question

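I have a use case where I need to process several "Task" objects. These Task objects are basically web crawlers. Whenever a task finishes, I want to restart it as soon as possible, with the same parameters.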

This is what I have right now:

import concurrent.futures
import logging

logger = logging.getLogger(__name__)

tasks = create_collection_tasks()  # defined elsewhere; each task has .callback and .data

with concurrent.futures.ThreadPoolExecutor(max_workers=len(tasks)) as executor:
    # Maps the future to a collection task. The mapping is in this direction bc
    # later on we can only iterate over the completed futures.
    futures_to_tasks = {
        executor.submit(task.callback, task.data): task for task in tasks
    }

    while futures_to_tasks:
        for future in concurrent.futures.as_completed(futures_to_tasks):
            # If the future raised an exception, calling result() will re-raise
            try:
                future.result()
            except Exception:
                logger.exception("Collection task failed")

            # When the future is done (completed or crashed/cancelled), put it back
            # in the 'queue' to re-run it
            task = futures_to_tasks.pop(future)
            futures_to_tasks[executor.submit(task.callback, task.data)] = task

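I've read the documentation for as_completed: https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.as_completed , but I'm still unclear about its behavior. Right now I don't know whether as_completed takes a "snapshot" of the futures in futures_to_tasks when it is called and waits for all of those futures to complete before finishing the loop, or whether it just goes through the futures and yields those that have already completed, without waiting for the others. I'd prefer the latter behavior.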
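
A small experiment makes the two readings easy to tell apart. The sketch below is illustrative only: `work`, the task names and the delays are made up. It submits a fast and a slow job, and when the fast one comes back it adds a new future to the dict while `as_completed()` is still iterating. In CPython's implementation, `as_completed()` copies its argument into a set when iteration starts, yields each future as soon as it finishes, and only stops once every future in that copy is done, so the future added mid-loop is never yielded by that call.

```python
import concurrent.futures
import time


def work(name, delay):
    # Hypothetical job: sleep for `delay` seconds, then return its name.
    time.sleep(delay)
    return name


def demo():
    yielded = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
        futures = {
            executor.submit(work, "fast", 0.05): "fast",
            executor.submit(work, "slow", 0.5): "slow",
        }
        for future in concurrent.futures.as_completed(futures):
            name = future.result()
            yielded.append(name)
            if name == "fast":
                # Submitted while as_completed() is iterating: it is not in the
                # set as_completed() captured, so this loop never yields it,
                # even though it finishes long before "slow".
                futures[executor.submit(work, "late", 0.01)] = "late"
    return yielded


if __name__ == "__main__":
    print(demo())  # ['fast', 'slow']
```

In other words, each future is handed back as soon as it is done, but the `for` loop only ends after every future in the snapshot is done, and resubmitted futures are first seen by the next `as_completed()` call in the `while` loop. A task that finishes quickly is resubmitted promptly once, but its second run is not noticed until the slowest task of the previous round completes.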

Can you give me a hand?

Answer 1

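Let's assume you want to run indefinitely (which your question seems to imply). You could consider this pattern (the explanation follows the code):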

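from concurrent.futures import ThreadPoolExecutor
from time import sleep

def trun(v):
    # Simulated unit of work
    print(v)
    sleep(0.5)

with ThreadPoolExecutor() as executor:
    tasks = ['a', 'b', 'c', 'd', 'e']
    futures = dict()  # task index -> most recent Future for that task
    while True:
        sleep(0.5)  # throttle the polling loop
        for i, t in enumerate(tasks):
            if (f := futures.get(i)) is not None:
                # done() means finished (or cancelled); running() is also False
                # while a future is still queued, so it cannot detect completion.
                if f.done():
                    print(f'Thread {i} has finished. Collecting result')
                    try:
                        f.result()  # re-raises any exception raised in trun()
                    except Exception as e:
                        print(f'Thread {i} failed: {e!r}')
                    print(f'Restarting thread {i}')
                else:
                    print(f'Thread {i} still running')
                    continue
            else:
                print(f'Starting new thread {i}')
            futures[i] = executor.submit(trun, t)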

We have a number of tasks, represented by the strings in the tasks list.

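We build a dictionary whose keys are the task indices and whose values are the corresponding Future objects. In this particular case we could use the values from the list as keys, but that may not be appropriate in the real world.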

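The sleep() calls are hard-coded: the one in trun() just simulates real work, and the one at the top of the while loop throttles the polling. You need to consider how long a thread will actually run to decide whether you need such a delay in production (to avoid thrashing in the while loop).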
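
If the aim is to react as soon as any task finishes without tuning a polling delay at all, one alternative (a different technique from the polling loop above) is `concurrent.futures.wait()` with `return_when=FIRST_COMPLETED`. It blocks until at least one future is done and is called afresh on the current dict each time, so a resubmitted future is watched immediately. A minimal sketch, assuming tasks are given as a hypothetical `{name: (callable, args)}` dict; `crawl` is a made-up stand-in for a crawler, and the `max_completions` cap exists only so the demo terminates (pass `None` to run indefinitely).

```python
import concurrent.futures
import logging
import time

logger = logging.getLogger(__name__)


def crawl(name, delay):
    # Hypothetical stand-in for a crawler: simulate work, return the task name.
    time.sleep(delay)
    return name


def run_tasks(tasks, max_completions=None):
    """Resubmit every task as soon as it finishes.

    tasks: hypothetical {name: (callable, args)} mapping.
    max_completions: demo-only cap on finished runs; None runs forever.
    """
    completions = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(tasks)) as executor:
        pending = {executor.submit(fn, *args): name for name, (fn, args) in tasks.items()}
        while pending:
            # Blocks until at least one future in the *current* dict is done.
            done, _ = concurrent.futures.wait(
                pending, return_when=concurrent.futures.FIRST_COMPLETED
            )
            for future in done:
                name = pending.pop(future)
                try:
                    future.result()  # re-raises the task's exception, if any
                except Exception:
                    logger.exception("Task %s failed", name)
                completions.append(name)
                if max_completions is None or len(completions) < max_completions:
                    fn, args = tasks[name]
                    pending[executor.submit(fn, *args)] = name
    return completions


if __name__ == "__main__":
    demo_tasks = {"fast": (crawl, ("fast", 0.01)), "slow": (crawl, ("slow", 0.3))}
    print(run_tasks(demo_tasks, max_completions=5))
```

With these delays, "fast" is restarted several times before "slow" completes once, which is the behaviour the question asks for.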

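Enumerate the tasks and check whether our dictionary holds a reference for that task ID (the list index). If it does, check whether that future is still running. If it has finished, collect its result and submit the task again.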
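
The distinction between "not running" and "finished" matters in that check: a future that has been submitted but is still waiting in the executor's queue also reports `running() == False`, and calling `result()` on it blocks until it has actually run. `done()` is the check that means finished (or cancelled). A small sketch with a single worker, so the second job has to wait in the queue; the `threading.Event` gate is only there to keep the first job busy.

```python
import concurrent.futures
import threading


def future_states():
    gate = threading.Event()
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        first = executor.submit(gate.wait)   # occupies the only worker until gate is set
        second = executor.submit(gate.wait)  # has to wait in the queue behind `first`
        queued = (second.running(), second.done())
        gate.set()  # let both jobs finish
        first.result()
        second.result()
        finished = (second.running(), second.done())
    return {"queued": queued, "finished": finished}


if __name__ == "__main__":
    print(future_states())  # {'queued': (False, False), 'finished': (False, True)}
```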

Everything else should be self-explanatory.

This code is runnable, but you will need to stop it manually (Ctrl-C is ideal).

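You could consider using some form of sentinel that is set in the thread to indicate completion and that the main loop can check. However, if handled naively, there is a potential race condition.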
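
One race-free way to let the worker side signal completion, sketched here as an alternative to a hand-rolled flag, is `Future.add_done_callback()` feeding a thread-safe `queue.Queue`. The callback fires when the future finishes (or immediately, if the future is already done when the callback is added), so no completion can slip between a check and a reset, and the main loop simply blocks on `get()` instead of polling. `crawl`, the `{name: (callable, args)}` task format and the `max_completions` cap are assumptions made for the demo.

```python
import concurrent.futures
import queue
import time


def crawl(name, delay):
    # Hypothetical stand-in for a crawler: simulate work, return the task name.
    time.sleep(delay)
    return name


def run_with_queue(tasks, max_completions):
    finished = queue.Queue()  # (name, future) pairs pushed by done-callbacks
    completions = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(tasks)) as executor:

        def start(name):
            fn, args = tasks[name]
            future = executor.submit(fn, *args)
            # Runs when the future finishes, or right away if it already has.
            future.add_done_callback(lambda f: finished.put((name, f)))

        for name in tasks:
            start(name)
        in_flight = len(tasks)
        while in_flight:
            name, future = finished.get()  # blocks; no polling delay needed
            in_flight -= 1
            if future.exception() is not None:
                print(f"{name} failed: {future.exception()!r}")
            completions.append(name)
            if len(completions) < max_completions:
                start(name)
                in_flight += 1
    return completions


if __name__ == "__main__":
    demo_tasks = {"fast": (crawl, ("fast", 0.01)), "slow": (crawl, ("slow", 0.3))}
    print(run_with_queue(demo_tasks, max_completions=4))
```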


Answer 1 (score 0), 2022-01-29 13:37:56
