
futures.wait() or futures.as_completed() blocks forever even though all futures are completed or cancelled

I have run into a problem where futures.as_completed() or futures.wait() blocks indefinitely even though all the Futures are completed or cancelled.

Here are the steps to reproduce:

After submitting Futures with ThreadPoolExecutor.submit(), I wait for them with futures.as_completed() or futures.wait(). In another thread, I call ThreadPoolExecutor.shutdown() with cancel_futures=True, and then, in that same thread, I wait for the Futures to complete with a timeout. That wait returns once the timeout elapses, with two lists: the completed Futures and the cancelled Futures. There are no pending Futures left. However, the first as_completed() (or wait()) in the main thread is still blocking.

In the Python documentation, it is stated for return_when=ALL_COMPLETED:

The function will return when all futures finish or are cancelled.

And for as_completed():

Returns [...] futures as they complete (finished or cancelled futures).

That matches my situation. Is this a bug, or am I missing something? I also tried calling shutdown() from the same thread; it doesn't change anything.
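
For clarity, here is a minimal example of the behaviour I expect (my own simplification, not taken from the docs): wait() returns its done/not_done sets once every Future has finished.

import time
from concurrent import futures
from concurrent.futures import ThreadPoolExecutor, ALL_COMPLETED

with ThreadPoolExecutor(2) as pool:
    fs = [pool.submit(time.sleep, 0.1) for _ in range(4)]
    # All four Futures finish normally, so wait() returns promptly
    # with an empty not_done set.
    done, not_done = futures.wait(fs, return_when=ALL_COMPLETED)
    print(len(done), len(not_done))  # 4 0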


Code sample:

import signal
import time
from concurrent import futures
from concurrent.futures import Future, ALL_COMPLETED
from concurrent.futures import ThreadPoolExecutor
from typing import Dict


class SubThreads:
    def __init__(self):
        self.running_futures_url: Dict[str, Future] = {}
        self.webpage_crawler_th_pool = ThreadPoolExecutor(2)

    def shutdown(self):
        print("Waiting for lasts URL threads")
        self.webpage_crawler_th_pool.shutdown(wait=False, cancel_futures=True)
        finished_futures, still_running_futures = futures.wait(
            self.running_futures_url.values(), return_when=ALL_COMPLETED, timeout=5,
        )
        print("Shutdown done, remaining threads", len(still_running_futures))

    def crawl_url(self, url):
        print("Crawling webpage", url)
        time.sleep(3)
        print("Webpage crawled", url)
        return "URL Crawled"

    def run(self):
        urls = ['1', '2', '3', '4', '5']
        for url in urls:
            running_th = self.webpage_crawler_th_pool.submit(self.crawl_url, url)
            self.running_futures_url[url] = running_th

        print("Waiting for URLs to be crawled")
        # for _future in futures.as_completed(self.running_futures_url.values()):
        #     print("Future result:", _future.result())  # Will only return and print first 2 started (and completed) Futures
        finished_futures, still_running_futures = futures.wait(
            self.running_futures_url.values(), return_when=ALL_COMPLETED
        )
        print("SubThread finished (never called)", finished_futures, still_running_futures)


sub_thread = SubThreads()


def signal_handler(sig, frame):
    print("Signal caught, exiting ...", sig)
    sub_thread.shutdown()


signal.signal(signal.SIGINT, signal_handler)
signal.signal(signal.SIGTERM, signal_handler)
sub_thread.run()

I would not use shutdown() with wait=False.

See the docs:

shutdown(wait=True)

Stops accepting new tasks. It waits for all the running tasks to complete if wait is True.

So, since you pass wait=False, your as_completed() will wait forever, because the ThreadPoolExecutor never stops running until you call shutdown() with wait=True.
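
Sketched out, the change I am suggesting is just the wait argument to shutdown() in your shutdown() method (the rest is your code, unchanged):

    def shutdown(self):
        print("Waiting for last URL threads")
        # wait=True blocks here until the running tasks have completed;
        # cancel_futures=True still cancels the work items that never started.
        self.webpage_crawler_th_pool.shutdown(wait=True, cancel_futures=True)
        finished_futures, still_running_futures = futures.wait(
            self.running_futures_url.values(), return_when=ALL_COMPLETED, timeout=5,
        )
        print("Shutdown done, remaining threads", len(still_running_futures))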
