简体   繁体   中英

Python's `ThreadPoolExecutor` does not utilize number of executors

I have the following Python code:

import sys
import os
from concurrent.futures import ThreadPoolExecutor

VIDEOS = [   # A list of 9 videos
    {... bla bla ...},
    {... bla bla ...},
    {... bla bla ...}
]

SAMPLING_FREQUENCIES = [1, 2.4, 3.71, ... , 14.3] # A list of 8 frequencies

def process_video(video_obj, sampling_frequency, process_work_dir):
    os.makedirs(process_work_dir)
    # ... do some heavy processing ...

if __name__ == '__main__':
    output_work_dir = sys.argv[1]
    os.makedirs(output_work_dir)
    executor = ThreadPoolExecutor(max_workers=4)
    for video_obj in VIDEOS:
        for samp_fps in SAMPLING_FREQUENCIES:  # Totally 8 * 9 threads should be waiting to be executed
            work_dir = os.path.join(output_work_dir, os.path.basename(video_obj['path']), str(samp_fps))            
            executor.submit(lambda : process_video(video_obj, samp_fps, work_dir))

What I observe is, that at first, as expected, 4 threads are running. I can validate it by making sure that exactly 4 work directories are being created.

Then, the first thread finished its work, and as expected, another thread that had been waiting, started running.

The problem is, that although I created 72 threads (9 videos times 8 sampling frequencies), no threads were executed anymore. When the 5th thread finished its work, the application terminated.

Is it a bug, or do I have a problem with understanding the ThreadPoolExecutor API?

I use Python 3.6.5

If you want to executor to complete all tasks then you might want to declare it in a with block. Here the print statement I added would be printed after all threads are complete.

The problem your seeing is the executor will submit things and not wait, letting your code die when it hits the EOF killing tasks waiting as the interpreter shuts down.
*This is not the case please ignore

Also note this line:
executor.submit(lambda : process_video(video_obj, samp_fps, work_dir)) This seems like the lambda would be the result of the process_video method which means it would run in the main thread. I think the usage is clearer and cleaner if specified as the docs propose.
executor.submit(process_video, video_obj, samp_fps, work_dir)

Try this:

if __name__ == '__main__':
    output_work_dir = sys.argv[1]
    os.makedirs(output_work_dir)
    with ThreadPoolExecutor(max_workers=4) as executor:
        for video_obj in VIDEOS:
            for samp_fps in SAMPLING_FREQUENCIES:  # Totally 8 * 9 threads should be waiting to be executed
                work_dir = os.path.join(output_work_dir, os.path.basename(video_obj['path']), str(samp_fps))            
                executor.submit(process_video, video_obj, samp_fps, work_dir)
    print("Hello, all done with thread work!")

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM