Python's `ThreadPoolExecutor` does not utilize number of executors

Question

I have the following Python code:

import sys
import os
from concurrent.futures import ThreadPoolExecutor

VIDEOS = [   # A list of 9 videos
    {... bla bla ...},
    {... bla bla ...},
    {... bla bla ...}
]

SAMPLING_FREQUENCIES = [1, 2.4, 3.71, ... , 14.3] # A list of 8 frequencies

def process_video(video_obj, sampling_frequency, process_work_dir):
    os.makedirs(process_work_dir)
    # ... do some heavy processing ...

if __name__ == '__main__':
    output_work_dir = sys.argv[1]
    os.makedirs(output_work_dir)
    executor = ThreadPoolExecutor(max_workers=4)
    for video_obj in VIDEOS:
        for samp_fps in SAMPLING_FREQUENCIES:  # Totally 8 * 9 threads should be waiting to be executed
            work_dir = os.path.join(output_work_dir, os.path.basename(video_obj['path']), str(samp_fps))            
            executor.submit(lambda : process_video(video_obj, samp_fps, work_dir))

What I observe is, that at first, as expected, 4 threads are running. I can validate it by making sure that exactly 4 work directories are being created.

Then, the first thread finished its work, and as expected, another thread that had been waiting, started running.

The problem is, that although I created 72 threads (9 videos times 8 sampling frequencies), no threads were executed anymore. When the 5th thread finished its work, the application terminated.

Is it a bug, or do I have a problem with understanding the ThreadPoolExecutor API?

I use Python 3.6.5

Answer 1

If you want to executor to complete all tasks then you might want to declare it in a with block. Here the print statement I added would be printed after all threads are complete.

~~The problem your seeing is the executor will submit things and not wait, letting your code die when it hits the EOF killing tasks waiting as the interpreter shuts down.~~
*This is not the case please ignore

Also note this line:
executor.submit(lambda : process_video(video_obj, samp_fps, work_dir)) This seems like the lambda would be the result of the process_video method which means it would run in the main thread. I think the usage is clearer and cleaner if specified as the docs propose.
executor.submit(process_video, video_obj, samp_fps, work_dir)

Try this:

if __name__ == '__main__':
    output_work_dir = sys.argv[1]
    os.makedirs(output_work_dir)
    with ThreadPoolExecutor(max_workers=4) as executor:
        for video_obj in VIDEOS:
            for samp_fps in SAMPLING_FREQUENCIES:  # Totally 8 * 9 threads should be waiting to be executed
                work_dir = os.path.join(output_work_dir, os.path.basename(video_obj['path']), str(samp_fps))            
                executor.submit(process_video, video_obj, samp_fps, work_dir)
    print("Hello, all done with thread work!")

Python's `ThreadPoolExecutor` does not utilize number of executors

Question

1 answers

solution1
1 ACCPTED 2018-06-25 14:35:17

Python's `ThreadPoolExecutor` does not utilize number of executors

Question

1 answers

solution1 1 ACCPTED 2018-06-25 14:35:17

solution1
1 ACCPTED 2018-06-25 14:35:17