Python threading: make the main thread report the progress

I run some jobs in parallel, which can sometimes take a long time, so I want the main thread to report on progress periodically, for example once an hour.

Below is a simplified version of what I came up with. The code runs test_function in 2 threads with arguments taken from input_arguments. Every 5 seconds it prints the percentage of jobs finished.

import threading
import queue
import time


def test_function(x):
    time.sleep(4)
    print("Finished ", x)


num_processes = 2
input_arguments = range(10)

# Define a worker which will continuously execute function taking input parameters from the queue
def worker():
    while True:
        x = q.get()
        if x is None:
            break
        test_function(x)
        q.task_done()

# Initialize queue and the threads
q = queue.Queue()
threads = []
for i in range(num_processes):
    t = threading.Thread(target=worker)
    t.start()
    threads.append(t)

# Create a queue of input parameters for function
for item in input_arguments:
    q.put(item)

# Report progress every 5 seconds
report_progress(q)

# stop workers
for i in range(num_processes):
    q.put(None)
for t in threads:
    t.join()

Here report_progress (which has to be defined before the call above) is defined as follows:

def report_progress(q):
    qsize_init = q.qsize()
    while not q.empty():
        time.sleep(5)
        portion_finished = 1 - q.qsize() / qsize_init
        print("run_parallel: {:.1%} jobs are finished".format(portion_finished))

However, I want to report progress every hour instead of every 5 seconds. With a one-hour sleep, if all jobs finish shortly after a report, the program would sit idle for many minutes before it notices.

Another possibility is to define report_progress differently:

def report_progress(q):
    qsize_init = q.qsize()
    time_start = time.time()
    while not q.empty():
        current_time = time.time()
        if current_time - time_start > 5:
            portion_finished = 1 - q.qsize() / qsize_init
            print("run_parallel: {:.1%} jobs are finished".format(portion_finished))
            time_start = time.time()

I am worried that constantly checking this condition in a loop without sleeping will drain CPU resources. It may be a small portion at any moment, but on a scale of hours it could add up to a lot.

Is there a standard way of handling this?

Python: 3.6

For now I will use a simple solution suggested in the comments by @Andriy Maletsky.

The main thread checks every few seconds whether q is empty yet, and prints a progress message if more than 1 hour has passed since the last report.

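time_between_reports = 3600  # seconds between progress messages
time_between_checks = 5      # seconds between checks of the queue


def report_progress_until_finished(q):
    # qsize() counts items not yet taken by a worker, so "finished" here
    # also includes jobs that are still running
    qsize_init = q.qsize()
    last_report_time = time.time()
    while not q.empty():
        time_elapsed = time.time() - last_report_time
        if time_elapsed > time_between_reports:
            portion_finished = 1 - q.qsize() / qsize_init
            print("run_parallel: {:.1%} jobs are finished".format(portion_finished))
            last_report_time = time.time()
        # Sleep briefly so the loop does not spin, but still exits soon after the queue drains
        time.sleep(time_between_checks)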
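
A possible alternative that avoids the short polling interval altogether is to let the main thread block on a threading.Event with a timeout. Event.wait(timeout) returns True as soon as the event is set and False when the timeout expires, so the main thread sleeps for up to a full report interval but wakes immediately once the last job completes. The following is only a sketch under stated assumptions, not the solution from the comments: the names run_with_progress, done_count and all_done are hypothetical, the workers drain a pre-filled queue instead of waiting for None sentinels, and func is assumed not to raise.

```python
import queue
import threading


def run_with_progress(func, items, num_workers=2, report_interval=3600.0):
    """Run func over items in worker threads and report progress from the caller.

    Returns the list of progress fractions that were printed.
    Assumption: func does not raise (a raising job would never be counted).
    """
    items = list(items)
    total = len(items)

    # Fill the queue before starting the workers, so an empty queue means "no more work"
    q = queue.Queue()
    for item in items:
        q.put(item)

    done_count = 0
    lock = threading.Lock()
    all_done = threading.Event()
    if total == 0:
        all_done.set()

    def worker():
        nonlocal done_count
        while True:
            try:
                x = q.get_nowait()
            except queue.Empty:
                return
            func(x)
            # Count completed jobs (not merely dequeued ones)
            with lock:
                done_count += 1
                if done_count == total:
                    all_done.set()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    reports = []
    # wait() blocks without using CPU; it returns False on timeout, True once all jobs are done
    while not all_done.wait(timeout=report_interval):
        with lock:
            portion_finished = done_count / total
        reports.append(portion_finished)
        print("run_parallel: {:.1%} jobs are finished".format(portion_finished))

    for t in threads:
        t.join()
    return reports
```

With the names from the question, the call would be run_with_progress(test_function, input_arguments, num_workers=num_processes, report_interval=3600). Unlike the qsize()-based version, the reported fraction counts jobs that have actually completed.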
