How can I change this code to make the progress bars appear for each file and the iteration be each loop?

I am struggling to get tqdm's progress bar to stay in place and update, as opposed to writing to a new line. Note: I am using multiprocessing to parallelize my code, and tqdm is inside the function I am parallelizing.

I added a print statement so the files will all appear in my terminal when running the program. Reproducible example below:

import multiprocessing
import time

from tqdm import tqdm
from joblib import Parallel, delayed


def run_file_analysis(text):
    cool = []
    for i in tqdm(range(0, 10), position = 0, leave = True, desc = f'Text : {text}'):
        print('')
        cool.append(i)
        time.sleep(1)

num_cores = multiprocessing.cpu_count()
ls = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']

if __name__ == "__main__":
    processed_list = Parallel(n_jobs=num_cores)(delayed(run_file_analysis)(i) for i in ls)

Current output: [image: enter image description here]

The desired output would be the ten text objects (1, 2, 3, ..., 10) with a corresponding updating progress bar for each, not 100 different ones. I have tried following many Stack Overflow questions on integrating tqdm with multiprocessing, but none of them are as straightforward as I would like them to be. Any help would be appreciated.

As already discussed in the comments, you don't want to add an extra newline with the print statement. Instead, you want to use the position argument of tqdm. The use case of multiple threads is even mentioned in the docs.

position : int, optional
  Specify the line offset to print this bar (starting from 0)
  Automatic if unspecified. Useful to manage multiple bars at once (eg, from threads).
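
To make the idea of a line offset concrete, here is a rough sketch of how a bar could be drawn a given number of rows below the cursor using plain terminal control characters. This is only an illustration under assumptions, not tqdm's actual implementation; the helper name draw_at_offset and the sample bar text are made up for the example.

```python
import io

CURSOR_UP = "\x1b[A"  # ANSI escape sequence: move the cursor up one line


def draw_at_offset(stream, position, text):
    """Write `text` on the line `position` rows below the cursor, then go back up.

    Hypothetical helper for illustration only; it is not part of tqdm.
    """
    stream.write("\n" * position)        # step down to this bar's own line
    stream.write("\r" + text)            # redraw that line from column 0
    stream.write(CURSOR_UP * position)   # return to the starting row


# Capture the output in memory so the raw control characters can be inspected.
buf = io.StringIO()
draw_at_offset(buf, 2, "Text : 3:  50%")
```

If every job uses offset 0, all bars are redrawn on the same row and overwrite each other, which is why each job needs its own offset.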

Currently, this argument is set to 0 in every job, so all bars are drawn at the same line offset. Instead, you want each job to use its own number as the position. For simplicity, you can convert the given text to an integer and use that, but this is not recommended for production.

import multiprocessing
import time

from tqdm import tqdm
from joblib import Parallel, delayed


def run_file_analysis(text):
    cool = []
    for i in tqdm(range(0, 10), position=int(text), leave=True, desc = f'Text : {text}'):
        cool.append(i)
        time.sleep(1)

num_cores = multiprocessing.cpu_count()
ls = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']

if __name__ == "__main__":
    processed_list = Parallel(n_jobs=num_cores)(delayed(run_file_analysis)(i) for i in ls)

If the texts cannot be converted directly to integers, enumerate can be used and the index can be passed to the function.

import multiprocessing
import time

from tqdm import tqdm
from joblib import Parallel, delayed


def run_file_analysis(text, job_number):
    cool = []
    for i in tqdm(range(0, 10), position=job_number, leave=True, desc = f'Text : {text}'):
        cool.append(i)
        time.sleep(1)

num_cores = multiprocessing.cpu_count()
ls = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']

if __name__ == "__main__":
    processed_list = Parallel(n_jobs=num_cores)(delayed(run_file_analysis)(text, i) for i, text in enumerate(ls))
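
As a small illustration of why the index is the safer choice, the sketch below pairs arbitrary names with positions. The file names and the helper assign_positions are hypothetical and only serve the example.

```python
def assign_positions(names):
    """Pair each name with its index, to be used as the bar's line offset."""
    return [(name, position) for position, name in enumerate(names)]


file_names = ["report.txt", "notes.md", "data.csv"]  # hypothetical inputs
jobs = assign_positions(file_names)

# int() fails for names like these, so position=int(text) would not work here.
try:
    int(file_names[0])
    convertible = True
except ValueError:
    convertible = False
```

Each (name, position) pair can then be passed to run_file_analysis in the same way as in the enumerate example above.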

Edit:

Some race conditions can be reduced by passing prefer="threads" to the Parallel constructor:

if __name__ == "__main__":
    processed_list = Parallel(n_jobs=num_cores, prefer="threads")(delayed(run_file_analysis)(text, i) for i, text in enumerate(ls))

