
Python multiprocessing progress approach

I've been busy writing my first multiprocessing code and it works, yay. However, now I would like some feedback on its progress, and I'm not sure what the best approach would be.

What my code (see below) does, in short:

  • A target directory is scanned for mp4 files
  • Each file is analysed by a separate process; the process saves a result (an image)

What I'm looking for could be:

  1. Simple
  • Each time a process finishes a file, it sends a 'finished' message
  • The main code keeps count of how many files have finished (see the rough sketch after this list)
  2. Fancy
Core 0  processing file 20 of 317 ||||||____ 60% completed
Core 1  processing file 21 of 317 |||||||||_ 90% completed
...
Core 7  processing file 18 of 317 ||________ 20% completed
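For the simple option, something like this is roughly what I have in mind (just a rough sketch, assuming a multiprocessing.Queue carries the 'finished' messages; the worker body is a placeholder):

from multiprocessing import Process, Queue


def worker(videofile, queue):
    # ... analyse the file here ...
    queue.put(videofile)  # send a 'finished' message to the main process


if __name__ == "__main__":
    files = ['a.mp4', 'b.mp4', 'c.mp4']
    queue = Queue()
    jobs = [Process(target=worker, args=(f, queue)) for f in files]
    for job in jobs:
        job.start()

    # The main code keeps count of how many files have finished
    for done in range(1, len(files) + 1):
        finished = queue.get()  # blocks until the next worker reports in
        print(f'{done} of {len(files)} files finished ({finished})')

    for job in jobs:
        job.join()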

I read all kinds of info about queues, pools and tqdm, and I'm not sure which way to go. Could anyone point to an approach that would work in this case?

Thanks in advance!

EDIT: Changed my code that starts the processes, as suggested by gsb22.

My code:

# file operations
import os
import glob
# Multiprocessing
from multiprocessing import Process
# Motion detection
import cv2


# >>> Enter directory to scan as target directory
targetDirectory = r"E:\Projects\Programming\Python\OpenCV\videofiles"

def get_videofiles(target_directory):

    # Find all video files in directory and subdirectories and put them in a list
    videofiles = glob.glob(target_directory + '/**/*.mp4', recursive=True)
    # Return the list
    return videofiles


def process_file(videofile):

    '''
    What happens inside this function:
    - The video is processed and analysed using openCV
    - The result (an image) is saved to the results folder
    - Once this function receives the videofile it completes
      without the need to return anything to the main program
    '''

    # The real processing code is more complex than this; the below is just a test
    cap = cv2.VideoCapture(videofile)

    for i in range(10):
        success, frame = cap.read()

        if success:
            try:
                cv2.imwrite('{}/_Results/{}_result_{}.jpg'.format(targetDirectory, os.path.basename(videofile), i), frame)
            except cv2.error as e:
                print(f'Could not write result for {videofile}: {e}')

    # Release the video capture when done
    cap.release()


if __name__ == "__main__":

    # Create directory to save results if it doesn't exist
    if not os.path.exists(targetDirectory + '/_Results'):
        os.makedirs(targetDirectory + '/_Results')

    # Get a list of all video files in the target directory
    all_files = get_videofiles(targetDirectory)

    print(f'{len(all_files)} video files found')

    # Create list of jobs (processes)
    jobs = []

    # Create and start processes
    for file in all_files:
        proc = Process(target=process_file, args=(file,))
        jobs.append(proc)

    for job in jobs:
        job.start()

    for job in jobs:
        job.join()

    # TODO: Print some form of progress feedback

    print('Finished :)')


Here's a very simple way to get progress indication at minimal cost:

from multiprocessing.pool import Pool
from random import randint
from time import sleep

from tqdm import tqdm


def process(fn) -> bool:
    # Stand-in for the real work: sleep, then succeed ~70% of the time
    sleep(randint(1, 3))
    return randint(0, 100) < 70


if __name__ == "__main__":
    files = [f"file-{i}.mp4" for i in range(20)]

    success = []
    failed = []
    NPROC = 5

    with Pool(NPROC) as pool:
        # imap yields results in input order, so zip pairs each status with its file
        for status, fn in tqdm(zip(pool.imap(process, files), files), total=len(files)):
            if status:
                success.append(fn)
            else:
                failed.append(fn)

    print(f"{len(success)} succeeded and {len(failed)} failed")

Some comments:

  • tqdm is a third-party library which implements progress bars extremely well; there are others. pip install tqdm.
  • We use a pool of NPROC processes (there's almost never a reason to manage processes yourself for simple things like this) and let the pool handle iterating our process function over the input data.
  • We signal state by having the function return a boolean (in this example we choose randomly, weighting in favour of success). We don't return the filename, although we could (a variant which does exactly that is sketched after this list), because it would have to be serialised and sent back from the subprocess, and that's unnecessary overhead.
  • We use Pool.imap, which returns an iterator that keeps the same order as the iterable we pass in, so we can use zip to iterate files directly. Since we use an iterator of unknown size, tqdm needs to be told how long it is. (We could have used Pool.map, but there's no need to hold all the results in RAM at once, although for one bool it probably makes no difference.)
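As a side note (a variation, not part of the recipe above): if returning the filename is acceptable, the order guarantee of imap stops mattering, and Pool.imap_unordered can hand back each result as soon as that file finishes. A minimal sketch:

from multiprocessing.pool import Pool
from random import randint
from time import sleep

from tqdm import tqdm


def process(fn):
    sleep(randint(1, 3))
    # Return the filename with the status so the result order no longer matters
    return fn, randint(0, 100) < 70


if __name__ == "__main__":
    files = [f"file-{i}.mp4" for i in range(20)]

    with Pool(5) as pool:
        # imap_unordered yields each (filename, status) pair as it completes
        results = dict(tqdm(pool.imap_unordered(process, files), total=len(files)))

    failed = [fn for fn, ok in results.items() if not ok]
    print(f"{len(files) - len(failed)} succeeded and {len(failed)} failed")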

I've deliberately written this as a kind of recipe. You can do a lot with multiprocessing just by using the high-level drop-in paradigms, and Pool.[i]map is one of the most useful.
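The 'fancy' per-core display from the question is also within reach: tqdm can draw one bar per task via its position argument. A sketch along the lines of the multi-bar example in tqdm's own documentation (the shared lock stops concurrent bars from garbling each other's output; the per-bar loop is a stand-in for the real work):

from multiprocessing import Pool, RLock, freeze_support
from time import sleep

from tqdm import tqdm


def progresser(n):
    # One bar per task; 'position' picks the terminal row the bar is drawn on
    for _ in tqdm(range(100), desc=f"Core {n}", position=n):
        sleep(0.01)


if __name__ == "__main__":
    freeze_support()  # needed for frozen Windows executables
    tqdm.set_lock(RLock())  # lock shared by all bars to avoid garbled output
    with Pool(4, initializer=tqdm.set_lock, initargs=(tqdm.get_lock(),)) as pool:
        pool.map(progresser, range(4))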

References

https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool
https://tqdm.github.io/
