OpenCV code snippet runs slower inside a Python multiprocessing process

Question

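I am running some tests with multiprocessing to parallelize face detection and recognition, and I came across a strange behavior: detectMultiScale() (which performs the face detection) runs slower inside a child process than in the parent process (where the function is simply called directly).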

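So I wrote the code below, which puts 10 images into a queue and then runs face detection on them sequentially using one of two approaches: calling the detection function directly, or running it inside a single new process. The execution time is printed for every detectMultiScale() call. When I run this code, each call averages 0.22 seconds with the first approach and 0.54 seconds with the second. The total time to process the 10 images is also longer with the second approach.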

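I don't know why the same code snippet runs slower in a new process. I would understand if only the total time were longer (given the overhead of setting up a new process), but I don't understand why each individual call is slower. For the record, I am running this on a Raspberry Pi 3B+.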

import cv2
import multiprocessing
from time import time, sleep

def detect(face_cascade, img_queue, bnd_queue):
    while True:
        image = img_queue.get()
        if image is not None:
            gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
            ti = time()
            ########################################
            faces = face_cascade.detectMultiScale(
                                gray_image,
                                scaleFactor=1.1,
                                minNeighbors=3,
                                minSize=(130, 130))
            ########################################
            tf = time()
            print('det time: ' + str(tf-ti))
                            
            if len(faces) > 0:
                max_bounds = (0,0,0,0)
                max_size = 0
                for (x,y,w,h) in faces:
                    if w*h > max_size:
                        max_size = w*h
                        max_bounds = (x,y,w,h)
            img_queue.task_done()
            bnd_queue.put('bound')
        else:
            img_queue.task_done()
            break


face_cascade = cv2.CascadeClassifier('../lbpcascade_frontalface_improved.xml')
cam = cv2.VideoCapture(0)
cam.set(cv2.CAP_PROP_FRAME_WIDTH, 2592)
cam.set(cv2.CAP_PROP_FRAME_HEIGHT, 1944)
cam.set(cv2.CAP_PROP_BUFFERSIZE, 1)

img_queue = multiprocessing.JoinableQueue()

i = 0
while i < 10:
    is_there_frame, image = cam.read()
    if is_there_frame:
        image = image[0:1944, 864:1728]
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        img_queue.put(image)
        i += 1

bnd_queue = multiprocessing.JoinableQueue()
num_process = 1

ti = time()
# MULTIPROCESSING PROCESS APPROACH
for _ in range(num_process):
    p = multiprocessing.Process(target=detect, args=(face_cascade, img_queue, bnd_queue))
    p.start()

for _ in range(num_process):
    img_queue.put(None)
#     
# FUNCTION CALL APPROACH
#img_queue.put(None)
#while not img_queue.empty():
#    detect(face_cascade, img_queue, bnd_queue)

img_queue.join()
tf = time()

print('TOTAL TIME: ' + str(tf-ti))

while not bnd_queue.empty():
    bound = bnd_queue.get()
    if bound != 'bound':
        print('ERROR')
    bnd_queue.task_done()
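
To check whether a child process is slower per call regardless of OpenCV, the same comparison can be repeated with a stand-in workload. The following is only a minimal sketch: busy_work(), time_calls() and compare() are hypothetical helpers that are not part of the code above, the pure-Python loop merely replaces detectMultiScale(), and the "fork" start method is assumed to be available (as it is on Linux, including the Raspberry Pi).

```python
import multiprocessing
from time import perf_counter


def busy_work(n=200_000):
    """Stand-in CPU-bound workload (hypothetical; replaces detectMultiScale)."""
    total = 0
    for k in range(n):
        total += k * k
    return total


def time_calls(repeats, out_queue=None):
    """Time each call separately and return (or send back) the durations."""
    durations = []
    for _ in range(repeats):
        start = perf_counter()
        busy_work()
        durations.append(perf_counter() - start)
    if out_queue is not None:
        out_queue.put(durations)
    return durations


def compare(repeats=5):
    """Return per-call timings measured in the parent and in one child process."""
    parent = time_calls(repeats)
    # Assumption: a Unix-like system where the "fork" start method exists.
    ctx = multiprocessing.get_context("fork")
    out_queue = ctx.Queue()
    child_proc = ctx.Process(target=time_calls, args=(repeats, out_queue))
    child_proc.start()
    child = out_queue.get()  # read before join so the child can flush its queue
    child_proc.join()
    return parent, child


if __name__ == "__main__":
    parent, child = compare()
    print("parent mean: %.4f s" % (sum(parent) / len(parent)))
    print("child mean:  %.4f s" % (sum(child) / len(child)))
```

If the two means come out close for this workload, the per-call slowdown is more likely specific to how the OpenCV call behaves inside the child process than to multiprocessing in general.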

Answer 1

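I ran into the same problem, and I think the cause is that the task is somewhat I/O bound, combined with the overhead of multiprocessing itself. You can also read this article: https://www.pyimagesearch.com/2019/09/09/multiprocessing-with-opencv-and-python/ The issue you describe with detectMultiScale() is the same one I had. I also tried using serialization and making the variables global and class-level, but nothing helped.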
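
One way to put a number on the multiprocessing overhead mentioned above is to time how long it takes to pickle a frame-sized payload, since objects put on a multiprocessing queue are pickled before being sent to the other process. This is a rough sketch under assumptions: pickle_round_trip() is a hypothetical helper, and a zero-filled bytes object stands in for the NumPy image, with its size taken from the crop in the question (1944 x 864 pixels, 3 channels).

```python
import pickle
from time import perf_counter

# Frame size taken from the crop in the question: image[0:1944, 864:1728], 3 channels.
HEIGHT, WIDTH, CHANNELS = 1944, 864, 3


def pickle_round_trip(payload):
    """Return (seconds, size_in_bytes) for one pickle dumps/loads round trip."""
    start = perf_counter()
    blob = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)
    restored = pickle.loads(blob)
    elapsed = perf_counter() - start
    assert restored == payload
    return elapsed, len(blob)


if __name__ == "__main__":
    # Zero-filled stand-in for the pixel data (assumption: not a real camera frame).
    frame_bytes = bytes(HEIGHT * WIDTH * CHANNELS)
    seconds, size = pickle_round_trip(frame_bytes)
    print("payload: %d bytes, round trip: %.4f s" % (size, seconds))
```

Note that the timer in the question wraps only the detectMultiScale() call, so this transfer cost would show up in the total time rather than in the per-call figures.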

Solution 1
1 2021-08-11 21:49:25
