
How to optimize multiprocessing in Python

EDIT: I've had questions about what the video stream is, so I will offer more clarity. The stream is a live video feed from my webcam, accessed via OpenCV. I get each frame as the camera reads it, and send it to a separate process for processing. The process returns text based on computations done on the image. The text is then displayed onto the image. I need to display the stream in real time, and it is ok if there is a lag between the text and the video being shown (i.e. if the text was applicable to a previous frame, that's ok).

Perhaps an easier way to think of this is that I'm doing image recognition on what the webcam sees. I send one frame at a time to a separate process to do recognition analysis on the frame, and send the text back to be put as a caption on the live feed. Obviously the processing takes more time than simply grabbing frames from the webcam and showing them, so if there is a delay between what the caption says and what the webcam feed shows, that's acceptable and expected.

What's happening now is that the live video I'm displaying lags because of the other processes (when I don't send frames to the process for computing, there is no lag). I've also ensured only one frame is enqueued at a time, to avoid overloading the queue and causing lag. I've updated the code below to reflect this detail.

I'm using the multiprocessing module in Python to help speed up my main program. However, I believe I might be doing something incorrectly, as I don't think the computations are happening quite in parallel.

I want my program to read in images from a video stream in the main process, and pass the frames on to two child processes that do computations on them and send text back (containing the results of the computations) to the main process.

However, the main process seems to lag when I use multiprocessing, running about half as fast as without it, leading me to believe that the processes aren't running completely in parallel.

After doing some research, I surmised that the lag may have been due to communicating between the processes using a queue (passing an image from the main process to the child, and passing text back from the child to the main process).

However, I commented out the computational step and just had the main process pass an image and the child return blank text, and in this case the main process did not slow down at all. It ran at full speed.
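A stripped-down version of that check, which just times the queue round-trip with a no-op echo consumer, would look something like this sketch (the 640x480 frame size is only an assumption about the webcam's resolution):

import time
import multiprocessing
import numpy as np

def echo(q_in, q_out):
    # No-op consumer: immediately answers each frame with a dummy string
    while True:
        q_in.get()
        q_out.put("text")

if __name__ == '__main__':
    q_in, q_out = multiprocessing.Queue(1), multiprocessing.Queue()
    multiprocessing.Process(target=echo, args=(q_in, q_out), daemon=True).start()

    frame = np.zeros((480, 640, 3), dtype=np.uint8)  # assumed webcam frame size
    t0 = time.time()
    for _ in range(100):
        q_in.put(frame)   # frame is pickled and copied to the child process
        q_out.get()       # wait for the child's reply
    print("average queue round-trip: %.1f ms" % ((time.time() - t0) / 100 * 1000))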

Thus I believe that either

1) I am not optimally using multiprocessing

OR

2) These processes cannot truly be run in parallel (I would understand a little lag, but it's slowing the main process down by half).

Here's an outline of my code. There is only one consumer here instead of two, but both consumers are nearly identical. If anyone could offer guidance, I would appreciate it.

import multiprocessing

class Consumer(multiprocessing.Process):

    def __init__(self, task_queue, result_queue):
        multiprocessing.Process.__init__(self)
        self.task_queue = task_queue
        self.result_queue = result_queue
        #other initialization stuff

    def run(self):
        while True:
            image = self.task_queue.get()
            #Do computations on image
            self.result_queue.put("text")

        return

import cv2

tasks = multiprocessing.Queue()
results = multiprocessing.Queue()
consumer = Consumer(tasks,results)
consumer.start()

#Creating window and starting video capturer from camera
cv2.namedWindow("preview")
vc = cv2.VideoCapture(0)
#Try to get the first frame
if vc.isOpened():
    rval, frame = vc.read()
else:
    rval = False

while rval:
    if tasks.empty():
       tasks.put(frame)
    else:
       text = results.get()
       #Add text to frame
       cv2.putText(frame, text, (10, 25), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 2)

    #Showing the frame with all the applied modifications
    cv2.imshow("preview", frame)

    #Getting next frame from camera
    rval, frame = vc.read()

> I want my program to read in images from a video stream in the main process

In producer/consumer implementations, which is what you have above, the producer (the part that puts tasks into the queue to be executed by the consumers) needs to be separate from the main/controlling process, so that it can add tasks in parallel with the main process reading output from the results queue.
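For illustration only, a dedicated producer process could look like this minimal sketch (frame_source is a hypothetical generator standing in for your capture loop):

import multiprocessing

def producer(tasks_q, frame_source):
    # Runs in its own process: grabs frames and feeds the task queue,
    # independently of the process that reads the results queue.
    for frame in frame_source():
        tasks_q.put(frame)  # blocks when the bounded queue is full

# e.g. multiprocessing.Process(target=producer, args=(tasks_q, frame_source)).start()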

Try the following. I've added a sleep in the consumer process to simulate processing; a second consumer can be started in the same way to check that they run in parallel.

It would also be a good idea to limit the size of the task queue, to avoid it running away with memory usage if processing cannot keep up with the input stream. You can specify a size when calling Queue(<size>). If the queue is at that size, calls to .put will block until the queue is no longer full.
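As a minimal sketch of that bounded-queue behaviour (the size of 2 here is arbitrary):

import queue
import multiprocessing

tasks_q = multiprocessing.Queue(2)   # at most 2 pending items
tasks_q.put("frame 1")
tasks_q.put("frame 2")
# tasks_q.put("frame 3")             # would block until a consumer calls get()
try:
    tasks_q.put_nowait("frame 3")    # non-blocking variant raises queue.Full instead
except queue.Full:
    print("queue full, dropping frame")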

import time
import multiprocessing
import cv2

class ImageProcessor(multiprocessing.Process):

    def __init__(self, tasks_q, results_q):
        multiprocessing.Process.__init__(self)
        self.tasks_q = tasks_q
        self.results_q = results_q

    def run(self):
        while True:
            image = self.tasks_q.get()
            # Do computations on image
            time.sleep(1)
            # Display the result on stream
            self.results_q.put("text")

# Tasks queue with size 1 - only want one image queued
# for processing. 
# Queue size should therefore match number of processes
tasks_q, results_q = multiprocessing.Queue(1), multiprocessing.Queue()
processor = ImageProcessor(tasks_q, results_q)
processor.start()

def capture_display_video(vc):
    rval, frame = vc.read()
    while rval:    
        if not tasks_q.full():
            tasks_q.put(frame)
        if not results_q.empty():
            text = results_q.get()
            cv2.putText(frame, text, (10, 25), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 2)
        cv2.imshow("preview", frame)
        rval, frame = vc.read()

cv2.namedWindow("preview")
vc = cv2.VideoCapture(0)
if not vc.isOpened():
    raise Exception("Cannot capture video")

capture_display_video(vc)
processor.terminate()

(Updated solution based on your last code sample)

It will get images from the stream, put one in the task queue as soon as it is available, and display the last image with the last text.

I put an active loop in there to simulate processing that takes longer than the time between two images. It means that the text displayed is not necessarily the one belonging to the current image, but the last one computed. If the processing is fast enough, the shift between image and text should be limited.

Note that I wrap the get/put calls in try/except. Per the docs, empty() and full() are not 100% reliable.

import cv2
import multiprocessing
import queue
import random
from time import sleep

class Consumer(multiprocessing.Process):

    def __init__(self, task_queue, result_queue):
        multiprocessing.Process.__init__(self)
        self.task_queue = task_queue
        self.result_queue = result_queue
        # Other initialization stuff

    def run(self):
        while True:
            frameNum, frameData = self.task_queue.get()
            # Do computations on image
            # Simulate a processing longer than image fetching
            m = random.randint(0, 1000000)
            while m >= 0:
                m -= 1
            # Put result in queue
            self.result_queue.put("result from image " + str(frameNum))

        return

# No more than one pending task
tasks = multiprocessing.Queue(1)
results = multiprocessing.Queue()
# Init and start consumer
consumer = Consumer(tasks,results)
consumer.start()

#Creating window and starting video capturer from camera
cv2.namedWindow("preview")
vc = cv2.VideoCapture(0)
#Try to get the first frame
if vc.isOpened():
    rval, frame = vc.read()
    frame = cv2.resize(frame, (0,0), fx=0.5, fy=0.5)
else:
    rval = False

# Dummy int to represent frame number for display
frameNum = 0
# String for result
text = None

font = cv2.FONT_HERSHEY_SIMPLEX

# Process loop
while rval:
    # Grab image from stream
    frameNum += 1
    # Put image in task queue if empty
    try:
        tasks.put_nowait((frameNum, frame))
    except queue.Full:
        pass
    # Get result if ready
    try:
        # Use this if processing is fast enough
        # text = results.get(timeout=0.4)
        # Use this to prefer smooth display over frame/text shift
        text = results.get_nowait()
    except queue.Empty:
        pass

    # Add last available text to last image and display
    print("display:", frameNum, "|", text)
    # Showing the frame with all the applied modifications
    cv2.putText(frame,text,(10,25), font, 1,(255,0,0),2)
    cv2.imshow("preview", frame)
    # Getting next frame from camera
    rval, frame = vc.read()
    # Optional image resize
    # frame = cv2.resize(frame, (0,0), fx=0.5, fy=0.5)

Here is some output; you can see the delay between the image and the result, and the results catching up.

> ('display:', 493, '|', 'result from image 483')
> ('display:', 494, '|', 'result from image 483')
> ('display:', 495, '|', 'result from image 489')
> ('display:', 496, '|', 'result from image 490')
> ('display:', 497, '|', 'result from image 495')
> ('display:', 498, '|', 'result from image 496')

Here's a more elegant (IMHO) solution that utilizes multiple processes for processing your frames:

import cv2
import multiprocessing

def process_image(frame):
    # Do computations on the frame
    return "text", frame

def image_source():
    # Creating window and starting video capturer from camera
    cv2.namedWindow("preview")
    vc = cv2.VideoCapture(0)
    # Try to get the first frame
    if vc.isOpened():
        rval, frame = vc.read()
    else:
        rval = False

    while rval:
        yield frame
        # Getting next frame from camera
        rval, frame = vc.read()

pool = multiprocessing.Pool()

for (text, frame) in pool.imap(process_image, image_source()):
    # Add text to frame
    cv2.putText(frame, text, (10, 25), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 2)
    # Showing the frame with all the applied modifications
    cv2.imshow("preview", frame)

Pool.imap should allow you to iterate through the pool's results while it is still processing other images from your camera.
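Here is a tiny self-contained sketch of that streaming behaviour, with numbers standing in for frames and a sleep standing in for the per-frame computation:

import time
import multiprocessing

def slow_square(x):
    time.sleep(0.5)   # stands in for the image processing
    return x * x

if __name__ == '__main__':
    with multiprocessing.Pool(4) as pool:
        # Results come back in order, and are consumed while the
        # remaining inputs are still being processed by other workers.
        for result in pool.imap(slow_square, range(8)):
            print(time.strftime("%H:%M:%S"), result)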

Let me suggest a slightly different approach, with a lot less "red tape". I think the main problem is that you are overlooking the main way to communicate with a process: through its arguments and return values. If you can send the frame data as an argument, there is no need for queues or pipes or other methods.

import time
from multiprocessing import Pool

def process_frame(frame_id, frame_data):
    # this function simulates the processing of the frame.
    # I used a longer sleep thinking that it takes longer
    # and therefore the reason of parallel processing.
    print("..... got frame {}".format(frame_id))
    time.sleep(.5)
    char = frame_data[frame_id]
    count = frame_data.count(char)
    return frame_id, char, count

def process_result(res):
    # this function simulates the function that would receive
    # the result from analyzing the frame, and do what is
    # appropriate, like printing, making a dict, saving to file, etc.
    # this function is called back when the result is ready.
    frame_id, char, count = res
    print("in frame {}".format(frame_id), \
           ", character '{}' appears {} times.".format(
                        chr(char), count))



if __name__ == '__main__':

    pool = Pool(4)
    # in my laptop I got these times:
    # workers, time
    #   1     10.14
    #   2      5.22
    #   4      2.91
    #   8      2.61 # no further improvement after 4 workers.
                    # your case may be different though.

    from datetime import datetime as dt
    t0 = dt.now()

    for i in range(20):   # I limited this loop to simulate 20 frames
                          # but it could be a continuous stream,
                          # that when finishes should execute the
                          # close() and join() methods to finish
                          # gathering all the results.


        # The following lines simulate the video streaming and
        # your selecting the frames that you need to analyze and
        # send to the function process_frame.
        time.sleep(0.1)
        frame_id = i
        frame_data = b'a bunch of binary data representing your frame'

        pool.apply_async(  process_frame,                #func
                           (frame_id, frame_data),       #args
                           callback=process_result       #return value
                        )

    pool.close()
    pool.join()

    print(dt.now() - t0)

I think that this simpler approach would be enough for your program. There is no need to use classes or queues.

After reading your comment on my previous answer, I understood your problem a 'bit' better. I would like to have more information about your code/problem. Anyway, because this code is significantly different from my previous answer, I decided to provide another answer. I won't comment on the code too much though, because you can follow it from my previous answer. I will use text instead of images, just to simulate the process.

The following code prints letters out of "lorem ipsum", selecting one out of every 6 letters (frames). Because there is a lag, we need a buffer, which I implemented with a deque. Once the buffer has advanced far enough, the frame and its caption are displayed in sync.

I don't know how often you tag a frame, or how long it really takes to process one, but you can make an educated guess with this code by playing with some of the variables.

import time
import random
random.seed(1250)
from multiprocessing import Pool, Manager
from collections import deque


def display_stream(stream, pool, queue, buff, buffered=False):
    delay = 24
    popped_frames = 0
    for i, frame in enumerate(stream):
        buff.append([chr(frame), ''])
        time.sleep(1/24 * random.random()) # suppose a 24 fps video
        if i % 6 == 0:                     # suppose one out of 6 frames
            pool.apply_async(process_frame, (i, frame, queue))
        ii, caption = (None, '') if queue.empty() else queue.get()
        if buffered:
            if ii is not None:
                buff[ii - popped_frames][1] = caption
            if i > delay:
                print(buff.popleft())
                popped_frames += 1
        else:
            lag = '' if ii is None else i - ii
            print(chr(frame), caption, lag)

    else:
        pool.close()
        pool.join()
        if buffered:
            try:
                while True:
                    print(buff.popleft())
            except IndexError:
                pass


def process_frame(i, frame, queue):
    time.sleep(0.4 * random.random())      # suppose ~0.2s to process
    caption = chr(frame).upper()           # mocking the result...
    queue.put((i, caption))


if __name__ == '__main__':

    p = Pool()
    q = Manager().Queue()
    d = deque()

    stream = b'Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.'

    display_stream(stream, p, q, d)

You could try setting the affinity mask to make sure each process runs on a different core. I use this on Windows 7.

import win32api
import win32con
import win32process

def setaffinity(mask=128):  # 128 (bit 7) pins to core 7
    # Set the CPU affinity mask of the current process (requires pywin32)
    pid    = win32api.GetCurrentProcessId()
    handle = win32api.OpenProcess(win32con.PROCESS_ALL_ACCESS, True, pid)
    win32process.SetProcessAffinityMask(handle, mask)
    return
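(pywin32 is Windows-only; on Linux a similar effect is possible with os.sched_setaffinity, as in this sketch, where core 7 is just an example.)

import os

def setaffinity(cores=(7,)):
    # Pin the calling process (pid 0) to the given CPU cores. Linux only.
    os.sched_setaffinity(0, cores)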
