Leveraging the GPU with OpenCV 4.2

Question

I am trying to use my virtual machine's GPU with the OpenCV library (4.2) and Python 3.7.

I have installed OpenCV with CUDA support, and the following snippet prints 1:

import cv2
count = cv2.cuda.getCudaEnabledDeviceCount()
print(count)

I tried running my code with and without the GPU:

import time
import cv2 as cv

start = time.time()
network = cv.dnn.readNetFromDarknet(config_path, weights_path)
if IS_CUDA:
    network.setPreferableBackend(cv.dnn.DNN_BACKEND_CUDA)
    network.setPreferableTarget(cv.dnn.DNN_TARGET_CUDA)
my_function(network, frame)
end = time.time()
print(end - start)

But the running time is the same in both cases.

My question: is calling setPreferableBackend() and setPreferableTarget() enough to leverage the GPU?

Answer 1

That should be enough to make it use CUDA, and you should see a noticeable speedup. Check the Performance tab in Task Manager to see whether your Nvidia GPU is being used. I don't know what's happening behind the scenes in my_function(), so I can't tell if there's something different in how you're using it.

This is the class I usually copy around when I want a quick setup for the YOLO network:
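One thing worth ruling out: the timed span in the question also covers readNetFromDarknet() and the very first inference, and the first call on a CUDA backend typically includes one-time setup. A minimal sketch of a fairer measurement, assuming you wrap the inference in a zero-argument callable (time_calls and its parameters are hypothetical names, not part of OpenCV):

```python
import time


def time_calls(fn, warmup=1, repeats=10):
    """Return the mean wall-clock seconds per call of fn(), after warm-up calls."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed (they may include one-time setup)
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


# Hypothetical usage with the names from the question, after the network
# has been loaded and the backend/target have been set:
#   mean_seconds = time_calls(lambda: my_function(network, frame))
#   print(mean_seconds)
```

Run it once with IS_CUDA set and once without, and compare the two means rather than a single end-to-end timing.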

import cv2
import numpy as np

class Yolo:
    def __init__(self, cfg, weights, names, conf_thresh, nms_thresh, use_cuda=False):
        # save thresholds
        self.ct = conf_thresh
        self.nmst = nms_thresh

        # create net
        self.net = cv2.dnn.readNet(weights, cfg)
        print("Finished: " + str(weights))

        # load class names, one per line (the file is closed automatically)
        with open(names, 'r') as file:
            self.classes = [line.strip() for line in file]

        # use gpu + CUDA to speed up detections
        if use_cuda:
            self.net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
            self.net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

        # get output names; flatten() handles both the Nx1 array returned by
        # OpenCV 4.2 and the 1-D array returned by newer releases
        layer_names = self.net.getLayerNames()
        out_ids = np.array(self.net.getUnconnectedOutLayers()).flatten()
        self.output_layers = [layer_names[i - 1] for i in out_ids]

    # runs detection on the image and draws on it
    def detect(self, img, target_id=None):
        # get detection stuff
        b, c, ids, idxs = self.get_detection_data(img, target_id)

        # draw result
        img = self.draw(img, b, c, ids, idxs)
        return img, len(idxs)

    # returns boxes, confidences, class_ids, and indices
    def get_detection_data(self, img, target_id=None):
        # get output
        layer_outputs = self.get_inf(img)

        # get dims
        height, width = img.shape[:2]

        # filter thresholds and target
        b, c, ids, idxs = self.thresh(layer_outputs, width, height, target_id)
        return b, c, ids, idxs

    # runs the network on an image
    def get_inf(self, img):
        # construct a blob
        blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)

        # get response
        self.net.setInput(blob)
        layer_outputs = self.net.forward(self.output_layers)
        return layer_outputs

    # filters the layer output by conf, nms and id
    def thresh(self, layer_outputs, width, height, target_id=None):
        # some lists
        boxes = []
        confidences = []
        class_ids = []

        # each layer outputs
        for output in layer_outputs:
            for detection in output:
                # get id and confidence
                scores = detection[5:]
                class_id = np.argmax(scores)
                confidence = scores[class_id]

                # filter out low confidence
                if confidence > self.ct:
                    # filter by target_id if set
                    if target_id is None or class_id == target_id:
                        # scale bounding box back to the image size
                        box = detection[0:4] * np.array([width, height, width, height])
                        (cx, cy, w, h) = box.astype('int')

                        # grab the top-left corner of the box
                        tx = int(cx - (w / 2))
                        ty = int(cy - (h / 2))

                        # update lists
                        boxes.append([tx, ty, int(w), int(h)])
                        confidences.append(float(confidence))
                        class_ids.append(int(class_id))

        # apply NMS
        idxs = cv2.dnn.NMSBoxes(boxes, confidences, self.ct, self.nmst)
        return boxes, confidences, class_ids, idxs

    # draw detections on image
    def draw(self, img, boxes, confidences, class_ids, idxs):
        # check for zero
        if len(idxs) > 0:
            # loop over indices (flatten() copes with Nx1 and 1-D results)
            for i in np.array(idxs).flatten():
                # extract the bounding box coords
                (x, y) = (boxes[i][0], boxes[i][1])
                (w, h) = (boxes[i][2], boxes[i][3])

                # draw a box
                cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 2)

                # draw text
                text = "{}: {:.4}".format(self.classes[class_ids[i]], confidences[i])
                cv2.putText(img, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2)
        return img

You can use it like this:

import cv2
import numpy as np

# this is the "yolo.py" file, I assume it's in the same folder as this program
from yolo import Yolo

# these are the filepaths of the yolo files
weights = "yolov3-tiny.weights"
config = "yolov3-tiny.cfg"
labels = "yolov3.txt"

# init yolo network
target_class_id = 79  # toothbrush
conf_thresh = 0.4  # less == more boxes (but more false positives)
nms_thresh = 0.4  # more == more boxes (but more overlap)
net = Yolo(config, weights, labels, conf_thresh, nms_thresh, use_cuda=True)

# open video capture
cap = cv2.VideoCapture(0)  # probably laptop webcam

# loop
done = False
while not done:
    # get frame
    ret, frame = cap.read()
    if not ret:
        done = cv2.waitKey(1) == ord('q')
        continue

    # draw detection
    # frame, _ = net.detect(frame, target_id=target_class_id)  # use this to filter by a single class_id
    frame, _ = net.detect(frame)  # use this to not filter by class_id

    # show
    cv2.imshow("Marked", frame)
    done = cv2.waitKey(1) == ord('q')

# release the camera and close the preview window
cap.release()
cv2.destroyAllWindows()

Edit: Oh, ok, lol. I found my install notes, and the very first instruction is this:

Follow PyImageSearch's blog to install OpenCV with CUDA support.

The URL is:

https://www.pyimagesearch.com/2020/02/03/how-to-use-opencvs-dnn-module-with-nvidia-gpus-cuda-and-cudnn/

Not sure how helpful this will be, but I apparently thought the tutorial was clear enough that I didn't need any additional notes or corrections for this step.
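After a build like that, one quick sanity check is to look at what the installed OpenCV reports about CUDA. A minimal sketch, assuming the report from cv2.getBuildInformation() has lines mentioning CUDA and cuDNN (the exact labels can differ between versions; cuda_build_lines is a hypothetical helper, not an OpenCV function):

```python
def cuda_build_lines(build_info):
    """Return the lines of an OpenCV build report that mention CUDA or cuDNN."""
    keywords = ("cuda", "cudnn")
    return [
        line.strip()
        for line in build_info.splitlines()
        if any(keyword in line.lower() for keyword in keywords)
    ]


if __name__ == "__main__":
    try:
        import cv2  # needs an installed OpenCV
    except ImportError:
        print("OpenCV is not installed in this environment")
    else:
        # Print only the CUDA-related part of the full build report.
        for line in cuda_build_lines(cv2.getBuildInformation()):
            print(line)
```

If nothing CUDA-related shows up, or the entries say NO, the Python interpreter is probably picking up a CPU-only build.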

Answer 2

First of all, before running any code, you must check your working environment.

What virtual machine are you using? Virtual machines usually do not support GPU virtualisation. You must set up PCI passthrough to get this feature.

If that is in place, first check that your device is detected by running nvidia-smi (in your favorite bash/cmd environment); this also verifies your Nvidia drivers.

Then check that your opencv-python supports your GPU by running cv2.cuda.getCudaEnabledDeviceCount() in a Python shell.

If the result is 0 and nvidia-smi detects your GPU, you must compile your Python OpenCV wheel with support for your specific GPU model and install it into your Python environment.
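The checks above can be sketched as one small script. This is only an illustration: nvidia_smi_ok, opencv_cuda_devices and diagnose are hypothetical helper names, and the messages simply restate the steps from this answer.

```python
import shutil
import subprocess


def nvidia_smi_ok():
    """True if nvidia-smi is on PATH and exits with status 0."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        result = subprocess.run([exe], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def opencv_cuda_devices():
    """CUDA devices visible to OpenCV; 0 if OpenCV or its cuda module is missing."""
    try:
        import cv2
        return int(cv2.cuda.getCudaEnabledDeviceCount())
    except (ImportError, AttributeError):
        return 0


def diagnose(smi_ok, device_count):
    """Map the two check results to the next step suggested in this answer."""
    if not smi_ok:
        return "nvidia-smi failed: check the driver and GPU passthrough to the VM"
    if device_count == 0:
        return "GPU visible to the driver but not to OpenCV: rebuild OpenCV with CUDA support"
    return "OK: OpenCV sees {} CUDA device(s)".format(device_count)


if __name__ == "__main__":
    print(diagnose(nvidia_smi_ok(), opencv_cuda_devices()))
```

Run it inside the virtual machine, with the same Python environment that runs your detection code.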

2 answers

Answer 1
1 2021-03-19 04:05:22

Answer 2
0 2021-03-19 07:50:14
