Leveraging GPU with OpenCV 4.2
I am trying to use the GPU of my virtual machine with the OpenCV library (4.2) and Python 3.7.
I have installed OpenCV with CUDA support, and the following command returns 1:
import cv2
count = cv2.cuda.getCudaEnabledDeviceCount()
print(count)
I tried to run my code with and without leveraging the GPU:
import time
import cv2 as cv

start = time.time()
network = cv.dnn.readNetFromDarknet(config_path, weights_path)
if IS_CUDA:
    network.setPreferableBackend(cv.dnn.DNN_BACKEND_CUDA)
    network.setPreferableTarget(cv.dnn.DNN_TARGET_CUDA)
my_function(network, frame)
end = time.time()
print(end - start)
But the running time is the same.
My question: Is using setPreferableBackend() and setPreferableTarget() enough to leverage the GPU?
It should be enough to make it use CUDA, and you should see a noticeable speedup. Check your Task Manager's Performance tab and see if your Nvidia GPU is being used. I don't know what's happening behind the scenes in my_function(), so I can't tell if there's something different in how you're using it.
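One thing that may hide a speedup: the snippet in the question times model loading plus a single call, and the first forward pass on a GPU backend typically includes one-time initialization. Here is a minimal sketch of a fairer measurement, assuming the inference step can be wrapped in a zero-argument callable. The helper name `time_forward` and its parameters are made up for illustration.

```python
import time

def time_forward(run_once, warmup=3, runs=20):
    """Return the mean seconds per call of run_once, measured after warm-up calls."""
    # warm-up calls are not timed, so one-time setup cost is excluded
    for _ in range(warmup):
        run_once()
    start = time.perf_counter()
    for _ in range(runs):
        run_once()
    return (time.perf_counter() - start) / runs

# Hypothetical usage with the names from the question:
#   mean_s = time_forward(lambda: my_function(network, frame))
#   print("mean seconds per call:", mean_s)
```

Load the network once outside the timed region, then compare the mean with and without the CUDA backend.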
This is the class that I usually copy around when I want a quick setup for the Yolo network:
import cv2
import numpy as np

class Yolo:
    def __init__(self, cfg, weights, names, conf_thresh, nms_thresh, use_cuda=False):
        # save thresholds
        self.ct = conf_thresh
        self.nmst = nms_thresh

        # create net
        self.net = cv2.dnn.readNet(weights, cfg)
        print("Finished: " + str(weights))

        # load class names, one per line
        with open(names, 'r') as file:
            self.classes = [line.strip() for line in file]

        # use gpu + CUDA to speed up detections
        if use_cuda:
            self.net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
            self.net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

        # get output names; flatten() handles both the Nx1 index array
        # (older OpenCV, including 4.2) and the 1-D array (newer OpenCV)
        layer_names = self.net.getLayerNames()
        out_ids = np.array(self.net.getUnconnectedOutLayers()).flatten()
        self.output_layers = [layer_names[i - 1] for i in out_ids]

    # runs detection on the image and draws on it
    def detect(self, img, target_id=None):
        # get detection stuff
        b, c, ids, idxs = self.get_detection_data(img, target_id)

        # draw result
        img = self.draw(img, b, c, ids, idxs)
        return img, len(idxs)

    # returns boxes, confidences, class_ids, and indices
    def get_detection_data(self, img, target_id=None):
        # get output
        layer_outputs = self.get_inf(img)

        # get dims
        height, width = img.shape[:2]

        # filter thresholds and target
        b, c, ids, idxs = self.thresh(layer_outputs, width, height, target_id)
        return b, c, ids, idxs

    # runs the network on an image
    def get_inf(self, img):
        # construct a blob
        blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)

        # get response
        self.net.setInput(blob)
        layer_outputs = self.net.forward(self.output_layers)
        return layer_outputs

    # filters the layer output by conf, nms and id
    def thresh(self, layer_outputs, width, height, target_id=None):
        # some lists
        boxes = []
        confidences = []
        class_ids = []

        # each layer outputs
        for output in layer_outputs:
            for detection in output:
                # get id and confidence
                scores = detection[5:]
                class_id = np.argmax(scores)
                confidence = scores[class_id]

                # filter out low confidence
                if confidence > self.ct:
                    # filter by target_id if set
                    if target_id is None or class_id == target_id:
                        # scale bounding box back to the image size
                        box = detection[0:4] * np.array([width, height, width, height])
                        (cx, cy, w, h) = box.astype('int')

                        # grab the top-left corner of the box
                        tx = int(cx - (w / 2))
                        ty = int(cy - (h / 2))

                        # update lists
                        boxes.append([tx, ty, int(w), int(h)])
                        confidences.append(float(confidence))
                        class_ids.append(class_id)

        # apply NMS
        idxs = cv2.dnn.NMSBoxes(boxes, confidences, self.ct, self.nmst)
        return boxes, confidences, class_ids, idxs

    # draw detections on image
    def draw(self, img, boxes, confidences, class_ids, idxs):
        # check for zero
        if len(idxs) > 0:
            # loop over indices
            for i in idxs.flatten():
                # extract the bounding box coords
                (x, y) = (boxes[i][0], boxes[i][1])
                (w, h) = (boxes[i][2], boxes[i][3])

                # draw a box
                cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 2)

                # draw text
                text = "{}: {:.4}".format(self.classes[class_ids[i]], confidences[i])
                cv2.putText(img, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2)
        return img
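In case the box math in thresh() is unclear: each detection row starts with a normalized center-format box (cx, cy, w, h), which gets scaled to pixels and shifted to a top-left corner. A small standalone sketch of just that step follows. The helper name `to_top_left_box` is made up for illustration and is not part of the class.

```python
import numpy as np

def to_top_left_box(detection, width, height):
    """Convert a normalized (cx, cy, w, h) row to pixel [x, y, w, h] with a top-left origin."""
    scale = np.array([width, height, width, height])
    # scale to pixels, then truncate to integers as the class above does
    cx, cy, w, h = (np.asarray(detection[:4], dtype=float) * scale).astype(int)
    return [int(cx - w / 2), int(cy - h / 2), int(w), int(h)]

# a box centered in a 640x480 frame, a quarter of the width and half of the height
print(to_top_left_box([0.5, 0.5, 0.25, 0.5], 640, 480))  # [240, 120, 160, 240]
```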
You can use it like this:
import cv2
import numpy as np

# this is the "yolo.py" file, I assume it's in the same folder as this program
from yolo import Yolo

# these are the filepaths of the yolo files
weights = "yolov3-tiny.weights"
config = "yolov3-tiny.cfg"
labels = "yolov3.txt"

# init yolo network
target_class_id = 79  # toothbrush
conf_thresh = 0.4  # less == more boxes (but more false positives)
nms_thresh = 0.4  # higher == more boxes (but more overlap)
net = Yolo(config, weights, labels, conf_thresh, nms_thresh, use_cuda=True)

# open video capture
cap = cv2.VideoCapture(0)  # probably laptop webcam

# loop
done = False
while not done:
    # get frame
    ret, frame = cap.read()
    if not ret:
        done = cv2.waitKey(1) == ord('q')
        continue

    # draw detection
    # frame, _ = net.detect(frame, target_id=target_class_id)  # use this to filter by a single class_id
    frame, _ = net.detect(frame)  # use this to not filter by class_id

    # show
    cv2.imshow("Marked", frame)
    done = cv2.waitKey(1) == ord('q')

# release the camera and close the window
cap.release()
cv2.destroyAllWindows()
Edit: Oh, ok, lol. I found my install notes and the very first instruction is this:
The url is:
https://www.pyimagesearch.com/2020/02/03/how-to-use-opencvs-dnn-module-with-nvidia-gpus-cuda-and-cudnn/
Not sure how helpful this will be, but I apparently thought the tutorial was clear enough that I didn't need any additional notes or corrections for this step.
First of all, before running any code, you must check your working environment.
What virtual machine are you using? Virtual machines usually do not support GPU virtualisation. You must add PCI passthrough to get this feature.
If you have that, first check that your device is successfully detected by running
nvidia-smi
(within your favorite bash/cmd environment) in order to check your nvidia drivers.
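If you would rather do the driver check from Python, here is a minimal sketch that shells out to nvidia-smi and lists the GPUs it reports. It assumes nvidia-smi is on the PATH when the driver is installed. The function name `list_nvidia_gpus` is made up for illustration.

```python
import shutil
import subprocess

def list_nvidia_gpus():
    """Return the lines printed by `nvidia-smi -L`, or None if the tool is missing or fails."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    try:
        result = subprocess.run([exe, "-L"], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return [line for line in result.stdout.splitlines() if line.strip()]

gpus = list_nvidia_gpus()
if gpus is None:
    print("nvidia-smi not found or failed; the driver may not be visible inside this VM")
else:
    for gpu in gpus:
        print(gpu)
```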
Then check that your opencv-python supports your gpu by running in a python shell
cv2.cuda.getCudaEnabledDeviceCount()
If the result is 0 and nvidia-smi detects your gpu, you must compile your python opencv wheel with support for your specific gpu model and install it into your python environment.
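To see whether the installed build was compiled with CUDA at all, you can also look at OpenCV's build report. A minimal sketch follows, assuming the report from cv2.getBuildInformation() mentions CUDA and cuDNN on their own lines. The helper names are made up for illustration, and cv2 is imported lazily so the text filter can be tried without OpenCV installed.

```python
def cuda_build_lines(build_info):
    """Return the lines of an OpenCV build report that mention CUDA or cuDNN."""
    keywords = ("cuda", "cudnn")
    return [
        line.strip()
        for line in build_info.splitlines()
        if any(k in line.lower() for k in keywords)
    ]

def report_cuda_support():
    """Print the CUDA-related build lines and the CUDA device count for the installed cv2."""
    import cv2  # imported here so cuda_build_lines() works without OpenCV
    for line in cuda_build_lines(cv2.getBuildInformation()):
        print(line)
    print("CUDA devices:", cv2.cuda.getCudaEnabledDeviceCount())

# Call report_cuda_support() in the same Python environment that runs your detection code.
```

If no CUDA lines show up, or they say NO, that build cannot use the GPU regardless of which backend and target you set.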