
Why is OpenCV GPU CUDA template matching so much slower than CPU?

I compiled OpenCV 4.5.4 (the newest available version) with CUDA 11.5 (also the newest) and fast math enabled. It runs on a Windows 10 machine with a GeForce RTX 2070 Super graphics card (7.5 arch). I'm using Python 3.8.5.

Runtime results:

  • The CPU outperforms the GPU (matching a 70x70 needle image in a 300x300 source image)
  • The biggest GPU bottleneck is the need to upload the files to the GPU before template matching
  • The CPU takes around 0.005 seconds while the GPU takes around 0.42 seconds
  • Both methods end up finding a 100% match

Images used:

[image: Source image]

[image: Needle image]

Python code using CPU:

import cv2
import time

start_time = time.time()
src = cv2.imread("cat.png", cv2.IMREAD_GRAYSCALE)
needle = cv2.imread("needle.png", 0)

result = cv2.matchTemplate(src, needle, cv2.TM_CCOEFF_NORMED)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)
print("CPU --- %s seconds ---" % (time.time() - start_time))

Python code using GPU:

import cv2
import time

start_time = time.time()
src = cv2.imread("cat.png", cv2.IMREAD_GRAYSCALE)
needle = cv2.imread("needle.png", 0)

gsrc = cv2.cuda_GpuMat()
gtmpl = cv2.cuda_GpuMat()
gresult = cv2.cuda_GpuMat()

upload_time = time.time()
gsrc.upload(src)
gtmpl.upload(needle)
print("GPU Upload time --- %s seconds ---" % (time.time() - upload_time))

match_time = time.time()
matcher = cv2.cuda.createTemplateMatching(cv2.CV_8UC1, cv2.TM_CCOEFF_NORMED)
gresult = matcher.match(gsrc, gtmpl)
print("GPU Match time --- %s seconds ---" % (time.time() - match_time))

result_time = time.time()
resultg = gresult.download()
min_valg, max_valg, min_locg, max_locg = cv2.minMaxLoc(resultg)
print("GPU Result time --- %s seconds ---" % (time.time() - result_time))
print("GPU --- %s seconds ---" % (time.time() - start_time))

Even if I leave out the time it takes to upload the files to the GPU, the matching step alone takes more than 10x as long as the whole process on the CPU. My CUDA is installed correctly: I have run other tests where the GPU outperformed the CPU by a lot, but the results for template matching are really disappointing so far.
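
A single measurement per stage can be misleading, because the first call into a CUDA library may include one-time initialization cost. The sketch below is one way to separate the first call from repeated calls; `time_stage` is a hypothetical helper name, not part of OpenCV, and the workload in the demo is only a stand-in.

```python
import time


def time_stage(fn, repeats=10):
    """Call fn() `repeats` times.

    Returns (first_call_seconds, mean_seconds_of_remaining_calls).
    If repeats is 1, the second value equals the first.
    """
    durations = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - t0)
    first = durations[0]
    rest = durations[1:]
    steady = sum(rest) / len(rest) if rest else first
    return first, steady


# Stand-in workload (hypothetical); replace with the stage to measure.
first, steady = time_stage(lambda: sum(range(100_000)), repeats=5)
print("first call: %.6f s, later calls (mean): %.6f s" % (first, steady))
```

With the GPU script above, a stage could be passed as, for example, `time_stage(lambda: matcher.match(gsrc, gtmpl))`. If the later calls turn out much faster than the first, the 0.42 seconds would mostly reflect start-up cost rather than the matching itself.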

Why is the GPU performing so badly?

In answer to your question:

  1. You said that other tasks were better suited to the GPU. I read the Python CUDA documentation, and it suggests that you are correct: some tasks are better suited to the CPU and some are better suited to the GPU. Without getting into registers and other details that I would have to learn before I could explain them, I can say that what you write makes sense in light of the documentation.
  2. I don't see the actual times here. Also, this bottleneck seems to be expected: the CPU sits on the motherboard with a more direct connection to main memory, while the GPU is a card attached through an expansion slot, which has limitations that the motherboard's own connections don't. Also, it is not really a troublesome bottleneck because it is not congested.
  3. What I have read about the architecture and in the CUDA documentation suggests that your results are not abnormal. The CUDA modules might be better used with large datasets. The advantage they provide is that the GPU and CPU can work simultaneously rather than in competition.
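
The point about larger workloads can be illustrated with a simple cost model: if the GPU path pays a fixed overhead once (setup, transfers) but is faster per call, it only wins after enough calls. This is a sketch under stated assumptions; `break_even_calls` is a hypothetical helper, and only the 0.005 s CPU figure comes from the question. The overhead and per-call GPU time are made-up inputs.

```python
import math


def break_even_calls(fixed_overhead_s, cpu_per_call_s, gpu_per_call_s):
    """Smallest call count n for which
    fixed_overhead_s + n * gpu_per_call_s < n * cpu_per_call_s.

    Returns None if the GPU is not faster per call, since it then never wins.
    """
    saving_per_call = cpu_per_call_s - gpu_per_call_s
    if saving_per_call <= 0:
        return None
    # n must be strictly greater than overhead / saving.
    return math.floor(fixed_overhead_s / saving_per_call) + 1


# CPU time per match taken from the question; the other two values are
# hypothetical and would need to be measured on the actual machine.
n = break_even_calls(fixed_overhead_s=0.4,
                     cpu_per_call_s=0.005,
                     gpu_per_call_s=0.001)
print("GPU path wins after about", n, "matches (under these assumptions)")
```

Under this model a single 70x70-in-300x300 match would not be expected to favor the GPU; many matches against data already on the device, or much larger images, might.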

If I answered your question, please advise concerning employment. M.Sc. CIS student

Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce this content, please cite this site's URL or the original address.
