
How to achieve a faster convolve2d using GPU

I have recently been learning PyCUDA and plan to replace some code in a camera system to speed up image processing. That part originally used cv2.filter2D, and my intention is to accelerate the processing with the GPU.

Time for signal.convolve2d: 1.6639747619628906
Time for cusignal.convolve2d: 0.6955723762512207
Time for cv2.filter2D: 0.18787837028503418

However, cv2.filter2D still seems to be the fastest of the three. If the input is a long list of images, could a custom PyCUDA kernel outperform cv2.filter2D?

import time
import cv2
from cusignal.test.utils import array_equal
import cusignal
import cupy as cp
import numpy as np
from scipy import signal
from scipy import misc
ascent = misc.ascent()
ascent = np.array(ascent, dtype='int16')

# 100 copies of the 512x512 test image
ascentList = [ascent]*100

# 3x3 averaging (box) filter kernel
filterSize = 3
scharr = np.ones((filterSize, filterSize), dtype="float") * (1.0 / (filterSize*filterSize))

startTime = time.time()
for asc in ascentList:
    grad = signal.convolve2d(asc, scharr, boundary='symm', mode='same')
endTime = time.time()
print("Time for signal.convolve2d: "+str(endTime - startTime))

startTime = time.time()
for asc in ascentList:
    # this timing includes the host-to-device and device-to-host copies for every image
    gpu_convolve2d = cp.asnumpy(cusignal.convolve2d(cp.asarray(asc), scharr, boundary='symm', mode='same'))
endTime = time.time()
endTime = time.time()
print("Time for cusignal.convolve2d: "+str(endTime - startTime))
print("If signal equal to cusignal: "+ str(array_equal(grad, gpu_convolve2d)))

startTime = time.time()
for asc in ascentList:
    opencvOutput = cv2.filter2D(asc, -1, scharr)
endTime = time.time()
print("Time for cv2.filter2D: "+str(endTime - startTime))
print("If cv2 equal to cusignal: "+ str(array_equal(opencvOutput, gpu_convolve2d)))
  1. In your timing analysis of the GPU, you are timing how long it takes to copy asc to the GPU, execute convolve2d, and transfer the answer back. Transfers to and from the GPU are very slow in the scheme of things. If you want a true comparison of the compute, profile only the convolve2d call (see the timing sketch after this list).

  2. Currently cuSignal's convolve2d is written in Numba. We are in the process of porting it to use CuPy raw kernels, and there will be an improvement. I don't have an ETA on convolve2d.

  3. It looks like there might be an OpenCV CUDA version: https://github.com/opencv/opencv_contrib/blob/master/modules/cudafilters/src/cuda/filter2d.cu

  4. Have you tried scipy.ndimage.filters.convolve? See http://blog.rtwilson.com/convolution-in-python-which-function-to-use/

  5. Also, check out CuPy's convolve (a batched sketch follows the closing paragraph below): https://github.com/cupy/cupy/blob/master/cupyx/scipy/ndimage/filters.py
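
Regarding point 1, here is a minimal sketch of timing only the compute. It assumes the same ascent image and scharr kernel defined in the question, copies them to the device once outside the timed loop, and synchronizes the device before reading the clock; the warm-up call keeps one-time kernel compilation out of the measurement.

import time
import cupy as cp
import cusignal

asc_gpu = cp.asarray(ascent)        # one-time host-to-device copy
kernel_gpu = cp.asarray(scharr)

# warm-up call so kernel compilation is not included in the timing
cusignal.convolve2d(asc_gpu, kernel_gpu, boundary='symm', mode='same')

cp.cuda.Stream.null.synchronize()
startTime = time.time()
for _ in range(100):
    out_gpu = cusignal.convolve2d(asc_gpu, kernel_gpu, boundary='symm', mode='same')
cp.cuda.Stream.null.synchronize()   # wait for queued kernels before stopping the clock
endTime = time.time()
print("Compute-only time for cusignal.convolve2d: " + str(endTime - startTime))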

Now to your original question. When trying to determine whether the GPU will be faster than the CPU, you need to ensure there is enough work to keep the GPU busy. It is well known that in some cases, where the data size is small, the CPU will perform faster.
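
One way to give the GPU enough work (and to try point 5) is to stack the whole list into a single 3-D CuPy array and convolve it in one call, so there is only one transfer in each direction. This is a rough sketch, assuming cupyx.scipy.ndimage.convolve is available in your CuPy version and reusing ascentList and scharr from the question; ndimage's mode='reflect' is roughly the counterpart of boundary='symm'.

import cupy as cp
import numpy as np
from cupyx.scipy import ndimage as cp_ndimage

# single host-to-device copy of all 100 images, stacked into shape (100, 512, 512)
batch_gpu = cp.asarray(np.stack(ascentList).astype('float32'))

# kernel of shape (1, 3, 3): the leading 1 keeps images from mixing along the batch axis
kernel_gpu = cp.asarray(scharr[np.newaxis, :, :])

out_gpu = cp_ndimage.convolve(batch_gpu, kernel_gpu, mode='reflect')
out = cp.asnumpy(out_gpu)           # single device-to-host copy at the end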
