
How to achieve a faster convolve2d using GPU

I have recently been learning PyCuda and plan to replace some code in a camera system to speed up image processing. That part originally used cv2.filter2D. My intention is to accelerate the processing with the GPU. Here are the timings I measured for 100 frames:

Time for signal.convolve2d: 1.6639747619628906
Time for cusignal.convolve2d: 0.6955723762512207
Time for cv2.filter2D: 0.18787837028503418

However, it seems that cv2.filter2D is still the fastest of the three. If the input is a long list of images, could a custom PyCuda kernel outperform cv2.filter2D?

import time
import cv2
from cusignal.test.utils import array_equal
import cusignal
import cupy as cp
import numpy as np
from scipy import signal
from scipy import misc
# Test image: SciPy's 512x512 "ascent" sample, cast to int16
ascent = misc.ascent()
ascent = np.array(ascent, dtype='int16')

# Simulate a stream of 100 frames
ascentList = [ascent] * 100

# 3x3 normalized box (averaging) filter; despite the name, not a Scharr kernel
filterSize = 3
scharr = np.ones((filterSize, filterSize), dtype="float") * (1.0 / (filterSize*filterSize))

# CPU baseline: scipy.signal.convolve2d
startTime = time.time()
for asc in ascentList:
    grad = signal.convolve2d(asc, scharr, boundary='symm', mode='same')
endTime = time.time()
print("Time for signal.convolve2d: "+str(endTime - startTime))

# GPU: cusignal.convolve2d (timing includes host<->device transfers)
startTime = time.time()
for asc in ascentList:
    gpu_convolve2d = cp.asnumpy(cusignal.convolve2d(cp.asarray(asc), scharr, boundary='symm', mode='same'))
endTime = time.time()
print("Time for cusignal.convolve2d: "+str(endTime - startTime))
print("If signal equal to cusignal: "+ str(array_equal(grad, gpu_convolve2d)))

# OpenCV: cv2.filter2D
startTime = time.time()
for asc in ascentList:
    opencvOutput = cv2.filter2D(asc, -1, scharr)
endTime = time.time()
print("Time for cv2.filter2D: "+str(endTime - startTime))
print("If cv2 equal to cusignal: "+ str(array_equal(opencvOutput, gpu_convolve2d)))
  1. In your GPU timing you are measuring the time to copy asc to the GPU, execute convolve2d, and transfer the result back. Transfers to and from the GPU are very slow in the scheme of things. If you want a true comparison of the compute, profile only convolve2d (see the sketch after this list).

  2. Currently, cuSignal's convolve2d is written in Numba. We are in the process of porting it to CuPy raw kernels, which should bring an improvement; I don't have an ETA for convolve2d.

  3. It looks like there might be an OpenCV CUDA version: https://github.com/opencv/opencv_contrib/blob/master/modules/cudafilters/src/cuda/filter2d.cu

  4. Have you tried scipy.ndimage.filters.convolve? See http://blog.rtwilson.com/convolution-in-python-which-function-to-use/

  5. Also, check out CuPy's convolve (used in the sketch after this list): https://github.com/cupy/cupy/blob/master/cupyx/scipy/ndimage/filters.py
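
To illustrate points 1 and 5, here is a minimal sketch of timing only the GPU compute: the image and the kernel are copied to the device once, the device is synchronized before the clock is read, and cupyx.scipy.ndimage.convolve (available in recent CuPy versions) is shown as an alternative. This is just one way to isolate the kernel time, not the cuSignal benchmark itself.

import time
import cupy as cp
import cupyx.scipy.ndimage
import cusignal
import numpy as np
from scipy import misc

ascent = np.array(misc.ascent(), dtype='int16')
scharr = np.ones((3, 3), dtype='float') / 9.0

# Copy to the device once, outside the timed region
d_img = cp.asarray(ascent)
d_kernel = cp.asarray(scharr)

# Warm up (the first call may include kernel compilation), then synchronize
cusignal.convolve2d(d_img, d_kernel, boundary='symm', mode='same')
cp.cuda.Device().synchronize()

startTime = time.time()
for _ in range(100):
    grad_gpu = cusignal.convolve2d(d_img, d_kernel, boundary='symm', mode='same')
cp.cuda.Device().synchronize()  # wait for the GPU before stopping the clock
print("Compute-only time for cusignal.convolve2d: " + str(time.time() - startTime))

# Point 5: CuPy's own convolve; mode='reflect' matches boundary='symm' here.
# Note: the output dtype follows the input (int16) unless output= is given.
cupyx.scipy.ndimage.convolve(d_img, d_kernel, mode='reflect')
cp.cuda.Device().synchronize()

startTime = time.time()
for _ in range(100):
    out = cupyx.scipy.ndimage.convolve(d_img, d_kernel, mode='reflect')
cp.cuda.Device().synchronize()
print("Compute-only time for cupyx.scipy.ndimage.convolve: " + str(time.time() - startTime))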

Now to your original question. When trying to determine whether the GPU will be faster than the CPU, you need to ensure there is enough work to keep the GPU busy; when the data size is small, the CPU will often perform faster.
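
As a rough sketch of giving the GPU enough work, the whole batch can be transferred once, filtered as a single stacked array, and copied back once. This assumes cupyx.scipy.ndimage.convolve (point 5 above) is available in the installed CuPy version; the (1, 3, 3) kernel has size 1 along the stacking axis, so each frame is filtered independently, and mode='reflect' approximates boundary='symm'.

import time
import cupy as cp
import cupyx.scipy.ndimage
import numpy as np
from scipy import misc

ascent = np.array(misc.ascent(), dtype='int16')
ascentList = [ascent] * 100
scharr = np.ones((3, 3), dtype='float') / 9.0

startTime = time.time()

# One host-to-device transfer for the whole batch: shape (100, 512, 512)
d_stack = cp.asarray(np.stack(ascentList))

# A (1, 3, 3) kernel leaves the stacking axis untouched, filtering each frame
d_kernel = cp.asarray(scharr[np.newaxis, :, :])
d_out = cupyx.scipy.ndimage.convolve(d_stack, d_kernel, mode='reflect')

# One device-to-host transfer for all results (blocks until the GPU is done)
result = cp.asnumpy(d_out)

print("Time for batched GPU convolve: " + str(time.time() - startTime))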
