GPU with OpenCL is slower than CPU. Why?
Environment:
I'm trying to use OpenCL to speed up my code, but the results show that the CPU is faster than the GPU. How can I speed up my code?
#include <opencv2/core.hpp>
#include <opencv2/core/ocl.hpp>

void GetHoughLines(cv::Mat dst) {
    cv::ocl::setUseOpenCL(true);
    int img_w = dst.size().width;   // 5000
    int img_h = dst.size().height;  // 4000
    cv::UMat tmp_dst = dst.getUMat(cv::ACCESS_READ);
    cv::UMat tmp_mat = cv::UMat(dst.size(), CV_8UC1, cv::Scalar(0));
    for (size_t i = 0; i < 1000; i++)
    {
        tmp_mat = tmp_mat.mul(tmp_dst);
    }
}
It took about 3000 ms when I used only the CPU. With Intel UHD Graphics 630 it took 3500 ms. I also tried a GTX 1050, but it still took about 3000 ms.
Please give me some ideas to speed it up. I need to get it down to 1000 ms or less. Should I use AMP or OpenMP? As far as I know, they can only compute simple operations and are not suitable for OpenCV functions.
Basically, your code is slow because the way OpenCV uses OpenCL is inefficient. It has nothing to do with the underlying hardware.
In order for OpenCL code (or any GPU-related code, for that matter) to be efficient, it is crucial for the host-side code to properly utilize the GPU. To name a few principles:
Even if you write the most optimized GPU kernels, if you fail to adhere to these basics you are very unlikely to gain any performance boost.
The OpenCV codebase is a great example of how not to adhere to these principles.
As for your example, if you rewrite your code to avoid memory copies and use device memory explicitly, you might see reasonable performance:
// `size` and `format` stand for the dimensions and type of your image.
auto frame1 = cv::UMat(size, format, cv::USAGE_ALLOCATE_DEVICE_MEMORY);
auto frame2 = cv::UMat(size, format, cv::USAGE_ALLOCATE_DEVICE_MEMORY);
auto frame3 = cv::UMat(size, format, cv::USAGE_ALLOCATE_DEVICE_MEMORY);
for (size_t i = 0; i < 10; i++)
{
    cv::multiply(frame1, frame2, frame3);
}
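Part of the difference between the two loops is the shape of the call: `tmp_mat.mul(tmp_dst)` hands back a new matrix on every iteration, while `cv::multiply(frame1, frame2, frame3)` writes into a destination that already exists. The sketch below mirrors that contrast on plain host buffers, with no OpenCV or OpenCL involved. The names `Buffer`, `saturate_u8`, `mul_alloc`, and `mul_into` are hypothetical and only for illustration; the saturating 8-bit product is meant to match what OpenCV documents for element-wise multiplication of 8-bit data.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a single-channel 8-bit image.
using Buffer = std::vector<std::uint8_t>;

// Clamp an integer product to the 8-bit range [0, 255].
inline std::uint8_t saturate_u8(int v) {
    return static_cast<std::uint8_t>(std::min(std::max(v, 0), 255));
}

// Allocating form: a fresh result buffer is created on every call.
// Assumes a and b have the same size.
Buffer mul_alloc(const Buffer& a, const Buffer& b) {
    Buffer out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = saturate_u8(static_cast<int>(a[i]) * static_cast<int>(b[i]));
    }
    return out;
}

// Destination form: the caller owns `out` and can reuse it across calls.
// Assumes a and b have the same size.
void mul_into(const Buffer& a, const Buffer& b, Buffer& out) {
    out.resize(a.size());  // no reallocation once the size is unchanged
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = saturate_u8(static_cast<int>(a[i]) * static_cast<int>(b[i]));
    }
}
```

In a 1000-iteration loop over 5000 x 4000 images, the destination form touches the same three buffers every time, which is the pattern the `cv::multiply` loop above follows.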
But in any case, I recommend you learn to use the OpenCL API directly, without OpenCV.
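When comparing CPU and GPU timings like the ones in the question, it may help to time only the steady-state loop. As far as I know, OpenCV compiles its OpenCL kernels on first use and queues the work asynchronously, so the first call can be unusually slow and a timer can stop before the device has finished. Here is a minimal sketch of a timing helper under those assumptions; `time_ms` is a hypothetical name, and both the work and the synchronization step are passed in as callables so the helper itself has no OpenCV dependency.

```cpp
#include <chrono>
#include <cstddef>

// Run `work` once as a warm-up, then `iterations` more times under a timer,
// and return the elapsed wall-clock time of the timed part in milliseconds.
// `sync` is called after the warm-up and again before the timer stops, so
// that any queued asynchronous work is included in the measurement.
template <typename Work, typename Sync>
double time_ms(std::size_t iterations, Work work, Sync sync) {
    work();  // warm-up: absorbs one-time costs such as kernel compilation
    sync();

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        work();
    }
    sync();  // wait for outstanding work before reading the clock
    const auto stop = std::chrono::steady_clock::now();

    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

With the loop from this answer, `work` would wrap the `cv::multiply(frame1, frame2, frame3)` call and `sync` would presumably wrap `cv::ocl::finish()`; for a CPU-only run, `sync` can be an empty lambda.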