Why does the GPU show no advantage over the CPU for the OpenCV SURF algorithm?

Question

I want to use the GPU to accelerate the SURF algorithm, but I found that the CPU (with TBB enabled) is actually faster than the GPU for SURF.

My hardware and OS:

CPU: Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz (4 cores, 8 threads)

GPU: Nvidia GTX 660 Ti, ~1000 MHz (1344 GPU cores)

OS: Ubuntu 12.04 (64-bit)

Use case: my folder contains about 120 images, and I need to extract SURF keypoints from every image.

Time logs

CPU (TBB) time log for each image:

indexing DB:/home/ole/MatchServer/ImgDB0/img0 cost time on SURF ALGO (ON TBB)[s]: 0.00666648

indexing DB:/home/ole/MatchServer/ImgDB0/img1 cost time on SURF ALGO (ON TBB)[s]: 0.00803925

indexing DB:/home/ole/MatchServer/ImgDB0/img2 cost time on SURF ALGO (ON TBB)[s]: 0.0066344

indexing DB:/home/ole/MatchServer/ImgDB0/img3 cost time on SURF ALGO (ON TBB)[s]: 0.00625698

indexing DB:/home/ole/MatchServer/ImgDB0/img4 cost time on SURF ALGO (ON TBB)[s]: 0.00699448

indexing DB:/home/ole/MatchServer/ImgDB0/img5 cost time on SURF ALGO (ON TBB)[s]: 0.00621663

        .................more..................................

GPU time log for each image (two lines per image: the first is the time to upload the image to GPU memory, the second is the time spent in SURF_GPU):

indexing DB:/home/ole/MatchServer/ImgDB0/img0 cost time on GPU upload image[s]: 1.99329

indexing DB:/home/ole/MatchServer/ImgDB0/img0 cost time on Gpu SURF ALGO[s]: 0.00971809

indexing DB:/home/ole/MatchServer/ImgDB0/img1 cost time on GPU upload image[s]: 0.000157638

indexing DB:/home/ole/MatchServer/ImgDB0/img1 cost time on Gpu SURF ALGO[s]: 0.00618778

indexing DB:/home/ole/MatchServer/ImgDB0/img2 cost time on GPU upload image[s]: 8.8108e-05

indexing DB:/home/ole/MatchServer/ImgDB0/img2 cost time on Gpu SURF ALGO[s]: 0.00736609

indexing DB:/home/ole/MatchServer/ImgDB0/img3 cost time on GPU upload image[s]: 8.8599e-05

indexing DB:/home/ole/MatchServer/ImgDB0/img3 cost time on Gpu SURF ALGO[s]: 0.00559131

indexing DB:/home/ole/MatchServer/ImgDB0/img4 cost time on GPU upload image[s]: 8.7626e-05

indexing DB:/home/ole/MatchServer/ImgDB0/img4 cost time on Gpu SURF ALGO[s]: 0.00610033

indexing DB:/home/ole/MatchServer/ImgDB0/img5 cost time on GPU upload image[s]: 8.9125e-05

indexing DB:/home/ole/MatchServer/ImgDB0/img5 cost time on Gpu SURF ALGO[s]: 0.00632997

      ............................more..................................

I found that uploading the first image Mat to the GPU is very slow, about 2 seconds. The following uploads take a normal amount of time, about 0.000157638 seconds.

GPU code:
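The numbers in the logs can be checked with a little arithmetic. The sketch below is only an illustration: it hard-codes the six samples quoted in the logs (out of about 120 images), and the names `mean_from`, `Summary` and `summarize_logs` are hypothetical helpers, not part of OpenCV or of the original program.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Mean of the samples v[first..end); returns 0 for an empty range.
double mean_from(const std::vector<double>& v, std::size_t first) {
    if (first >= v.size()) return 0.0;
    const double sum = std::accumulate(
        v.begin() + static_cast<std::ptrdiff_t>(first), v.end(), 0.0);
    return sum / static_cast<double>(v.size() - first);
}

struct Summary {
    double cpu_mean;         // CPU (TBB) SURF time, all six samples
    double gpu_first;        // upload + SURF time for the first image
    double gpu_steady_mean;  // upload + SURF time, first image excluded
};

Summary summarize_logs() {
    // Values copied from the six log entries quoted in the question (seconds).
    const std::vector<double> cpu_surf = {
        0.00666648, 0.00803925, 0.0066344, 0.00625698, 0.00699448, 0.00621663};
    const std::vector<double> gpu_upload = {
        1.99329, 0.000157638, 8.8108e-05, 8.8599e-05, 8.7626e-05, 8.9125e-05};
    const std::vector<double> gpu_surf = {
        0.00971809, 0.00618778, 0.00736609, 0.00559131, 0.00610033, 0.00632997};

    // Per-image GPU cost is the upload plus the SURF_GPU call.
    std::vector<double> gpu_total(gpu_surf.size());
    for (std::size_t i = 0; i < gpu_surf.size(); ++i) {
        gpu_total[i] = gpu_upload[i] + gpu_surf[i];
    }

    return {mean_from(cpu_surf, 0), gpu_total[0], mean_from(gpu_total, 1)};
}
```

On these six samples, nearly all of the roughly 2 seconds spent on the first image is the one-time upload cost, and once the first image is excluded the per-image GPU and CPU times land within roughly 10% of each other. Six samples are too few to say more than that.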

    try
    {
        // Time SURF_GPU construction, device allocation, and the host-to-device upload.
        double t0 = (double)getTickCount();
        cv::gpu::SURF_GPU surf_gpu;
        Size size = help_img.size();
        Size size0 = size;
        int type = help_img.type();
        cv::gpu::GpuMat d_m(size0, type);
        // size0 equals help_img.size() here, so this ROI branch is never taken.
        if(size0 != help_img.size() )
            d_m = d_m(Rect((size0.width - size.width) / 2, (size0.height - size.height) / 2, size.width, size.height));
        d_m.upload(help_img);
        double t = ((double)getTickCount() - t0)/getTickFrequency();
        std::cout << "indexing DB:"<< path << " cost time on upload image[s]: " << t << std::endl;

        // Time keypoint detection on the GPU (empty mask).
        t0 = (double)getTickCount();
        surf_gpu(d_m, cv::gpu::GpuMat(), help_keypoints);
        t = ((double)getTickCount() - t0)/getTickFrequency();
        std::cout << "indexing DB:"<< path << " cost time on Gpu image[s]: " << t << std::endl;
    }
    catch (const cv::Exception& e)
    {
       printf("issue happen!");
    }

Please help with the following questions:

1. Why is the first image upload to the GPU so slow, taking about 2 seconds?

2. Why doesn't the GPU accelerate the SURF algorithm? SURF is computationally heavy, so in theory the GPU should be able to speed it up.

3. How can I improve GPU performance for the SURF algorithm?

Thanks!

Answer 1

1. The first upload to the GPU will always be slower, because the GPU needs to be initialized before it can do any actual work. A default CUDA context is created on the first CUDA call, which in your case is the upload to the GpuMat. A workaround is to call any GPU function once before doing the actual work.
2. It depends on the GPU and CPU you are comparing. A high-end CPU like the Xeon you are using is more likely to win when using TBB. For an actual speedup, try a high-end GPU such as an NVIDIA Tesla. The current OpenCV implementation is probably not optimized for the Kepler-architecture GPU you are using.
3. There is no fixed answer to that. It depends on how parallel the algorithm is, how well it is implemented, and the hardware present in the system.
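The warm-up workaround described in this answer amounts to keeping the first call out of the measurement. Below is a minimal sketch of such a timing harness in plain C++, with no OpenCV or CUDA calls. `time_with_warmup` and `fake_gpu_call` are hypothetical names; `fake_gpu_call` only imitates lazy context creation by sleeping 50 ms on its first use, which is an assumption for illustration and not a measurement of real CUDA behavior.

```cpp
#include <chrono>
#include <functional>
#include <thread>

struct Timing {
    double first_call_s;   // duration of the first call (includes one-time setup)
    double steady_mean_s;  // mean duration of the calls made after the first one
};

// Calls `work` once as a warm-up, timing it separately, then times `runs` more calls.
Timing time_with_warmup(const std::function<void()>& work, int runs) {
    using Clock = std::chrono::steady_clock;
    auto seconds = [](Clock::duration d) {
        return std::chrono::duration<double>(d).count();
    };

    auto t0 = Clock::now();
    work();  // warm-up call: absorbs any lazy initialization
    const double first = seconds(Clock::now() - t0);

    double total = 0.0;
    for (int i = 0; i < runs; ++i) {
        t0 = Clock::now();
        work();
        total += seconds(Clock::now() - t0);
    }
    return {first, runs > 0 ? total / runs : 0.0};
}

// Hypothetical stand-in for a GPU call: pays a 50 ms setup cost on first use only.
void fake_gpu_call() {
    static bool initialized = false;
    if (!initialized) {
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        initialized = true;
    }
}
```

In the code from the question, the callable passed as `work` would be the upload plus the `surf_gpu` call, so that the context-creation cost shows up in `first_call_s` rather than in the per-image figure.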

Solution 1
3 · Accepted · 2012-09-24 12:05:11

为什么GPU没有为opencv SURF算法显示优于CPU的优势？

问题描述

1 个解决方案

解决方案1 3 已采纳 2012-09-24 12:05:11

解决方案1
3 已采纳 2012-09-24 12:05:11