
Why would OpenCV wait for a streamed CUDA operation instead of proceeding asynchronously?

I'm trying to perform some image dilation using OpenCV and CUDA. I invoke filter->apply(...) twice, one call after the other, each time with a different filter object, a different Mat, and a different stream to work with. The calls DO get executed in different streams, as can be seen from the attached nvvp profiling info, but they run sequentially instead of in parallel. For some reason, this seems to be caused by the CPU waiting for the stream (cudaStreamSynchronize).

[nvvp screenshot]

Why would OpenCV do that? I'm not waiting for the stream explicitly or anything, so what else could be wrong?

Here's the actual code:

    #include <opencv2/core.hpp>
    #include <opencv2/imgcodecs.hpp>
    #include <opencv2/imgproc.hpp>
    #include <opencv2/core/cuda.hpp>
    #include <opencv2/cudafilters.hpp>
    #include <cuda_runtime.h>

    // Load both images as single-channel float and upload them to the GPU.
    cv::Mat hIm1, hIm2;
    cv::imread("/path/im1.png", cv::IMREAD_GRAYSCALE).convertTo(hIm1, CV_32FC1);
    cv::imread("/path/im2.png", cv::IMREAD_GRAYSCALE).convertTo(hIm2, CV_32FC1);
    cv::cuda::GpuMat dIm1(hIm1);
    cv::cuda::GpuMat dIm2(hIm2);

    cv::cuda::Stream stream1, stream2;

    // One dilation filter per image, each with its own 41x41 elliptical kernel.
    const cv::Mat strel1 = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(41, 41));
    cv::Ptr<cv::cuda::Filter> filter1 = cv::cuda::createMorphologyFilter(cv::MORPH_DILATE, dIm1.type(), strel1);
    const cv::Mat strel2 = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(41, 41));
    cv::Ptr<cv::cuda::Filter> filter2 = cv::cuda::createMorphologyFilter(cv::MORPH_DILATE, dIm2.type(), strel2);
    cudaDeviceSynchronize();
    // Each apply() is issued on its own stream, so the two dilations
    // should be free to overlap on the GPU.
    filter1->apply(dIm1, dIm1, stream1);
    filter2->apply(dIm2, dIm2, stream2);
    cudaDeviceSynchronize();

The images are sized 512×512; I tried it with smaller ones (down to 64×64), but to no avail!
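One variant I could try, to rule out implicit device allocations inside the first apply() call on each filter (cudaMalloc can implicitly synchronize the whole device, which would serialize the streams), is to pre-allocate separate outputs and do an untimed warm-up pass before the profiled region. A minimal sketch, not something I have confirmed:

    // Sketch: rule out allocation-driven synchronization (assumption, not a confirmed fix).
    // Pre-allocate distinct outputs so apply() never has to allocate,
    // and run one warm-up pass so internal buffers already exist.
    cv::cuda::GpuMat dOut1(dIm1.size(), dIm1.type());
    cv::cuda::GpuMat dOut2(dIm2.size(), dIm2.type());

    filter1->apply(dIm1, dOut1, stream1);   // warm-up (may allocate internally)
    filter2->apply(dIm2, dOut2, stream2);
    cudaDeviceSynchronize();

    filter1->apply(dIm1, dOut1, stream1);   // profile these two calls instead
    filter2->apply(dIm2, dOut2, stream2);
    cudaDeviceSynchronize();

If the second pair of calls still serializes, the wait presumably comes from inside the filter implementation rather than from buffer allocations.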

Whether the application ends up running sequentially is the user's responsibility: launching work in different streams allows, but does not guarantee, concurrent execution.

A few best practices:

  1. Pipeline your code so that the CPU and the GPU are utilized at the same time, and make the GPU calls asynchronous (see the sketch after this list).
  2. The GPU needs free resources to run work from different streams concurrently. If filter1() already utilizes 100% of the GPU, then filter2() will wait in the queue until filter1() completes.
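
To illustrate point 1, here is a rough sketch (not the original poster's code; it assumes page-locked host buffers, here called hPinned1/hPinned2, so the copies can actually run asynchronously) of issuing the uploads and filter calls on the two streams and synchronizing only when the results are needed:

    // Sketch of point 1: keep the CPU and the GPU busy at the same time.
    // Page-locked (pinned) host memory is required for truly asynchronous copies.
    cv::cuda::HostMem hPinned1(hIm1), hPinned2(hIm2);   // page-locked copies of the inputs

    dIm1.upload(hPinned1, stream1);                      // async H2D copy on stream1
    filter1->apply(dIm1, dIm1, stream1);
    dIm2.upload(hPinned2, stream2);                      // async H2D copy on stream2
    filter2->apply(dIm2, dIm2, stream2);

    // ... CPU work can go here while the GPU is busy ...

    stream1.waitForCompletion();                         // synchronize only when results are needed
    stream2.waitForCompletion();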

Please check the GPU utilization data in the profiler for more details.
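
Besides the profiler, a quick way to check whether the two calls actually overlap is to time them with CUDA events. The sketch below reuses the streams and filters from the question; the event names are just illustrative:

    // Per-stream timings plus the overall span across both streams.
    cv::cuda::Event start1, stop1, start2, stop2;

    start1.record(stream1);
    filter1->apply(dIm1, dIm1, stream1);
    stop1.record(stream1);

    start2.record(stream2);
    filter2->apply(dIm2, dIm2, stream2);
    stop2.record(stream2);

    stop1.waitForCompletion();
    stop2.waitForCompletion();

    float t1   = cv::cuda::Event::elapsedTime(start1, stop1);
    float t2   = cv::cuda::Event::elapsedTime(start2, stop2);
    float span = cv::cuda::Event::elapsedTime(start1, stop2);  // from first start to last stop
    // span close to max(t1, t2) -> the calls overlapped
    // span close to t1 + t2     -> they ran back to back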
