Window size limit in GPU accelerated LK pyramid

I am performing image stabilization on a real-time feed so that I can run some vision algorithms on the stabilized images (emphasis on "real-time"). Currently this process, which uses the CPU implementation of the LK pyramid, is barely fast enough, even when building the pyramid beforehand (the reference image and "previous" features are only ever calculated once). However, it needs to scale to images with about four times the resolution, which makes the current implementation too slow. I thought I might speed things up with the GPU, since OpenCV implements the same LK approach for CUDA-capable devices in the cv::gpu::PyrLKOpticalFlow class. I'm using its sparse call with a set of previous features.

My main issue is that there seems to be a limit on the window size, and mine is too large. The limit is enforced by an assertion in the pyrlk.cpp file:

CV_Assert(patch.x > 0 && patch.x < 6 && patch.y > 0 && patch.y < 6);

The patch dimensions are computed just above it:

void calcPatchSize(cv::Size winSize, dim3& block, dim3& patch)
{
    if (winSize.width > 32 && winSize.width > 2 * winSize.height)
    {
        block.x = deviceSupports(FEATURE_SET_COMPUTE_12) ? 32 : 16;
        block.y = 8;
    }
    else
    {
        block.x = 16;
        block.y = deviceSupports(FEATURE_SET_COMPUTE_12) ? 16 : 8;
    }

    patch.x = (winSize.width  + block.x - 1) / block.x;
    patch.y = (winSize.height + block.y - 1) / block.y;

    block.z = patch.z = 1;
}

My problem is that I need a window size of about 80x80 pixels, which is (a) why I want to use GPU acceleration in the first place and (b) apparently why it doesn't work in OpenCV. :) In addition, with the higher-resolution images this window size will need to grow.

I'm not familiar with actually implementing GPU acceleration, so I am wondering if someone can explain why this limitation exists in OpenCV, whether it's a real limitation imposed by the hardware or by the OpenCV implementation, and whether there are ways to work around it. It would seem odd for this to be a hardware limitation, since these are exactly the situations in which you'd want to use a GPU. I can get reasonable speed with smaller search windows, but then the stabilization is not good enough for the application.

I need such a large search window because I'm calculating the motion relative to the first (reference) frame. The motion is cyclical plus some small random drift, so this method works well, but it needs more room to search at the peaks of the cycle, when the matching features may be around 30-40 pixels away (at the original resolution).

I'm using OpenCV 2.4.10 on Linux, built from source with CUDA support.

(This is a somewhat modified re-post from http://answers.opencv.org/question/54579/window-size-limit-in-gpu-accelerated-lk-pyramid/ , but there doesn't seem to be much activity there, so hopefully SO provides a better discussion environment!)

Answer: The patch size is passed to the CUDA kernel as a template parameter.

See the calling code at https://github.com/jet47/opencv/blob/master/modules/cudaoptflow/src/cuda/pyrlk.cu#L493 :

static const func_t funcs[5][5] =
{
    {sparse_caller<1, 1, 1>, sparse_caller<1, 2, 1>, sparse_caller<1, 3, 1>, sparse_caller<1, 4, 1>, sparse_caller<1, 5, 1>},
    {sparse_caller<1, 1, 2>, sparse_caller<1, 2, 2>, sparse_caller<1, 3, 2>, sparse_caller<1, 4, 2>, sparse_caller<1, 5, 2>},
    {sparse_caller<1, 1, 3>, sparse_caller<1, 2, 3>, sparse_caller<1, 3, 3>, sparse_caller<1, 4, 3>, sparse_caller<1, 5, 3>},
    {sparse_caller<1, 1, 4>, sparse_caller<1, 2, 4>, sparse_caller<1, 3, 4>, sparse_caller<1, 4, 4>, sparse_caller<1, 5, 4>},
    {sparse_caller<1, 1, 5>, sparse_caller<1, 2, 5>, sparse_caller<1, 3, 5>, sparse_caller<1, 4, 5>, sparse_caller<1, 5, 5>}
};

where sparse_caller is declared as:

template <int cn, int PATCH_X, int PATCH_Y>
void sparse_caller(int rows, int cols, const float2* prevPts, float2* nextPts,
                   uchar* status, float* err, int ptcount,
                   int level, dim3 block, cudaStream_t stream);

The patch size limit was introduced to reduce the number of template instantiations. You can raise it for your needs by modifying this code to add more instantiations (and relaxing the CV_Assert in pyrlk.cpp to match, since it otherwise rejects any patch dimension above 5).

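Notice: Technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site or the original source. For any questions, contact yoyou2525@163.com.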

 
© 2020-2024 STACKOOM.COM