Speeding up a convolution function in C++

Question

I am trying to implement an "adaptive" convolution for image filtering that limits the maximum or minimum possible values of the output pixel to predetermined bounds. I haven't found any function in OpenCV that allows me to do this, so I wrote my own that accomplishes what I am looking for. (Is there perhaps a different library?) The only issue is that this function takes about 0.9 seconds, whereas cv::filter2D takes about 0.005 seconds to filter an image (both with the same kernel). Does anyone know how I can speed up my method?

A couple of comments about my kernel: it is a 9x9 custom sharpening filter, and the kernel IS NOT separable. I tried redesigning my filter to be separable, but I could not achieve the desired results. Any thoughts? Below is the function I use in my code:

Mat& adaptive_convolution(Mat& img)
{

    fstream in("kernel.txt");
    string line;

    float v[9][9];
    int i = 0, k = 0;

    while (getline(in, line))
    {
        float value;
        int k = 0;
        stringstream ss(line);

        while (ss >> value)
        {
            v[i][k] = value;
            ++k;
        }
        ++i;
    }


    clock_t init, end;
    double minVal;
    double maxVal;
    Point minLoc;
    Point maxLoc;

    int pad_fact = 4;
    int top, left, bottom, right;

    Mat new_image = img;
    top = pad_fact; bottom = pad_fact;
    left = pad_fact; right = pad_fact;

    copyMakeBorder(img, new_image, top, bottom, left, right, BORDER_CONSTANT, 0);

    minMaxLoc(img, &minVal, &maxVal, &minLoc, &maxLoc);
    new_image / 2^8;
    init = clock();
    double temp = 0;

    for (int i = pad_fact; i < img.rows + pad_fact; i++)
    {
        for (int j = pad_fact; j < img.cols + pad_fact; j++)
        {
            for (int ii = -pad_fact; ii <= pad_fact; ii++)
            {
                for (int jj = -pad_fact; jj <= pad_fact; jj++)
                {
                    //temp = double(v[ii + 2*pad_fact][jj + 2*pad_fact]); 
                    temp = temp + double(v[ii + pad_fact][jj + pad_fact] * float(new_image.at<uchar>(i - jj, j - ii)));
                    //temp = double(new_image.at<uchar>(i - jj, j - ii));
                }
            }
            if (temp > maxVal)
            {
                temp = maxVal;
            }
            else
            {
                if (temp < minVal)
                {
                    temp = minVal;
                }
            }
            new_image.at<uchar>(i, j) = temp;
            temp = 0;
        }
    }



    img = new_image;
    end = clock();
    cout << float(end - init)/1000 << endl;
    return img;
}

EDIT:

I was able to speed up the convolution in a Python script I am using to about 0.2 seconds using Numba. I still need to see this kind of improvement in C++. Am I being held back by using OpenCV?

import numba as nb
import numpy as np

@nb.autojit
def custom_convolve(image,kernel,pad_fact):
    pad_fact = int(pad_fact)
    filt_im = np.zeros(image.shape)
    rows = image.shape[0]
    columns = image.shape[1]
    glob_max = np.max(image)
    glob_min = np.min(image)

    for x in range(pad_fact,columns-pad_fact,1):
        for y in range(pad_fact,rows-pad_fact,1):
            pix_sum = 0
            for k in range(-pad_fact,pad_fact,1):
                for j in range(-pad_fact,pad_fact,1):
                    pix_sum = pix_sum + kernel[k+pad_fact,j+pad_fact]*image[y-j,x-k]

            if pix_sum > glob_max:
                pix_sum = glob_max
            elif pix_sum < glob_min:
                pix_sum = glob_min

            filt_im[y,x] = pix_sum
    return filt_im

Answer 1

Most of the basic OpenCV implementations use SSE instructions, which process several additions etc. in parallel using 128-bit registers. Another trick applies if the filter kernel is separable and can be composed as follows:

K = D * D'

where * denotes the convolution operator, D is a vector, e.g. [1 2 1], and K is the final kernel. Then you can replace the filtering of an image A into the image B:

B = A * K;

with

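B = (A * D) * D' = ((A * D)' * D)'

Here ' denotes the transpose: the image is first convolved with the vector D along one direction, and the result is then convolved with D' along the other direction (equivalently, transpose the intermediate image, convolve with D again, and transpose back). This follows from the associativity of convolution, and it reduces the work per pixel from n^2 to 2n multiplications for an n x n kernel.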
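Separable filtering, that is one 1D pass along the rows followed by one 1D pass along the columns, can be sketched in plain C++ without OpenCV. This is only a minimal sketch under assumptions that are not in the original post: `Image` is a hypothetical row-major float buffer standing in for cv::Mat, the border is zero-padded, and all function names are illustrative. It applies only when the kernel really is the outer product of a vector with itself; the final clamp mirrors the min/max bounding from the question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal image type (row-major floats); a stand-in for cv::Mat.
struct Image {
    int rows, cols;
    std::vector<float> data;
    Image(int r, int c)
        : rows(r), cols(c),
          data(static_cast<std::size_t>(r) * static_cast<std::size_t>(c), 0.0f) {}
    float& at(int r, int c) { return data[static_cast<std::size_t>(r) * cols + c]; }
    float at(int r, int c) const { return data[static_cast<std::size_t>(r) * cols + c]; }
    // Pixels outside the image read as 0 (zero padding).
    float padded(int r, int c) const {
        return (r < 0 || r >= rows || c < 0 || c >= cols) ? 0.0f : at(r, c);
    }
};

// Direct 2D convolution with K = d * d' (outer product): n*n multiplies per pixel.
Image convolve_direct(const Image& a, const std::vector<float>& d) {
    assert(d.size() % 2 == 1);  // odd kernel size so it has a center
    const int h = static_cast<int>(d.size()) / 2;
    Image b(a.rows, a.cols);
    for (int r = 0; r < a.rows; ++r)
        for (int c = 0; c < a.cols; ++c) {
            float sum = 0.0f;
            for (int i = -h; i <= h; ++i)
                for (int j = -h; j <= h; ++j)
                    sum += d[i + h] * d[j + h] * a.padded(r - i, c - j);
            b.at(r, c) = sum;
        }
    return b;
}

// One 1D pass with d: along each row if along_rows is true, else along each column.
Image convolve_1d(const Image& a, const std::vector<float>& d, bool along_rows) {
    assert(d.size() % 2 == 1);
    const int h = static_cast<int>(d.size()) / 2;
    Image b(a.rows, a.cols);
    for (int r = 0; r < a.rows; ++r)
        for (int c = 0; c < a.cols; ++c) {
            float sum = 0.0f;
            for (int k = -h; k <= h; ++k)
                sum += d[k + h] * (along_rows ? a.padded(r, c - k) : a.padded(r - k, c));
            b.at(r, c) = sum;
        }
    return b;
}

// Separable form (A * D) * D': two 1D passes, 2n multiplies per pixel.
Image convolve_separable(const Image& a, const std::vector<float>& d) {
    return convolve_1d(convolve_1d(a, d, true), d, false);
}

// Clamp every output pixel to the [min, max] range of the input image.
void clamp_to_input_range(Image& out, const Image& in) {
    if (in.data.empty()) return;
    const auto mm = std::minmax_element(in.data.begin(), in.data.end());
    const float lo = *mm.first, hi = *mm.second;
    for (float& v : out.data) v = std::min(std::max(v, lo), hi);
}
```

Calling `convolve_separable(a, d)` with `d = {1, 2, 1}` should give the same image as `convolve_direct(a, d)`, followed by `clamp_to_input_range(result, a)` for the bounding. Because the clamp here is a separate per-pixel pass over a finished output buffer, it does not interfere with whichever convolution routine produced that buffer.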

Answer 1, 2016-06-19 18:13:26
