
Speed up Convolution Function C++

I am trying to implement an "adaptive" convolution for image filtering that limits the maximum and minimum possible values of the output pixel to predetermined bounds. I haven't found any function in OpenCV that allows me to do this, so I wrote my own that accomplishes what I am looking for. (Is there perhaps a different library?) The only issue is that this function takes about 0.9 seconds, whereas cv::filter2D takes about 0.005 seconds to filter the same image (both with the same kernel). Does anyone know how I can speed up my method?

A couple of comments about my kernel: it is a 9x9 custom sharpening filter, and the kernel IS NOT separable. I tried redesigning my filter to be separable, but I cannot achieve the desired results. Any thoughts? Below is the function I use for my code:

Mat& adaptive_convolution(Mat& img)
{

    fstream in("kernel.txt");
    string line;

    float v[9][9];
    int i = 0; // row index into the kernel; the column index k is declared per line below

    while (getline(in, line))
    {
        float value;
        int k = 0;
        stringstream ss(line);

        while (ss >> value)
        {
            v[i][k] = value;
            ++k;
        }
        ++i;
    }


    clock_t init, end;
    double minVal;
    double maxVal;
    Point minLoc;
    Point maxLoc;

    int pad_fact = 4;
    int top, left, bottom, right;

    Mat new_image = img;
    top = pad_fact; bottom = pad_fact;
    left = pad_fact; right = pad_fact;

    copyMakeBorder(img, new_image, top, bottom, left, right, BORDER_CONSTANT, 0);

    minMaxLoc(img, &minVal, &maxVal, &minLoc, &maxLoc);
    new_image / 2^8; // no effect: the result is discarded, and ^ is XOR in C++, not exponentiation
    init = clock();
    double temp = 0;

    for (int i = pad_fact; i < img.rows + pad_fact; i++)
    {
        for (int j = pad_fact; j < img.cols + pad_fact; j++)
        {
            for (int ii = -pad_fact; ii <= pad_fact; ii++)
            {
                for (int jj = -pad_fact; jj <= pad_fact; jj++)
                {
                    //temp = double(v[ii + 2*pad_fact][jj + 2*pad_fact]); 
                    temp = temp + double(v[ii + pad_fact][jj + pad_fact] * float(new_image.at<uchar>(i - jj, j - ii)));
                    //temp = double(new_image.at<uchar>(i - jj, j - ii));
                }
            }
            if (temp > maxVal)
            {
                temp = maxVal;
            }
            else
            {
                if (temp < minVal)
                {
                    temp = minVal;
                }
            }
            new_image.at<uchar>(i, j) = temp;
            temp = 0;
        }
    }



    img = new_image;
    end = clock();
    cout << float(end - init) / CLOCKS_PER_SEC << endl; // elapsed CPU time in seconds
    return img;
}
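
Two details of the function above are worth separating from the question of raw speed: the filtered values are written back into new_image while later pixels still read from it, and every pixel access goes through at<uchar>(). The following is a minimal sketch of the same clamped convolution on plain row-major float buffers, using only the standard library. The name clamped_convolve and the buffer layout are assumptions made for illustration, not an OpenCV API, and no timing is claimed for it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an OpenCV API): true 2D convolution of a row-major
// grayscale image with a square, odd-sized kernel, zero padding at the borders,
// and every output value clamped to the input's global [min, max] range.
std::vector<float> clamped_convolve(const std::vector<float>& src,
                                    int rows, int cols,
                                    const std::vector<float>& kernel, int ksize)
{
    assert(ksize % 2 == 1);
    assert(!src.empty());
    assert(src.size() == static_cast<std::size_t>(rows) * cols);
    assert(kernel.size() == static_cast<std::size_t>(ksize) * ksize);

    const int r = ksize / 2;          // kernel radius (4 for a 9x9 kernel)
    const int pcols = cols + 2 * r;   // width of the padded image

    // Clamp bounds are computed once, from the unpadded input.
    const auto mm = std::minmax_element(src.begin(), src.end());
    const float lo = *mm.first;
    const float hi = *mm.second;

    // Zero-padded copy so the inner loops need no bounds checks.
    std::vector<float> padded(static_cast<std::size_t>(rows + 2 * r) * pcols, 0.0f);
    for (int y = 0; y < rows; ++y)
        std::copy(src.begin() + static_cast<std::size_t>(y) * cols,
                  src.begin() + static_cast<std::size_t>(y + 1) * cols,
                  padded.begin() + static_cast<std::size_t>(y + r) * pcols + r);

    // Separate output buffer: filtered pixels never feed back into the input.
    std::vector<float> dst(src.size());
    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < cols; ++x) {
            float sum = 0.0f;
            for (int ky = 0; ky < ksize; ++ky) {
                // Raw row pointers instead of a per-pixel accessor call.
                const float* prow =
                    &padded[static_cast<std::size_t>(y + 2 * r - ky) * pcols + x];
                const float* krow = &kernel[static_cast<std::size_t>(ky) * ksize];
                for (int kx = 0; kx < ksize; ++kx)
                    sum += krow[kx] * prow[2 * r - kx];   // kernel is flipped: convolution
            }
            dst[static_cast<std::size_t>(y) * cols + x] = std::min(std::max(sum, lo), hi);
        }
    }
    return dst;
}
```

With a cv::Mat the same pattern would use img.ptr<float>(row) on an image converted to CV_32F, with the kernel loaded once outside the function. Loops like this are also very sensitive to compiler settings, so the timing comparison is only meaningful with optimizations enabled (for example -O2 or a Release build).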

EDIT:

I was able to speed up the convolution in a Python script I am using to about 0.2 seconds using Numba. I still need to see this kind of improvement using C++. Am I being held back by using OpenCV?

import numba as nb
import numpy as np

@nb.jit  # nb.autojit was removed in later Numba releases; nb.jit is its replacement
def custom_convolve(image,kernel,pad_fact):
    pad_fact = int(pad_fact)
    filt_im = np.zeros(image.shape)
    rows = image.shape[0]
    columns = image.shape[1]
    glob_max = np.max(image)
    glob_min = np.min(image)

    for x in range(pad_fact,columns-pad_fact,1):
        for y in range(pad_fact,rows-pad_fact,1):
            pix_sum = 0
            for k in range(-pad_fact,pad_fact+1,1):  # +1 so the last kernel column is included
                for j in range(-pad_fact,pad_fact+1,1):
                    pix_sum = pix_sum + kernel[k+pad_fact,j+pad_fact]*image[y-j,x-k]

            if pix_sum > glob_max:
                pix_sum = glob_max
            elif pix_sum < glob_min:
                pix_sum = glob_min

            filt_im[y,x] = pix_sum
    return filt_im

Most of the basic OpenCV implementations use SSE instructions, which process several additions, multiplications, etc. in parallel using 128-bit registers. Another trick applies if the filter kernel is separable, i.e. if it can be composed as follows:

K = D * D'

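where * denotes the convolution operator, D is a column vector, e.g. [1 2 1]', D' is its transpose (a row vector), and K is the final kernel. Then you can replace the filtering of an image A into the image B:

B = A * K

with two one-dimensional passes:

B = (A * D) * D'

That is, filter the columns of A with D, then filter the rows of the result with D'. This works because convolution is associative. For a 9x9 kernel it costs 9 + 9 = 18 multiplications per pixel instead of 81.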
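
As a minimal sketch of separable filtering done as two one-dimensional passes, the code below applies D along the columns and then along the rows of a row-major float image, with zero padding. The names convolve_1d and separable_convolve are hypothetical helpers, not OpenCV functions, and the approach only applies when the kernel really is an outer product D D', which the question says is not the case for the 9x9 sharpening kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: 1D convolution of a row-major image with an odd-length
// kernel d, either along each row (along_rows == true) or along each column.
// Samples outside the image are treated as zero.
std::vector<float> convolve_1d(const std::vector<float>& src, int rows, int cols,
                               const std::vector<float>& d, bool along_rows)
{
    assert(d.size() % 2 == 1);
    assert(src.size() == static_cast<std::size_t>(rows) * cols);

    const int r = static_cast<int>(d.size()) / 2;
    std::vector<float> dst(src.size(), 0.0f);
    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < cols; ++x) {
            float sum = 0.0f;
            for (int k = -r; k <= r; ++k) {
                const int yy = along_rows ? y : y - k;
                const int xx = along_rows ? x - k : x;
                if (yy >= 0 && yy < rows && xx >= 0 && xx < cols)
                    sum += d[static_cast<std::size_t>(k + r)] *
                           src[static_cast<std::size_t>(yy) * cols + xx];
            }
            dst[static_cast<std::size_t>(y) * cols + x] = sum;
        }
    }
    return dst;
}

// Equivalent to convolving with the 2D kernel K = D * D' (outer product of d
// with itself): a vertical pass with D followed by a horizontal pass with D'.
std::vector<float> separable_convolve(const std::vector<float>& src, int rows, int cols,
                                      const std::vector<float>& d)
{
    const std::vector<float> vertical = convolve_1d(src, rows, cols, d, false);
    return convolve_1d(vertical, rows, cols, d, true);
}
```

If the output also has to be limited to fixed bounds, the clamp would go after the second pass, since clamping the intermediate image would change the result.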
