
Fast vectorized pixel-wise operations on images

I want to measure the similarity between two grayscale images of the same size using the mean squared error. I can't use any framework that is not part of the macOS SDK (e.g. OpenCV, Eigen). A simple implementation of this algorithm without vectorization looks like this:

vImage_Buffer imgA;
vImage_Buffer imgB;

NSUInteger mse = 0;

unsigned char *pxlsA = (unsigned char *)imgA.data;
unsigned char *pxlsB = (unsigned char *)imgB.data;

for (size_t i = 0; i < imgA.height * imgA.width; ++i) {
    // Signed type: the difference of two pixels can be negative.
    NSInteger d = (NSInteger)pxlsA[i] - (NSInteger)pxlsB[i];
    mse += d * d;
}

Is there a way to do this without a loop, in a more vectorized way? Maybe something like:

mse = ((imgA - imgB) ^ 2).sum();

The answer to this question lies in the vDSP library, which is part of the macOS SDK: https://developer.apple.com/documentation/accelerate/vdsp

vDSP - Perform basic arithmetic operations and common digital signal processing routines on large vectors.

In my case the vectors are not really big, but still.

First, you need to convert unsigned char * to float *. This is a significant step, by the way, and I don't know how to do it without a loop. Then you need two vDSP functions: vDSP_vsbsbm and vDSP_sve.

vDSP_vsbsbm - Multiplies the difference of two single-precision vectors by a second difference of two single-precision vectors.

vDSP_sve - Calculates the sum of values in a single-precision vector.
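
To make the semantics of these two calls concrete, here is a plain C sketch of what they compute. The helper names ref_vsbsbm and ref_sve are hypothetical stand-ins written for illustration, not part of Accelerate, and all strides are assumed to be 1.

```c
#include <stddef.h>

/* Hypothetical stand-in for vDSP_vsbsbm with all strides equal to 1:
   e[i] = (a[i] - b[i]) * (c[i] - d[i]). */
void ref_vsbsbm(const float *a, const float *b,
                const float *c, const float *d,
                float *e, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        e[i] = (a[i] - b[i]) * (c[i] - d[i]);
    }
}

/* Hypothetical stand-in for vDSP_sve with stride 1:
   returns the sum of all n elements. */
float ref_sve(const float *a, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        sum += a[i];
    }
    return sum;
}
```

Passing the same pair of vectors for both differences, as in ref_vsbsbm(a, b, a, b, e, n), yields the squared differences, and summing them gives the sum of squared errors.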

So the final code looks like this:

float *fpxlsA = (float *)malloc(imgA.height * imgA.width * sizeof(float));
float *fpxlsB = (float *)malloc(imgB.height * imgB.width * sizeof(float));
float *output = (float *)malloc(imgB.height * imgB.width * sizeof(float));

for (size_t i = 0; i < imgA.height * imgA.width; ++i) {
    fpxlsA[i] = (float)(pxlsA[i]);
    fpxlsB[i] = (float)(pxlsB[i]);
}    

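// output[i] = (fpxlsA[i] - fpxlsB[i]) * (fpxlsA[i] - fpxlsB[i])
vDSP_vsbsbm(fpxlsA, 1, fpxlsB, 1, fpxlsA, 1, fpxlsB, 1, output, 1, imgA.height * imgA.width);
float sum = 0;
// sum = output[0] + output[1] + ...
vDSP_sve(output, 1, &sum, imgA.height * imgA.width);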

free(output);
free(fpxlsA);
free(fpxlsB);

This code does exactly what I wanted, in a more vectorized form, but the result isn't good enough. Comparing the performance of the loop approach and the vDSP approach, vDSP is about twice as fast when no additional memory allocation is involved. In practice, where the additional allocation does take place, the loop approach is slightly faster.
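
One possible way to sidestep the allocation cost is to convert and process the pixels in fixed-size chunks, using small scratch buffers on the stack. The sketch below is untested for speed and makes several assumptions: the pixel data is contiguous (rowBytes equal to width), and the names ssd_u8_chunked and CHUNK are hypothetical. On Apple platforms it uses vDSP_vfltu8, which converts unsigned 8-bit integers to floats, together with the two functions above; elsewhere it falls back to a plain loop so the sketch still compiles.

```c
#include <stddef.h>

#ifdef __APPLE__
#include <Accelerate/Accelerate.h>
#endif

enum { CHUNK = 1024 };  /* pixels per step; an arbitrary, untuned choice */

/* Sum of squared differences over n contiguous 8-bit pixels. */
double ssd_u8_chunked(const unsigned char *a, const unsigned char *b, size_t n)
{
    float fa[CHUNK], fb[CHUNK], sq[CHUNK];  /* fixed scratch space, no malloc */
    double total = 0.0;                     /* double keeps the running total accurate */

    while (n > 0) {
        size_t m = n < CHUNK ? n : CHUNK;   /* size of this chunk */
        float partial = 0.0f;
#ifdef __APPLE__
        vDSP_vfltu8(a, 1, fa, 1, m);        /* unsigned char -> float */
        vDSP_vfltu8(b, 1, fb, 1, m);
        vDSP_vsbsbm(fa, 1, fb, 1, fa, 1, fb, 1, sq, 1, m);
        vDSP_sve(sq, 1, &partial, m);
#else
        /* Portable fallback that mirrors the vDSP calls above. */
        for (size_t i = 0; i < m; ++i) {
            fa[i] = (float)a[i];
            fb[i] = (float)b[i];
            sq[i] = (fa[i] - fb[i]) * (fa[i] - fb[i]);
            partial += sq[i];
        }
#endif
        total += partial;
        a += m;
        b += m;
        n -= m;
    }
    return total;
}
```

Dividing the result by the pixel count gives the mean squared error. Whether this actually beats the plain integer loop would need to be measured on the target machine.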

This appears to be part of macOS: https://developer.apple.com/documentation/accelerate

A nice and fast way to loop, using pointer arithmetic, would be as follows:

int d;

size_t i = imgA.height * imgA.width;

while ( i -- )
{
  d = ( int )(*pxlsA++) - ( int )(*pxlsB++);
  mse += d * d;
}

EDIT

Oops: since those are unsigned chars and we calculate their difference, we need to use signed integers to do so.

And another edit: the loop must use the pxls... pointers here; I don't know what img... is.
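
For reference, a self-contained variant of the loop above might look like the following sketch. It makes two additions that are not in the snippets above: it divides by the pixel count to get the mean, and it takes a row stride per image, because rows in a vImage_Buffer are rowBytes apart and rowBytes may be larger than width. The function name mse_u8 and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Mean squared error between two 8-bit grayscale images of equal size.
   a_row_bytes and b_row_bytes are the distances in bytes between the
   starts of consecutive rows (rowBytes in a vImage_Buffer). */
double mse_u8(const unsigned char *a, size_t a_row_bytes,
              const unsigned char *b, size_t b_row_bytes,
              size_t width, size_t height)
{
    uint64_t sum = 0;

    if (width == 0 || height == 0) {
        return 0.0;  /* avoid dividing by zero for empty images */
    }
    for (size_t y = 0; y < height; ++y) {
        const unsigned char *ra = a + y * a_row_bytes;
        const unsigned char *rb = b + y * b_row_bytes;
        for (size_t x = 0; x < width; ++x) {
            /* Signed difference: unsigned subtraction would wrap around. */
            int d = (int)ra[x] - (int)rb[x];
            sum += (uint64_t)(d * d);
        }
    }
    return (double)sum / ((double)width * (double)height);
}
```

With the buffers from the question, a call would be mse_u8(imgA.data, imgA.rowBytes, imgB.data, imgB.rowBytes, imgA.width, imgA.height).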
