
applying a binned mask to an image

I want to mask (i.e. set to 0) some pixels in a very large image. I have a mask array of 0s and 1s that specifies which image pixels should be zeroed. This mask is typically produced with an eraser tool in my application's image viewer widget. It is later applied to the data during image analysis by multiplying mask pixels with image pixels.

For performance reasons, the mask is smaller than the full-resolution image: one mask pixel typically covers a 4×4 or 8×8 block of full-resolution pixels.
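To make the binned-mask relationship concrete, here is a small worked example (with made-up toy sizes, not the real data): each mask pixel is expanded into a `binning_factor × binning_factor` block before multiplying.

```python
import numpy

# Toy example: a 2x2 mask applied to a 4x4 image with binning_factor=2,
# so each mask pixel covers a 2x2 block of image pixels.
binning_factor = 2
mask = numpy.array([[1, 0],
                    [0, 1]], dtype=numpy.uint8)

# Expand the mask to full resolution with numpy.repeat
full_mask = numpy.repeat(numpy.repeat(mask, binning_factor, axis=0),
                         binning_factor, axis=1)

image = numpy.ones((4, 4), dtype=numpy.uint8)
masked = image * full_mask
print(masked)
# The top-left and bottom-right 2x2 blocks survive; the others are zeroed.
```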

I want to optimize the performance of the masking function using numba. My problem is that whenever I try to parallelize the algorithm, performance degrades.

Here are my tests. The image size is typical of my real data.

import numpy
import numba

def apply_binned_mask_numpy(image, mask, binning_factor):
    new_image = image.copy()
    for i in range(binning_factor):
        for j in range(binning_factor):
            image_slice = new_image[i::binning_factor, j::binning_factor]
            actual_mask = mask[:image_slice.shape[0], :image_slice.shape[1]]
            image_slice *= actual_mask

    return new_image


@numba.jit
def apply_binned_mask_numba(image, mask, binning_factor):
    new_image = image.copy()
    for i in range(binning_factor):
        for j in range(binning_factor):
            image_slice = new_image[i::binning_factor, j::binning_factor]
            actual_mask = mask[:image_slice.shape[0], :image_slice.shape[1]]
            image_slice *= actual_mask

    return new_image


@numba.njit(parallel=True)
def apply_binned_mask_numba_parallel(image, mask, binning_factor):
    new_image = image.copy()
    for i in numba.prange(binning_factor):
        for j in range(binning_factor):
            image_slice = new_image[i::binning_factor, j::binning_factor]
            actual_mask = mask[:image_slice.shape[0], :image_slice.shape[1]]
            image_slice *= actual_mask

    return new_image


if __name__ == '__main__':
    import time
    a = numpy.arange(7997*7994).reshape((7997, 7994))
    # mask with values 0 or 1
    mask = numpy.random.randint(0, 2, (1000, 1000), dtype=numpy.uint8)

    t0 = time.time()
    b = apply_binned_mask_numpy(a, mask, 8)
    print("numpy", time.time() - t0)

    t0 = time.time()
    c = apply_binned_mask_numba(a, mask, 8)
    print("numba", time.time() - t0)

    t0 = time.time()
    d = apply_binned_mask_numba_parallel(a, mask, 8)
    print("numba p", time.time() - t0)
    assert numpy.array_equal(c, d)

This code produces the following results:

numpy 0.3541719913482666
numba 0.55484938621521
numba p 1.4546563625335693

I have also tried variations on this more naive implementation, without any significant speed-up:

@numba.njit(parallel=True)
def apply_binned_mask_numba_parallel(image, mask, binning_factor):
    new_image = image.copy()
    for k in numba.prange(mask.shape[0]):
        for l in range(mask.shape[1]):
            for i in range(binning_factor):
                for j in range(binning_factor):
                    row_idx = k * binning_factor + i
                    col_idx = l * binning_factor + j
                    if row_idx >= image.shape[0] or col_idx >= image.shape[1]:
                        continue
                    new_image[row_idx, col_idx] *= mask[k, l]
    return new_image

It seems that I'm not getting any performance increase out of numba. Any idea what I'm doing wrong here?

I forgot to take the compilation time into account: if I call each function at least twice, the second run is much faster.

if __name__ == '__main__':
    import time
    a = numpy.random.randint(0, 256, (15997, 7994), dtype=numpy.uint8)
    # mask with values 0 or 1
    mask = numpy.random.randint(0, 2, (2000, 1000), dtype=numpy.uint8)
    binning = 8

    t0 = time.time()
    b = apply_binned_mask_numpy(a, mask, binning)
    print("numpy", time.time() - t0)

    t0 = time.time()
    c = apply_binned_mask_numba(a, mask, binning)
    print("numba", time.time() - t0)

    t0 = time.time()
    c = apply_binned_mask_numba(a, mask, binning)
    print("numba run2", time.time() - t0)

    t0 = time.time()
    d = apply_binned_mask_numba_parallel(a, mask, binning)
    print("numba p", time.time() - t0)

    t0 = time.time()
    d = apply_binned_mask_numba_parallel(a, mask, binning)
    print("numba p run2", time.time() - t0)
    assert numpy.array_equal(c, d)

The results are now:

numpy 0.3582308292388916
numba 0.5374748706817627
numba run2 0.21624493598937988
numba p 1.5098681449890137
numba p run2 0.16219329833984375

That makes more sense, but I'm still disappointed by the parallel version's speed-up.
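For comparison, a pure-NumPy variant (a sketch, not from the original post) that upsamples the mask once with `numpy.repeat` can be worth benchmarking as a baseline; it trades a full-resolution temporary mask allocation for a single vectorized multiply, and cropping handles the case where the image shape is not a multiple of the binning factor.

```python
import numpy

def apply_binned_mask_repeat(image, mask, binning_factor):
    # Expand each mask pixel into a binning_factor x binning_factor block,
    # then crop to the (possibly non-divisible) image shape.
    full_mask = numpy.repeat(numpy.repeat(mask, binning_factor, axis=0),
                             binning_factor, axis=1)
    full_mask = full_mask[:image.shape[0], :image.shape[1]]
    return image * full_mask
```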

Replacing the parallel code with this "naive" per-pixel version still degrades performance:

@numba.njit(parallel=True)
def apply_binned_mask_numba_parallel(image, mask, binning_factor):
    new_image = image.copy()
    for i in numba.prange(image.size):
        row = i // image.shape[1]
        col = i - row * image.shape[1]
        mask_row = row // binning_factor
        mask_col = col // binning_factor
        new_image[row, col] *= mask[mask_row, mask_col]
    return new_image

This results in:

numba p run2 0.6018044948577881
