I want to mask (set to 0) some pixels in a very large image. I have a mask array with values 0 or 1 that specifies which image pixels are to be muted. This mask is typically produced with an eraser tool in the image viewer widget of my application, and it is later applied to the data during image analysis by multiplying mask pixels with image pixels.
For performance reasons, the mask is smaller than the full-resolution image: one mask pixel typically covers 4*4 or 8*8 pixels of the full-resolution image.
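To make the index mapping concrete: mask pixel (k, l) covers image pixels (k*b + i, l*b + j) for 0 <= i, j < b, where b is the binning factor. A minimal sketch of that mapping, using small hypothetical array sizes and `numpy.repeat` to upsample the mask to image resolution:

```python
import numpy as np

b = 4  # binning factor (hypothetical, for illustration)
image = np.arange(8 * 8).reshape(8, 8)
mask = np.array([[1, 0],
                 [0, 1]], dtype=np.uint8)

# Upsample the mask so each mask pixel covers a b*b block of the image,
# then crop in case the image size is not an exact multiple of b.
full_mask = np.repeat(np.repeat(mask, b, axis=0), b, axis=1)
masked = image * full_mask[:image.shape[0], :image.shape[1]]
```

Here the top-left and bottom-right 4*4 blocks survive while the other two blocks are zeroed out.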
I want to optimize the performance of the masking function using numba. My problem is that whenever I try to parallelize the algorithm, performance degrades.
Here are my tests. The image size is typical of my real data.
import numpy
import numba


def apply_binned_mask_numpy(image, mask, binning_factor):
    new_image = image.copy()
    for i in range(binning_factor):
        for j in range(binning_factor):
            image_slice = new_image[i::binning_factor, j::binning_factor]
            actual_mask = mask[:image_slice.shape[0], :image_slice.shape[1]]
            image_slice *= actual_mask
    return new_image


@numba.jit
def apply_binned_mask_numba(image, mask, binning_factor):
    new_image = image.copy()
    for i in range(binning_factor):
        for j in range(binning_factor):
            image_slice = new_image[i::binning_factor, j::binning_factor]
            actual_mask = mask[:image_slice.shape[0], :image_slice.shape[1]]
            image_slice *= actual_mask
    return new_image


@numba.njit(parallel=True)
def apply_binned_mask_numba_parallel(image, mask, binning_factor):
    new_image = image.copy()
    for i in numba.prange(binning_factor):
        for j in range(binning_factor):
            image_slice = new_image[i::binning_factor, j::binning_factor]
            actual_mask = mask[:image_slice.shape[0], :image_slice.shape[1]]
            image_slice *= actual_mask
    return new_image
if __name__ == '__main__':
    import time

    a = numpy.arange(7997 * 7994).reshape((7997, 7994))
    # mask with values 0 or 1
    mask = numpy.random.randint(0, 2, (1000, 1000), dtype=numpy.uint8)

    t0 = time.time()
    b = apply_binned_mask_numpy(a, mask, 8)
    print("numpy", time.time() - t0)

    t0 = time.time()
    c = apply_binned_mask_numba(a, mask, 8)
    print("numba", time.time() - t0)

    t0 = time.time()
    d = apply_binned_mask_numba_parallel(a, mask, 8)
    print("numba p", time.time() - t0)

    assert numpy.array_equal(c, d)
This code produces the following results:
numpy 0.3541719913482666
numba 0.55484938621521
numba p 1.4546563625335693
I have tried variations of this, like the more naive implementation below, without significant speed-up:
@numba.njit(parallel=True)
def apply_binned_mask_numba_parallel(image, mask, binning_factor):
    new_image = image.copy()
    for k in numba.prange(mask.shape[0]):
        for l in range(mask.shape[1]):
            for i in range(binning_factor):
                for j in range(binning_factor):
                    row_idx = k * binning_factor + i
                    col_idx = l * binning_factor + j
                    # skip mask pixels that fall outside the image
                    if row_idx >= image.shape[0] or col_idx >= image.shape[1]:
                        continue
                    new_image[row_idx, col_idx] *= mask[k, l]
    return new_image
It seems that I'm not getting any performance increase out of numba. Any idea what I'm doing wrong here?
Edit: I forgot to take the compilation time into account. If I call each function at least twice, the second run is faster.
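As a side note, this call-twice pattern can be factored into a small timing helper that triggers JIT compilation before measuring. A sketch using only the standard library (`best_time` is a hypothetical name, not part of numba):

```python
import time


def best_time(func, *args, warmup=1, repeats=3):
    """Best wall-clock time of func(*args), excluding warm-up calls.

    The warm-up calls trigger JIT compilation (and warm any caches)
    outside the measured region; taking the best of several runs
    reduces timer noise.
    """
    for _ in range(warmup):
        func(*args)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - t0)
    return best
```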
if __name__ == '__main__':
    import time

    a = numpy.random.randint(0, 256, (15997, 7994), dtype=numpy.uint8)
    # mask with values 0 or 1
    mask = numpy.random.randint(0, 2, (2000, 1000), dtype=numpy.uint8)
    binning = 8

    t0 = time.time()
    b = apply_binned_mask_numpy(a, mask, binning)
    print("numpy", time.time() - t0)

    t0 = time.time()
    c = apply_binned_mask_numba(a, mask, binning)
    print("numba", time.time() - t0)

    t0 = time.time()
    c = apply_binned_mask_numba(a, mask, binning)
    print("numba run2", time.time() - t0)

    t0 = time.time()
    d = apply_binned_mask_numba_parallel(a, mask, binning)
    print("numba p", time.time() - t0)

    t0 = time.time()
    d = apply_binned_mask_numba_parallel(a, mask, binning)
    print("numba p run2", time.time() - t0)

    assert numpy.array_equal(c, d)
The results are now:
numpy 0.3582308292388916
numba 0.5374748706817627
numba run2 0.21624493598937988
numba p 1.5098681449890137
numba p run2 0.16219329833984375
That makes more sense, but I'm still disappointed by the parallel version.
Replacing the parallel code with the "naive" version still degrades performance:
@numba.njit(parallel=True)
def apply_binned_mask_numba_parallel(image, mask, binning_factor):
    new_image = image.copy()
    for i in numba.prange(image.size):
        row = i // image.shape[1]
        col = i - row * image.shape[1]
        mask_row = row // binning_factor
        mask_col = col // binning_factor
        new_image[row, col] *= mask[mask_row, mask_col]
    return new_image
This results in:
numba p run2 0.6018044948577881