faster way to erode/dilate images

I'm making a script that does some mathematical morphology on images (mainly GIS rasters). I've implemented erosion and dilation; opening/closing with reconstruction is still on the TODO list, but that's not the subject here.

My implementation is very simple, with nested loops. I tried it on a 10900x10900 raster and, unsurprisingly, it took an absurdly long time to finish.

Before I continue with other operations, I'd like to know if there's a faster way to do this.

My implementation:

import numpy as np

def erode(image, S):
    (m, n) = image.shape
    buffer = np.zeros((m, n), dtype=np.float64)  # the S-pixel border stays 0

    for i in range(S, m - S):
        for j in range(S, n - S):
            # minimum over the (2S+1) x (2S+1) neighbourhood; dilation is the same with np.max()
            buffer[i, j] = np.min(image[i - S: i + S + 1, j - S: j + S + 1])

    return buffer

I've heard about vectorization, but I'm not sure I understand it well. Any advice or pointers are appreciated. I'm aware that OpenCV has these morphological operations, but I want to implement my own to learn about them.

The question here is whether you want a more efficient implementation because you want to learn about NumPy, or whether you want a more efficient algorithm.

I think there are two obvious things that could be improved in your approach. One is to avoid looping at the Python level, because that is slow. The other is that you're taking the minimum of overlapping parts of the array, which can be made more efficient by reusing the effort you put into finding the previous minimum.

I will illustrate that with 1d implementations of erosion.

Baseline for comparison

Here is basically your implementation, just as a 1D version:

def erode(image, S):
    n = image.shape[0]
    buffer = np.full(n, 0).astype(np.float64)
    for i in range(S, n - S):
        buffer[i] = np.min(image[i - S: i + S + 1]) #dilation is just np.max()
    return buffer

You can make this faster with sliding_window_view from np.lib.stride_tricks, i.e. by avoiding the Python loop and doing the work at the NumPy level.

Faster Implementation

np.lib.stride_tricks.sliding_window_view(arr,2*S+1).min(1)

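Notice that this does not quite do the same thing: it only produces a value where a full window of 2S+1 elements is available, so the output has 2S fewer entries than the input, whereas the baseline returns an array of the original length with zeros at the borders. For this illustration I will ignore that difference.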
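If the border behaviour matters, one way to line the result up with the baseline is to write the window minima into the interior of a zero-filled buffer. This is only a minimal sketch: the name erode_windowed is made up for this illustration, and sliding_window_view needs NumPy 1.20 or newer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def erode_windowed(image, S):
    """1D erosion with half-width S; border cells stay 0 like the baseline."""
    n = image.shape[0]
    buffer = np.zeros(n, dtype=np.float64)
    if n >= 2 * S + 1:
        # One row per window position; min over axis 1 reduces each window.
        buffer[S:n - S] = sliding_window_view(image, 2 * S + 1).min(axis=1)
    return buffer


image = np.array([5, 3, 8, 1, 9, 2, 7])
print(erode_windowed(image, 1))  # border cells 0, interior holds the window minima
```

Swapping .min for .max gives the matching dilation.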

Faster Algorithm

A completely different approach is to not calculate the min from scratch each time, but to keep the window's values ordered and, when moving the window one step to the right, only add the one value that enters and remove the one that leaves.

Here is a rough implementation of that:

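from sortedcontainers import SortedDict  # third-party package: sortedcontainers

def smart_erode(arr, m):
    # m is the full window length; yields the minimum of each window in turn
    n = arr.shape[0]
    sd = SortedDict()  # value -> number of times it occurs in the current window
    for new in arr[:m]:
        if new in sd:
            sd[new] += 1
        else:
            sd[new] = 1
    # arr[:n - m] rather than arr[:-m+1], which would be empty for m == 1
    for to_remove, new in zip(arr[:n - m], arr[m:]):
        yield sd.keys()[0]  # smallest key = minimum of the current window
        # add the value entering the window
        if new in sd:
            sd[new] += 1
        else:
            sd[new] = 1
        # remove one occurrence of the value leaving the window
        if sd[to_remove] > 1:
            sd[to_remove] -= 1
        else:
            sd.pop(to_remove)
    yield sd.keys()[0]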

Notice that an ordered set wouldn't work, and an ordered list would need a way to remove just one element with a given value, since the array can contain repeated values. I am using a sorted dict to store how many times each value occurs in the current window.

A Rough Benchmark
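For comparison, here is a sketch of a related but different technique, not the SortedDict approach above: a monotonic deque of indices. It also reuses work between neighbouring windows, and because each index is appended and removed at most once, the total cost is linear in the array length regardless of the window size. The name sliding_min is made up for this illustration.

```python
from collections import deque


def sliding_min(values, m):
    """Yield the minimum of every length-m window of values, left to right."""
    candidates = deque()  # indices whose values strictly increase front to back
    for i, v in enumerate(values):
        # Older entries that are >= v can never be a window minimum again,
        # because v is at least as small and stays in the window longer.
        while candidates and values[candidates[-1]] >= v:
            candidates.pop()
        candidates.append(i)
        # Drop the front index once it has left the window ending at i.
        if candidates[0] <= i - m:
            candidates.popleft()
        if i >= m - 1:
            yield values[candidates[0]]


print(list(sliding_min([5, 3, 8, 1, 9, 2, 7], 3)))  # [3, 1, 1, 1, 2]
```

Flipping the comparison to <= turns this into a sliding maximum, i.e. dilation.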

I want to illustrate how the three implementations compare for different window sizes, so I am testing them on an array of 10^5 random integers with the half-width m ranging from 10^3 to 10^4 (window length 2m+1).

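# Run this in IPython/Jupyter: %timeit is an IPython magic, not plain Python.
import numpy as np
import pandas as pd

arr = np.random.randint(0,10**5,10**5)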
sliding_window_times = []
op_times = []
better_alg_times = []

for m in np.linspace(0,10**4,11)[1:].astype('int'):
    x = %timeit -o -n 1 -r 1  np.lib.stride_tricks.sliding_window_view(arr,2*m+1).min(1)
    sliding_window_times.append(x.best)
    x = %timeit -o -n 1 -r 1  erode(arr,m)
    op_times.append(x.best)
    x = %timeit -o -n 1 -r 1  tuple(smart_erode(arr,2*m+1))
    better_alg_times.append(x.best)
    print("")

pd.DataFrame({"Baseline Comparison":op_times,
              'Faster Implementation':sliding_window_times, 
              'Faster Algorithm':better_alg_times,
              },
            index = np.linspace(0,10**4,11)[1:].astype('int')
            ).plot.bar()

[Image: bar chart of the benchmark timings for the three implementations]

Notice that for very small window sizes the raw power of the NumPy implementation wins out, but very quickly the amount of work saved by not calculating the min from scratch becomes more important.
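Carrying the sliding-window idea back to the 2D raster from the question, one possible sketch uses the fact that the minimum over a square window equals the minimum over rows of the per-row minima, so two 1D passes replace one 2D pass. That is roughly 2(2S+1) comparisons per pixel instead of (2S+1)^2. The name erode_2d is made up for this illustration, and the S-pixel border is left at 0 to match the question's code.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def erode_2d(image, S):
    """Erode with a (2S+1) x (2S+1) square; the S-pixel border stays 0."""
    m, n = image.shape
    k = 2 * S + 1
    buffer = np.zeros((m, n), dtype=np.float64)
    if m < k or n < k:
        return buffer  # image too small for even one full window
    # Pass 1: minimum over each horizontal run of k pixels -> shape (m, n - 2S).
    row_min = sliding_window_view(image, k, axis=1).min(axis=-1)
    # Pass 2: minimum over each vertical run of k row minima -> shape (m - 2S, n - 2S).
    buffer[S:m - S, S:n - S] = sliding_window_view(row_min, k, axis=0).min(axis=-1)
    return buffer


rng = np.random.default_rng(0)
img = rng.integers(0, 100, size=(9, 11))
print(erode_2d(img, 2).shape)  # same shape as the input
```

Replacing both .min calls with .max gives the corresponding dilation.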
