
Python - Optimize Lambda with Numpy Operations

I'm having a difficult time optimizing the following calculation:

Inner_diff_grp = np.var(list(map(lambda x: np.percentile(winw2_grp, x[0]) - np.percentile(winw2_grp, x[1]), [(i + 7, i) for i in range(0, 98, 7)])))

'winw2_grp' is a small image array (say 5x5). I'm computing percentiles at every 7th step (0, 7, 14, ..., 98), taking the difference between each pair of percentiles 7 apart, and then calculating the variance of those differences.
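For reference, the lambda/map expression above can be collapsed into a single vectorized `np.percentile` call, since `np.percentile` accepts an array of percentiles and `np.diff` then produces the same 14 pairwise differences. A minimal sketch (with a random 5x5 array standing in for `winw2_grp`):

```python
import numpy as np

rng = np.random.default_rng(0)
winw2_grp = rng.random((5, 5))  # stand-in for one image window

# Original: one np.percentile call per pair -> 28 calls in total.
inner_diff_grp = np.var(list(map(
    lambda x: np.percentile(winw2_grp, x[0]) - np.percentile(winw2_grp, x[1]),
    [(i + 7, i) for i in range((0), 98, 7)])))

# Vectorized: one call computing all 15 percentiles at once;
# np.diff yields the same 14 differences between percentiles 7 apart.
qs = np.arange(0, 99, 7)  # 0, 7, ..., 98
vectorized = np.var(np.diff(np.percentile(winw2_grp, qs)))

assert np.isclose(inner_diff_grp, vectorized)
```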

There are around 100,000 images in the loop. Earlier I was using standard loops, but I've since switched to Pandas.apply, which performs better; throughput is around 150 iterations/sec now - which still means more than 10 minutes of runtime.

Apart from trying out pooling to exploit all CPUs, is there any way to optimize this calculation?

So, as per the suggestion from @Ehsan, I enclosed the calculation in a separate function with the Numba decorator, and that was it. I deliberately removed the lambda because I wanted to try out other optimizations (parallel execution) - so it's not part of the strategy but rather a WIP.

@nb.jit(nopython=True, fastmath=True)
def numba_perc_calc(win):
    # percentiles at every 7th step: 0, 7, ..., 98
    arr = [0, 7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, 84, 91, 98]
    perc = np.percentile(win, arr)
    dif = np.diff(perc)         # differences between consecutive percentiles
    var_of_percs = np.var(dif)  # variance of those differences
    return var_of_percs

The timing result on a smaller test set follows.

[screenshot of timing results]
