
Making Python code faster using Numba, caching, or other optimizations

I have defined the following function:

import numpy as np

# N is the grid size; matrix_R and matrix_S are N x N matrices defined globally.
def laplacian_2D_array(func_2D):
    func_2nd_derv_x_fin2D = np.zeros((N, N))
    for j in range(0, N):
        func_CD2_x_list = []
        for i in range(0, N):
            value = func_2D[i][j]  # func_2D is an N x N matrix
            func_CD2_x_list.append(value)
        func_CD2_x_array = np.array(func_CD2_x_list)[np.newaxis]
        func_2nd_derv_x = matrix_R @ np.transpose(func_CD2_x_array)
        func_2nd_derv_x_fin2D[j] = np.transpose(np.reshape(func_2nd_derv_x, [N, ]))

    func_2nd_derv_y_fin2D = np.zeros((N, N))
    for i in range(0, N):
        func_CD2_y_list = []
        for j in range(0, N):
            value = func_2D[i][j]
            func_CD2_y_list.append(value)
        func_CD2_y_array = np.array(func_CD2_y_list)[np.newaxis]
        func_2nd_derv_y = matrix_S @ np.transpose(func_CD2_y_array)
        func_2nd_derv_y_fin2D[i] = np.reshape(func_2nd_derv_y, [N, ])
    return np.add(func_2nd_derv_x_fin2D, func_2nd_derv_y_fin2D)

In the code above, for each j the inner loop extracts the j-th column of the N×N matrix func_2D as a column vector, multiplies it by matrix_R, and stores the result in the j-th row of func_2nd_derv_x_fin2D. The second block does the same with the rows of func_2D and matrix_S, storing into func_2nd_derv_y_fin2D. Finally the function returns the sum of func_2nd_derv_x_fin2D and func_2nd_derv_y_fin2D.
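Note that the inner gathering loop is just a column slice; the following one-line sketch (assuming func_2D is a NumPy array) builds the same column vector in one step:

func_CD2_x_array = func_2D[:, j][np.newaxis]  # same result as the list-append + np.array version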

Here N = 401, and matrix_S and matrix_R are also N×N matrices. The function is called multiple times inside a while loop, and a single iteration takes a long time. I have tried @njit to make it faster, but I was not successful and got errors. I have also tried using cache.

How can the list and array handling in this function be optimized, and what other ways are there to speed it up?

Here is the code where the function is used:

while (time < timemax):

    # Analytical solution ---------------------------------------------------
    exact_time = time / a_sec
    omega_t = np.zeros((N, N))
    psi_t = np.zeros((N, N))
    for i in range(0, N):
        for j in range(0, N):
            psi_t[i][j] = np.sin(x_list[i]) * np.sin(y_list[j]) * np.exp((-2 * exact_time) / Re)
            omega_t[i][j] = 2 * np.sin(x_list[i]) * np.sin(y_list[j]) * np.exp((-2 * exact_time) / Re)
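    # The double loop above can also be vectorized (a sketch, assuming x_list
    # and y_list are 1-D NumPy arrays of length N): np.outer builds the full
    # sin(x_i) * sin(y_j) grid in one call, replacing N*N Python iterations.
    decay = np.exp(-2 * exact_time / Re)
    psi_t = decay * np.outer(np.sin(x_list), np.sin(y_list))
    omega_t = 2 * psi_t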
    # ... (several steps omitted) ...
    # BiCGSTAB algorithm

    x0 = psi_0  # initial guess --> psi of the previous time step
    r0 = omega_0 - laplacian_2D_array(x0)  # r0 = b - A @ x0
    r0_hat = r0
    rho_0 = 1
    alpha = 1
    w0 = 1
    v0 = np.zeros((N, N))
    P_0 = np.zeros((N, N))
    tol = 10 ** (-7)
    iteration = 0
    # Iterate while the residual is still above the tolerance
    while (np.max(np.abs(laplacian_2D_array(x0) - omega_0)) > tol):
        rho_prev = rho_0
        # Flatten the N x N arrays to 1-D so that np.dot returns a scalar
        rho = np.dot(r0_hat.ravel(), r0.ravel())
        beta = (rho / rho_prev) * (alpha / w0)
        P_0 = r0 + beta * (P_0 - w0 * v0)
        v0 = laplacian_2D_array(P_0)
        alpha = rho / np.dot(r0_hat.ravel(), v0.ravel())
        s = r0 - alpha * v0
        t = laplacian_2D_array(s)
        # ... (rest of the BiCGSTAB update omitted) ...



Any suggestions to speed up the code are highly appreciated.

It turns out the complicated code of laplacian_2D_array can be simplified to the following implementation:

def laplacian_2D_array(func_2D):
    # matrix_R is applied to the columns of func_2D and matrix_S to its rows;
    # the trailing .T restores the row/column layout of the original loops.
    return (matrix_R @ func_2D + matrix_S @ func_2D.T).T
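A quick sanity check that the one-liner matches the loop-based version (a sketch on random inputs; laplacian_2D_array_loops is a hypothetical name for your original implementation, kept around for comparison):

import numpy as np

N = 401
matrix_R = np.random.rand(N, N)
matrix_S = np.random.rand(N, N)
func_2D = np.random.rand(N, N)

# laplacian_2D_array_loops: the original double-loop version, assumed defined above
assert np.allclose(laplacian_2D_array_loops(func_2D), laplacian_2D_array(func_2D))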

This is 63 times faster on my machine on random matrices matching your provided inputs. Most of the time should be spent in the matrix multiplications, which run very efficiently (and in parallel) as long as NumPy is linked against a properly configured BLAS on the target platform.
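Regarding the @njit attempts: once the function is reduced to two matrix products, NumPy already hands the work to BLAS, so Numba is unlikely to make it faster. If you still want a compiled version, a minimal sketch follows; note two assumptions on my part: matrix_R and matrix_S are passed as explicit arguments (Numba treats module-level globals as frozen compile-time constants), and the transpose is copied to a contiguous array because Numba's matrix multiplication generally requires contiguous operands.

import numpy as np
from numba import njit

@njit(cache=True)  # cache=True saves the compiled code to disk between runs
def laplacian_2D_array_numba(func_2D, matrix_R, matrix_S):
    # Copy the transpose into a contiguous buffer before the matmul
    func_2D_T = np.ascontiguousarray(func_2D.T)
    return (matrix_R @ func_2D + matrix_S @ func_2D_T).T

In practice this should run at about the same speed as the NumPy one-liner, since both end up in the same BLAS routines.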
