没有python对象的cython中的高效矩阵运算

Question

I'm writing a function that I'd like to translate to C as much as possible using Cython.我正在编写一个函数，我想尽可能多地使用 Cython 将其转换为 C。 To do this, I'll need to use linear algebra operations.为此，我需要使用线性代数运算。 Here is my function.这是我的功能。 EDIT: The lesson I've learned is to try to take care of linear algebra outside of loops, which I've largely been able to do.编辑：我学到的教训是尝试处理循环之外的线性代数，这在很大程度上是我能够做到的。 Otherwise, resort to wrapping LAPACK/BLAS or writing my own functions.否则，求助于包装 LAPACK/BLAS 或编写我自己的函数。

import numpy as np
from scipy.stats import multivariate_normal as mv
import itertools

def llf(data, rho, mu, sigma, A, V, n):
    '''evaluate likelihood by guass-hermite quadrature

    Parameters
    ----------
    data : array
        N x J matrix, columns are measurements
    rho : array
        length L vector of weights for mixture of normals
    mu : array
        L x K vector of means of mixture of normals
    sigma : array
        K x L*K matrix of variance matrices for mixture of normals
    A : array
        J x (K + 1) matrix of loadings
    V : array
        J x J variance matrix of measurement errors
    n : int
        number of sample points for quadrature
    '''

    N = data.shape[0]
    L, K = mu.shape

    # getting weights and sample points for approximating integral
    v, w = np.polynomial.hermite.hermgauss(n)

    totllf = 0
    for i in range(N):
        M_i = data[i, :]
        totllf_i = 0
        for l in range(L):
            rho_l = rho[l]
            sigma_l = sigma[:, K*l:K*(l+1)]
            mu_l = mu[l, :]
            chol_l = np.linalg.cholesky(sigma_l)
            for ix in itertools.product(*(list(range(n)) for k in range(K))):
               wt =  np.prod(w[list(ix)])
               pt = np.sqrt(2)*chol_l.dot(v[list(ix)]) + mu_l
               totllf_i += wt*rho_l*mv.pdf(M_i, A[:, 0] + A[:, 1:].dot(pt), V)

        totllf += np.log(totllf_i)

    return totllf

In order to accomplish this I'd need to have functions for matrix multiplication, transpose, determinant, matrix inverse and cholesky decompostion.为了实现这一点，我需要具有矩阵乘法、转置、行列式、矩阵求逆和 cholesky 分解的函数。 I've seen some posts on using BLAS functions, but its really unclear to me how to use these.我看过一些关于使用BLAS函数的帖子，但我真的不清楚如何使用这些。

EDIT 04/29/18编辑 04/29/18

As suggested, I've taken a memory view approach and initialized everything prior to the loop.按照建议，我采用了内存视图方法并在循环之前初始化了所有内容。 My new function is written as我的新函数写成

def llf_c(double[:, ::1] data, double[::1] rho, double[:, ::1] mu, 
                double[:, ::1] sigma, double[:, ::1] A, double[:, ::1] V, int n):
    '''evaluate likelihood by guass-hermite quadrature

    Parameters
    ----------
    data : array
        N x J matrix, columns are measurements
    rho : array
        length L vector of weights for mixture of normals
    mu : array
        L x K vector of means of mixture of normals
    sigma : array
        K x L*K matrix of variance matrices for mixture of normals
    A : array
        J x (K + 1) matrix of loadings
    V : array
        J x J variance matrix of measurement errors
    n : int
        number of sample points for quadrature
    '''

    cdef Py_ssize_t N = data.shape[0], J = data.shape[1], L = mu.shape[0], K = mu.shape[1]

    # initializing indexing variables
    cdef Py_ssize_t i, l, j, k

    # getting weights and sample points for approximating integral
    v_a, w_a = np.polynomial.hermite.hermgauss(n)
    cdef double[::1] v = v_a
    cdef double[::1] w = w_a
    cdef double[::1] v_ix = np.zeros(K, dtype=np.float)

    # initializing memory views for cholesky decomposition of sigma matrices
    sigma_chol_a = np.zeros((K, L*K), dtype=np.float)
    for l in range(L):
        sigma_chol_a[:, K*l:K*(l+1)] = np.linalg.cholesky(sigma[:, K*l:K*(l+1)])

    cdef double[:, ::1] sigma_chol = sigma_chol_a

    # intializing V inverse and determinant
    cdef double[:, ::1] V_inv = np.linalg.inv(V)
    cdef double V_det = np.linalg.det(V)

    # initializing memoryviews for work matrices
    cdef double[::1] work_K = np.zeros(K, dtype=np.float)
    cdef double[::1] work_J = np.zeros(J, dtype=np.float)

    # initializing memoryview for quadrature points
    cdef double[::1] pt = np.zeros(K, dtype=np.float)

    # initializing memorview for means for multivariate normal
    cdef double[::1] loc = np.zeros(J, dtype=np.float)

    # initializing values for loop
    cdef double[::1] totllf = np.zeros(N, dtype=np.float)
    cdef double wt, pdf_init = 1./sqrt(((2*pi)**J)*V_det)
    cdef int[:, ::1] ix_vals = np.vstack(itertools.product(*(list(range(n)) for k in range(K)))).astype(np.int32)
    cdef Py_ssize_t ix_len = ix_vals.shape[0]

    for ix_row in range(ix_len):

        ix = ix_vals[ix_row]

        # weights and points for quadrature
        wt = 1.
        for k in range(K):
            wt *= w[ix[k]]
            v_ix[k] = v[ix[k]]

        for l in range(L):

            # change of variables
            dotmv_c(sigma_chol[:, K*l:K*(l+1)], v_ix, work_K)
            for k in range(K):
                pt[k] = sqrt(2)*work_K[k]
            addvv_c(pt, mu[l, :], pt)

            for i in range(N):

                # generating demeaned vector for multivariate normal pdf
                dotmv_c(A[:, 1:], pt, work_J)
                addvv_c(A[:, 0], work_J, work_J)
                for j in range(J):
                    loc[j] = data[i, j] - work_J[j]

                # performing matrix products in exponential
                # print(wt, rho[l], np.asarray(work_J))
                dotvm_c(loc, V_inv, work_J)
                totllf[i] += wt*rho[l]*pdf_init*exp(-0.5*dotvv_c(work_J, loc))

    return np.log(np.asarray(totllf)).sum()

The dotvm_c , dotmv_c and addvv_c are functions that performance matrix multiplication of vector and matrix, matrix and vector, and elementwise addition of two vectors. dotvm_c 、 dotmv_c和addvv_c是执行向量与矩阵、矩阵与向量的矩阵乘法以及两个向量的元素相加的函数。 I've written these in Cython as well, but am not including for brevity.我也用 Cython 写过这些，但为了简洁起见不包括在内。 I'm no longer wrapping any LAPACK functions as I do all other linear algebra prior to the loop using numpy.我不再包装任何 LAPACK 函数，因为我在使用 numpy 循环之前执行所有其他线性代数。 I have a few more questions.我还有几个问题。 Why do I still have yellow in the loop?为什么我的循环中仍然有黄色？ (See profile below). （见下面的简介）。 I thought everything should be in C now.我认为现在一切都应该在 C 中。 Also if you have any other suggestions based on new implementation, please let me know.另外，如果您有任何其他基于新实现的建议，请告诉我。

For instance, in line 221, I get this message upon compiling: "Index should be typed for more efficient access."例如，在第 221 行，我在编译时收到此消息：“应键入索引以提高访问效率。” But I thought I typed the index k.但我以为我输入了索引 k。 Also, since addvv_c shows up in yellow, I'll show you it's definition below.另外，由于addvv_c显示为黄色，我将addvv_c您展示它的定义。

cpdef void addvv_c(double[:] a, double[:] b, double[:] out):
    '''add two vectors elementwise
    '''
    cdef Py_ssize_t i, n = a.shape[0]

    for i in range(n):
        out[i] = a[i] + b[i]

Answer 1

A few of small points with respect to your optimized Cython/BLAS functions:关于优化的 Cython/BLAS 函数的一些小要点：

ipiv_a = np.zeros(n).astype(np.int32)
cdef int[::1] ipiv = ipiv_a

can have two easy improvements: it doesn't has to go through a temporary variable, and you can directly create an array of type np.int32 rather than create an array of a different type then casting:可以有两个简单的改进：它不必通过临时变量，您可以直接创建一个np.int32类型的数组，而不是创建一个不同类型的数组然后进行转换：

 cdef int[::1] ipiv = np.zeros(n,dtype=np.int32)

Simiarly, in both fucntions you can initialize B in a fewer steps by doing同样，在这两个功能中，您可以通过执行以下操作以更少的步骤初始化B

cdef double[:, ::1] B = A.copy()

rather than creating a zero array and copying而不是创建零数组并复制

The second (more significant) change would be to use C arrays for temporary variables such as the Fortran workspaces.第二个（更重要的）变化是将 C 数组用于临时变量，例如 Fortran 工作区。 I'd still keep things like return values as numpy arrays since the reference counting and ability to send them back to Python is very useful.我仍然会将返回值之类的东西保留为 numpy 数组，因为引用计数和将它们发送回 Python 的能力非常有用。

 cdef double* work = <double*>malloc(n*n*sizeof(double))
 try:
     # rest of function
 finally:
     free(work)

You need to cimport malloc and free from libc.stdlib .您需要cimport malloc和free来自libc.stdlib 。 The try: ... finally: ensures the memory is correctly deallocated. try: ... finally:确保正确释放内存。 Don't go too over the top with this - for example, if it isn't obvious where to deallocate a C array then just use numpy.不要过度使用这个 - 例如，如果释放 C 数组的位置不明显，那么只需使用 numpy.

The final option to look at is to not have a return value but to modify an input:要查看的最后一个选项是没有返回值而是修改输入：

cdef void inv_c(double[:,::1] A, double[:,::1] B):
    # check that A and B are the right size, then just write into B
    # ...

The advantage of this is that if you need to call this in a loop with identical sized inputs then you only need to do one allocation for the whole loop.这样做的好处是，如果您需要在具有相同大小输入的循环中调用它，那么您只需为整个循环进行一次分配。 You could extend this to include the working arrays too, although that might be a little more complicated.您也可以将其扩展为包括工作数组，尽管这可能会更复杂一些。

没有python对象的cython中的高效矩阵运算

问题描述

1 个解决方案

解决方案1
3 已采纳

没有python对象的cython中的高效矩阵运算

问题描述

1 个解决方案

解决方案1 3 已采纳

解决方案1
3 已采纳