Improving the performance of a Python function with Cython

Question

Aim

I'm trying to speed up my Python program with Cython. The code I'm writing is an attempt at the Forward algorithm, which recursively and efficiently calculates the probability of a long observation sequence under a Hidden Markov Model (HMM). This problem is usually referred to as the Evaluation Problem.
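For reference, the scaled forward recursion can be sketched in plain Python as follows. This is a minimal sketch and not the code from hmm.py: it assumes integer-coded observations and nested lists instead of DataFrames, and the name forward_scaled is hypothetical.

```python
import math


def forward_scaled(A, B, pi, obs):
    """Scaled forward pass for a discrete HMM (illustrative sketch).

    A[i][j]: P(state j at t+1 | state i at t)
    B[i][k]: P(symbol k | state i)
    pi[i]:   P(state i at t=0)
    obs:     observation symbols as integer codes
    Returns the scaled alphas, the scale factors c[t], and log P(obs).
    """
    N, T = len(pi), len(obs)
    alpha = [[0.0] * N for _ in range(T)]
    c = [0.0] * T

    # Initialisation: alpha[0][i] = pi[i] * B[i][obs[0]], then rescale.
    for i in range(N):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    c[0] = 1.0 / sum(alpha[0])
    alpha[0] = [a * c[0] for a in alpha[0]]

    # Recursion: alpha[t][i] = (sum_j alpha[t-1][j] * A[j][i]) * B[i][obs[t]]
    for t in range(1, T):
        for i in range(N):
            s = sum(alpha[t - 1][j] * A[j][i] for j in range(N))
            alpha[t][i] = s * B[i][obs[t]]
        c[t] = 1.0 / sum(alpha[t])
        alpha[t] = [a * c[t] for a in alpha[t]]

    # Because each row is rescaled to sum to 1, P(obs) = prod(1 / c[t]).
    log_likelihood = -sum(math.log(ct) for ct in c)
    return alpha, c, log_likelihood


A = [[0.7, 0.3], [0.4, 0.6]]            # states H, C
B = [[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]]  # symbols S, M, L
pi = [0.6, 0.4]
obs = [0, 1, 0, 2]                      # S, M, S, L
alpha, c, log_likelihood = forward_scaled(A, B, pi, obs)
```

Each row of alpha sums to 1 after scaling, and the sequence probability is recovered as exp(log_likelihood).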

Python Code

In a file called hmm.py:

import numpy
import pandas

class HMM():
    '''    
    args:
        O:
            observation sequence. A sequence of 'S', 'M' or 'L's

        X:
            state sequence. 'H's or 'C's

        A:
            transition matrix, N by N

        B:
            emission matrix, N by M

        M:
            number of possible observation symbols

        pi:
            initial state distribution

        N:
            Number of states

        T:
            length of the observation sequence

        Q:
            possible hidden states (Xs)

        V:
            possible observations (Os)

    '''
    def __init__(self,A,B,pi,O,X):
        self.A=A
        self.N=self.A.shape[0]
        self.B=B
        self.M=self.B.shape[1]
        self.pi=pi
        self.O=O
        self.T=len(O)
        self.Q=list(self.A.index)
        self.V=list(self.B.keys())
        self.X=X


    def evaluate(self):
        '''
        Solve the evaluation problem for HMMs 
        by implementing the forward algorithm
        '''
        c0=0
        ct=numpy.zeros(self.T)
        alpha= numpy.zeros((self.T,self.N))

        ## compute alpha[0]
        for i in range(self.N):
            pi0=self.pi[self.Q[i]]
            bi0=self.B.loc[self.Q[i]][self.O[0]]
            alpha[0,i]=pi0*bi0
            c0+=alpha[0,i]
            ct[0]=alpha[0,i]
        ct[0]=1/ct[0]#[0]
        alpha=alpha*ct[0]
        for t in range(1,self.T):
            for i in range(self.N):
                for j in range(self.N):
                    aji= self.A[self.Q[j]][self.Q[i]]
                    alpha[t,j]= alpha[t-1,j]*aji
                ct[t]=ct[t]+alpha[t,i]
            ct[t]=1/ct[t]
            alpha=alpha*ct[t]
        return (alpha,ct)

This code can be called with:

if __name__=='__main__':
    A=[[0.7,0.3],[0.4,0.6]]
    A= numpy.matrix(A)
    A=pandas.DataFrame(A,columns=['H','C'],index=['H','C'])
    '''
    three types of emission: small (S), medium (M) and large (L)
    '''
    B=[[0.1,0.4,0.5],[0.7,0.2,0.1]]
    B=numpy.matrix(B)
    B=pandas.DataFrame(B,columns=['S','M','L'],index=['H','C'])
    '''
    initial probabilities for state, H and C
    '''
    pi=[0.6,0.4]
    pi=numpy.matrix(pi)
    pi=pandas.DataFrame(pi,columns=['H','C'])
    O=(0,1,0,2)
    O=('S','M','S','L')
    X=('H','H','C','C')
    H=HMM(A,B,pi,O,X)
    print(H.evaluate())

When using %timeit I get this output with pure Python (timing output not preserved):
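Outside IPython, a comparable measurement can be taken with the standard-library timeit module. A minimal sketch, in which time_call and workload are hypothetical names and workload merely stands in for a call such as H.evaluate():

```python
import timeit


def time_call(func, repeat=5, number=100):
    """Return the best per-call time in seconds over several repeats."""
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return min(timings) / number


def workload():
    # Hypothetical stand-in for H.evaluate(); replace with the real call.
    return sum(i * i for i in range(1000))


best = time_call(workload)
print("best of 5: %.3g s per call" % best)
```

Taking the minimum over repeats reduces the influence of other processes on the measurement.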

Compilation with Cython

I then placed the evaluate function (rather than the whole class) in a new file with the .pyx extension, hmm.pyx:

import numpy 
cimport numpy

cpdef evaluate_compiled(A,B,pi,O,X):
    '''
    Solve the evaluation problem for HMMs 
    by implementing the forward algorithm
    '''
    T=len(O)
    N=len(list(set(X)))
    Q=list(set(X))
    V=list(set(O))

    c0=0
    ct=numpy.zeros(T)
    alpha= numpy.zeros((T,N))

    ## compute alpha[0]
    for i in range(N):
        pi0=pi[Q[i]]
        bi0=B.loc[Q[i]][O[0]]
        alpha[0,i]=pi0*bi0
        c0+=alpha[0,i]
        ct[0]=alpha[0,i]
    ct[0]=1/ct[0]#[0]
    alpha=alpha*ct[0]
    for t in range(1,T):
        for i in range(N):
            for j in range(N):
                aji= A[Q[j]][Q[i]]
                alpha[t,j]= alpha[t-1,j]*aji
            ct[t]=ct[t]+alpha[t,i]
        ct[t]=1/ct[t]
        alpha=alpha*ct[t]
    return (alpha,ct)

In setup.py:

try:
    from setuptools import setup
    from setuptools import Extension
except ImportError:
    from distutils.core import setup
    from distutils.extension import Extension

from Cython.Distutils import build_ext
import numpy

setup(cmdclass={'build_ext':build_ext},
      ext_modules=[Extension('hmm',
                             sources=['HMMCluster/hmm.pyx'], #File is in a directory called HMMCluster
                                     include_dirs=[numpy.get_include()])]   )

Now, after compilation, I can use:

from hmm import evaluate_compiled

And under the __main__ block above I can use, in place of evaluate:

print(evaluate_compiled(A,B,pi,O,X))

and with %timeit (timing output not preserved):

As you can see, without changing the code I have a ~3-fold improvement in speed. However, all the docs I have read suggest that Python's lack of speed is due to dynamically inferring variable types. Therefore, in principle, I can declare variable types and speed things up further.

Cython with type declaration

Now, the last function in this post is the same algorithm again, but with type declarations:

cpdef evaluate_compiled_with_type_declaration(A,B,pi,O,X):
    cdef int t,i,j
    cdef int T  = len(O)
    cdef int N  = len(list(set(X)))
    cdef list Q = list(set(X))
    cdef list V = list(set(O))
    cdef float c0 = 0
#    cdef numpy.ndarray ct = numpy.zeros(T,dtype=double) ## this caused compilation to fail
    ct=numpy.zeros(T)
    alpha= numpy.zeros((T,N))
    for i in range(N):
        pi0=pi[Q[i]]
        bi0=B.loc[Q[i]][O[0]]
        alpha[0,i]=pi0*bi0
        c0+=alpha[0,i]
        ct[0]=alpha[0,i]
    ct[0]=1/ct[0]#[0]
    alpha=alpha*ct[0]
    for t in range(1,T):
        for i in range(N):
            for j in range(N):
                aji= A[Q[j]][Q[i]]
                alpha[t,j]= alpha[t-1,j]*aji
            ct[t]=ct[t]+alpha[t,i]
        ct[t]=1/ct[t]
        alpha=alpha*ct[t]
    return (alpha,ct)

After compilation and %timeit I get (timing output not preserved):

As you can see, the type declarations haven't made any further improvements to the code performance. My question is: Can any further improvements be made to the speed of this code and, if so, how do I do it?

Edit: Following suggestions in the comments, I added these additional type declarations:

cdef float  pi0,bi0
cdef numpy.ndarray[numpy.float64_t, ndim=1] ct
cdef numpy.ndarray[numpy.float64_t, ndim=2] aij,alpha

and got this from %timeit (timing output not preserved):

So again, still no real speedup, even with declared types.

Answer 1

The "modern" way to use NumPy arrays in Cython is "Typed Memoryviews": http://docs.cython.org/en/latest/src/userguide/memoryviews.html

The corresponding arguments must be declared as:

cpdef evaluate_compiled_with_type_declaration(double[:,:] A, double[:,:] B, double[:] pi, int[:] O, int[:] X):

(just guessing for types and shapes).
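Since those types are a guess, the labelled inputs from the question would first have to be converted into typed arrays that match such a signature. A minimal sketch, assuming that double[:] corresponds to a float64 array and int[:] to an array of C ints (numpy.intc); the helper encode and the names states and symbols are hypothetical. For a pandas DataFrame, the underlying array can be obtained from its .values attribute before this step.

```python
import numpy


def encode(labels, alphabet):
    """Map a sequence of labels to C-int codes, using the order in `alphabet`."""
    index = {label: k for k, label in enumerate(alphabet)}
    return numpy.array([index[label] for label in labels], dtype=numpy.intc)


states = ['H', 'C']         # row/column order of A, row order of B
symbols = ['S', 'M', 'L']   # column order of B

# Contiguous float64 arrays are compatible with double[:, :] and double[:].
A = numpy.ascontiguousarray([[0.7, 0.3], [0.4, 0.6]], dtype=numpy.float64)
B = numpy.ascontiguousarray([[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]], dtype=numpy.float64)
pi = numpy.array([0.6, 0.4], dtype=numpy.float64)

# Integer codes replace the string labels, so the loops can index arrays directly.
O = encode(('S', 'M', 'S', 'L'), symbols)
X = encode(('H', 'H', 'C', 'C'), states)
```

With integer codes, a lookup such as B.loc[Q[i]][O[0]] becomes the plain array access B[i, O[0]].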

They must be indexed directly as A[i,j] and not A[i][j].

You can declare your result arrays and fill them in one go.

    cdef double[:] ct=numpy.zeros(T)
    cdef double[:,:] alpha= numpy.zeros((T,N))

For optimization, see the compiler directives http://cython.readthedocs.io/en/latest/src/reference/compilation.html?highlight=cdivision#compiler-directives (specifically cdivision and boundscheck).

You should not convert your input data via list(set(...)), as a set loses ordering.
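One way to collect unique labels without losing their order, assuming Python 3.7 or later (where dict preserves insertion order), is dict.fromkeys. First-seen order still need not match the row order of A, so taking the labels from A.index, as the HMM class in the question does, is likely the safer choice. A small sketch, where state_index is a hypothetical name:

```python
X = ('H', 'H', 'C', 'C')

# Unique labels in first-seen order; list(set(X)) gives no such guarantee.
Q = list(dict.fromkeys(X))

# Label -> integer position, for indexing plain arrays instead of DataFrames.
state_index = {q: k for k, q in enumerate(Q)}
```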

To check whether your code is "compiled", use cython -a on the file, as suggested by Warren Weckesser.
