Cython NumPy accumulate function

Question

I need to implement a function that sums the elements of an array in consecutive sections of variable length. For example:

import numpy as np

a = np.arange(10)
section_lengths = np.array([3, 2, 5])
out = accumulate(a, section_lengths)
print(out)
[ 3.  7. 35.]
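The gist's contents are not reproduced here, so as a way to pin down the intended semantics, here is a minimal plain Python/NumPy reference version. It assumes the section lengths add up to len(a), and the name accumulate_reference is hypothetical. It is slow, but it is handy for checking a faster Cython or NumPy version against it.

```python
import numpy as np


def accumulate_reference(a, section_lengths):
    """Hypothetical reference: sum consecutive sections of `a`.

    Section i covers section_lengths[i] elements; assumes the lengths sum to len(a).
    """
    out = np.zeros(len(section_lengths), dtype=np.double)
    start = 0
    for i, length in enumerate(section_lengths):
        # Sum the slice covering this section, then move to the next section.
        out[i] = a[start:start + length].sum()
        start += length
    return out


# Lengths 3, 2 and 5 cover all ten elements of np.arange(10).
print(accumulate_reference(np.arange(10), np.array([3, 2, 5])).tolist())  # [3.0, 7.0, 35.0]
```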

I tried implementing it in Cython:

https://gist.github.com/2784725

To check performance, I compared it against a pure NumPy solution for the case where all section lengths are equal:

LEN = 10000
b = np.ones(LEN, dtype=np.int_) * 2000
a = np.arange(np.sum(b), dtype=np.double)
out = np.zeros(LEN, dtype=np.double)

%timeit np.sum(a.reshape(-1,2000), axis=1)
10 loops, best of 3: 25.1 ms per loop

%timeit accumulate.accumulate(a, b, out)
10 loops, best of 3: 64.6 ms per loop
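The NumPy baseline above only works because every section has the same length (2000 here): the flat array can then be viewed as a 2-D array with one row per section and summed along axis 1. A small sketch of that idea, with a hypothetical helper name equal_section_sums:

```python
import numpy as np


def equal_section_sums(a, length):
    """Hypothetical helper: sum consecutive sections of `a` that all have `length` elements."""
    if length < 1 or a.size % length != 0:
        raise ValueError("len(a) must be a positive multiple of the section length")
    # For a contiguous 1-D array, reshape returns a view with one row per section.
    return a.reshape(-1, length).sum(axis=1)


a = np.arange(12, dtype=np.double)
print(equal_section_sums(a, 4).tolist())  # [6.0, 22.0, 38.0]
```

The same trick does not apply once the section lengths differ, which is why the comparison only covers the equal-length case.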

Do you have any suggestions for improving performance?

Answer 1

You could try some of the following:

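- In addition to the @cython.boundscheck(False) compiler directive, try adding @cython.wraparound(False).
- In your setup.py script, try adding some optimization flags:
  ext_modules = [Extension("accumulate", ["accumulate.pyx"], extra_compile_args=["-O3",])]
- Look at the .html file generated by cython -a accumulate.pyx to see whether any parts are missing static types or rely heavily on Python C-API calls:
  http://docs.cython.org/src/quickstart/cythonize.html#determining-where-to-add-types
- Add a return statement at the end of the method. Currently it performs a lot of unnecessary error checking in the tight i_el += 1 loop.
- I'm not sure whether it will make a difference, but I tend to declare loop counters as cdef unsigned int rather than plain int.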

You might also compare your code with NumPy when the section_lengths are not all equal, since that case will probably need more than a simple sum.
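For unequal section lengths, one pure NumPy option (not taken from either answer) is np.add.reduceat, which sums the elements between consecutive start indices. The sketch below assumes every length is at least 1 and that the lengths add up to len(a). The positivity requirement matters because reduceat returns a[start] rather than 0 for an empty section. The function name section_sums_reduceat is made up for illustration.

```python
import numpy as np


def section_sums_reduceat(a, section_lengths):
    """Hypothetical helper: sum consecutive, variable-length sections of `a`.

    Assumes every section length is >= 1 and the lengths add up to len(a).
    """
    a = np.asarray(a, dtype=np.double)
    section_lengths = np.asarray(section_lengths)
    if np.any(section_lengths < 1) or section_lengths.sum() != a.shape[0]:
        raise ValueError("section lengths must be >= 1 and sum to len(a)")
    if section_lengths.size == 0:
        return np.zeros(0, dtype=np.double)
    # Start index of each section: 0, l0, l0 + l1, ...
    starts = np.concatenate(([0], np.cumsum(section_lengths)[:-1]))
    # reduceat sums a[starts[i]:starts[i + 1]]; the last section runs to the end of a.
    return np.add.reduceat(a, starts)


print(section_sums_reduceat(np.arange(10), [3, 2, 5]).tolist())  # [3.0, 7.0, 35.0]
```

Timing this against the Cython version on random, unequal lengths gives a fairer comparison than the reshape baseline.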

Answer 2

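Updating out[i_bas] inside the nested for loop is slow. Instead, accumulate into a temporary local variable and write out[i_bas] once the inner loop finishes. The following code should be as fast as the NumPy version: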

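import numpy as np
cimport numpy as np
cimport cython

np.import_array()  # initialize NumPy's C API before using typed arrays

ctypedef np.int_t DTYPE_int_t
ctypedef np.double_t DTYPE_double_t

@cython.boundscheck(False)  # skip index bounds checks inside the loops
@cython.wraparound(False)   # no negative-index handling, so no wraparound check
def accumulate(
       np.ndarray[DTYPE_double_t, ndim=1] a not None,
       np.ndarray[DTYPE_int_t, ndim=1] section_lengths not None,
       np.ndarray[DTYPE_double_t, ndim=1] out not None,
       ):
    cdef int i_el, i_bas, sec_length, lenout
    cdef double tmp
    lenout = out.shape[0]
    i_el = 0  # index of the next element of a to consume
    for i_bas in range(lenout):
        # Sum section i_bas in a local C double instead of updating out[i_bas] each time.
        tmp = 0
        for sec_length in range(section_lengths[i_bas]):
            tmp += a[i_el]
            i_el += 1
        out[i_bas] = tmp  # a single write per section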
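Because this version disables boundscheck and wraparound, nothing stops the loop from reading past the end of a or section_lengths if the inputs are inconsistent. One option is to validate the arguments in Python before calling the compiled function. The sketch below is hypothetical (check_accumulate_inputs is not part of either answer). It mirrors what the loop reads: section_lengths[i] for every i < len(out), and then one element of a per counted position.

```python
import numpy as np


def check_accumulate_inputs(a, section_lengths, out):
    """Hypothetical guard to run before the compiled accumulate(a, section_lengths, out).

    With boundscheck(False), inconsistent inputs would read past the end of a
    buffer instead of raising, so check the sizes up front.
    """
    n_out = out.shape[0]
    if section_lengths.shape[0] < n_out:
        raise ValueError("section_lengths has fewer entries than out")
    # range() over a negative length is empty, so only positive lengths consume elements of a.
    needed = int(np.maximum(section_lengths[:n_out], 0).sum())
    if needed > a.shape[0]:
        raise ValueError(f"sections need {needed} elements but a has {a.shape[0]}")


a = np.arange(10, dtype=np.double)
out = np.zeros(3)
check_accumulate_inputs(a, np.array([3, 2, 5]), out)  # consistent inputs: no exception
```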


Answer 1: score 2, accepted, 2012-05-25 00:33:48

Answer 2: score 1, 2012-05-25 02:16:56
