Speeding up Python code with Cython

Question

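I have a function that basically just makes a lot of calls to a simple, defined hash function and tests when it finds a duplicate. I need to run a lot of simulations with it, so I would like it to be as fast as possible. I am trying to use Cython for this. The Cython code is currently called with a normal Python list of integers whose values are in the range 0 to m^2.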

import math, random
cdef int a,b,c,d,m,pos,value, cyclelimit, nohashcalls   
def h3(int a,int b,int c,int d, int m,int x):
    return (a*x**2 + b*x+c) %m    
def floyd(inputx):
    dupefound, nohashcalls = (0,0)
    m = len(inputx)
    loops = int(m*math.log(m))
    for loopno in xrange(loops):
        if (dupefound == 1):
            break
        a = random.randrange(m)
        b = random.randrange(m)
        c = random.randrange(m)
        d = random.randrange(m)
        pos = random.randrange(m)
        value = inputx[pos]
        listofpos = [0] * m
        listofpos[pos] = 1
        setofvalues = set([value])
        cyclelimit = int(math.sqrt(m))
        for j in xrange(cyclelimit):
            pos = h3(a,b, c,d, m, inputx[pos])
            nohashcalls += 1    
            if (inputx[pos] in setofvalues):
                if (listofpos[pos]==1):
                    dupefound = 0
                else:
                    dupefound = 1
                    print "Duplicate found at position", pos, " and value", inputx[pos]
                break
            listofpos[pos] = 1
            setofvalues.add(inputx[pos])
    return dupefound, nohashcalls

How can I convert inputx and listofpos to C-typed arrays and access them at C speed? Are there any other speedups I can use? Can setofvalues be sped up?

So that there is something to compare against: 50 calls to floyd() with m = 5000 currently take around 30 seconds on my computer.

Update: an example code snippet showing how floyd is called.

m = 5000
inputx = random.sample(xrange(m**2), m)
(dupefound, nohashcalls) = edcython.floyd(inputx)

Answer 1

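First, it seems that the variables must be typed inside the function: the module-level cdef line in the question does not type the function's local variables. There is a good example of this.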

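Second, cython -a (for "annotate") gives you a really nice breakdown of the code generated by the Cython compiler, with a color-coded indication of how dirty (read: Python API heavy) each line is. This output is essential when trying to optimize anything.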

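Third, the now-famous page on working with NumPy explains how to get fast, C-style access to NumPy array data. Unfortunately, it is verbose and annoying. We are in luck, however, because more recent versions of Cython provide Typed Memoryviews, which are both easy to use and awesome. Read that entire page before trying anything else.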

After ten minutes or so, I came up with this:

# cython: infer_types=True

# Use the C math library to avoid Python overhead.
from libc cimport math
# For boundscheck below.
import cython
# We're lazy so we'll let Numpy handle our array memory management.
import numpy as np
# You would normally also import the Numpy pxd to get faster access to the Numpy
# API, but it requires some fancier compilation options so I'll leave it out for
# this demo.
# cimport numpy as np

import random

# This is a small function that doesn't need to be exposed to Python at all. Use
# `cdef` instead of `def` and inline it.
cdef inline int h3(int a,int b,int c,int d, int m,int x):
    return (a*x**2 + b*x+c) % m

# If we want to live fast and dangerously, we tell cython not to check our array
# indices for IndexErrors. This means we CAN overrun our array and crash the
# program or screw up our stack. Use with caution. Profiling suggests that we
# aren't gaining anything in this case so I leave it on for safety.
# @cython.boundscheck(False)
# `cpdef` so that calling this function from another Cython (or C) function can
# skip the Python function call overhead, while still allowing us to use it from
# Python.
cpdef floyd(int[:] inputx):
    # Type the variables in the scope of the function.
    cdef int a,b,c,d, value, cyclelimit
    cdef unsigned int dupefound = 0
    cdef unsigned int nohashcalls = 0
    cdef unsigned int loopno, pos, j

    # `m` gets a C integer type (Py_ssize_t) because inputx is already a Cython
    # memory view and `infer_types` is on.
    m = inputx.shape[0]

    cdef unsigned int loops = int(m*math.log(m))

    # Again using the memory view, but letting Numpy allocate an array of zeros.
    cdef int[:] listofpos = np.zeros(m, dtype=np.int32)

    # Keep this random sampling out of the loop
    cdef int[:, :] randoms = np.random.randint(0, m, (loops, 5)).astype(np.int32)

    for loopno in range(loops):
        if (dupefound == 1):
            break

        # From our precomputed array
        a = randoms[loopno, 0]
        b = randoms[loopno, 1]
        c = randoms[loopno, 2]
        d = randoms[loopno, 3]
        pos = randoms[loopno, 4]

        value = inputx[pos]

        # Unfortunately, memory views do not support "vectorized" operations
        # like standard Numpy arrays. Otherwise we'd use listofpos *= 0 here.
        for j in range(m):
            listofpos[j] = 0

        listofpos[pos] = 1
        setofvalues = set((value,))
        cyclelimit = int(math.sqrt(m))
        for j in range(cyclelimit):
            pos = h3(a, b, c, d, m, inputx[pos])
            nohashcalls += 1
            if (inputx[pos] in setofvalues):
                if (listofpos[pos]==1):
                    dupefound = 0
                else:
                    dupefound = 1
                    print "Duplicate found at position", pos, " and value", inputx[pos]
                break
            listofpos[pos] = 1
            setofvalues.add(inputx[pos])
    return dupefound, nohashcalls

There are no tricks here that are not explained on docs.cython.org, which is where I learned them myself, but it helps to see them all come together.
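Because floyd above now takes an int[:] argument, it needs an object that exposes a buffer of C ints; a plain Python list like the one in the question is not such an object. Besides a NumPy int32 array, the standard-library array module should also work. The following is only a sketch: the helper name to_c_int_buffer is made up for illustration, and it assumes every value fits in a C int (true for values below m^2 with m = 5000).

```python
import random
from array import array

def to_c_int_buffer(values):
    """Pack Python ints into a contiguous buffer of signed C ints.

    Hypothetical helper, not part of the original answer. array() raises
    OverflowError if a value does not fit in a C int.
    """
    # 'i' is the array type code for a signed C int.
    return array('i', values)

m = 5000
inputx = random.sample(range(m**2), m)
buf = to_c_int_buffer(inputx)

# The buffer protocol view that a typed memoryview argument would consume.
view = memoryview(buf)
print(view.format, view.itemsize, len(view))

# Assumed usage, with the compiled module named edcython as in the question:
# dupefound, nohashcalls = edcython.floyd(buf)
```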

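The most important changes to the original code are described in the comments, but they all amount to giving Cython hints about how to generate code that does not use the Python API.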
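One thing to watch when h3 is moved to C ints: Python integers never overflow, but a C int can. With x as large as m^2, the term a*x**2 is far beyond the range of a 32-bit int, so the typed version may not return the same values as the pure-Python one. The sketch below, in plain Python, checks the magnitudes and shows one possible way to keep every intermediate below m*m by reducing modulo m after each multiplication. It assumes a 32-bit C int and non-negative inputs; the names h3_reference and h3_small are illustrative, and the unused d parameter is left out.

```python
import random

INT32_MAX = 2**31 - 1  # assumption: C int is 32 bits wide

def h3_reference(a, b, c, m, x):
    """The hash from the question, using Python's unbounded integers."""
    return (a * x**2 + b * x + c) % m

def h3_small(a, b, c, m, x):
    """Same result for non-negative inputs with a, b, c < m,
    but every intermediate value stays below m*m."""
    xm = x % m            # 0 <= xm < m
    t = (a * xm) % m      # a*xm < m*m
    t = (t * xm) % m      # equals (a*x*x) mod m
    u = (b * xm) % m      # equals (b*x) mod m
    return (t + u + c) % m

m = 5000
worst_x = m**2 - 1
largest_naive = (m - 1) * worst_x**2 + (m - 1) * worst_x + (m - 1)
largest_reduced = (m - 1) * (m - 1)

print("naive intermediate fits in int32:  ", largest_naive <= INT32_MAX)
print("reduced intermediate fits in int32:", largest_reduced <= INT32_MAX)

# Spot-check that both versions agree on random inputs.
rng = random.Random(0)
agree = all(
    h3_small(a, b, c, m, x) == h3_reference(a, b, c, m, x)
    for a, b, c, x in (
        (rng.randrange(m), rng.randrange(m), rng.randrange(m), rng.randrange(m**2))
        for _ in range(1000)
    )
)
print("versions agree:", agree)
```

The same reduction steps could be written with C ints inside the cdef function; whether that is needed depends on how large m and the input values get.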

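As an aside on infer_types: I really don't know why it is not on by default. It lets the compiler implicitly use C types instead of Python types where possible, which means less work for you.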

If you run cython -a on this, you will see that the only lines that call into Python are your calls to random.sample and the ones that build or add to a Python set().

On my machine, the original code runs in 2.1 seconds. My version runs in 0.6 seconds.

~~The next step is to get that random.sample out of the loop, but I'll leave that to you.~~

I have edited my answer to demonstrate how to precompute the random samples. This brings the time down to 0.4 seconds.
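Timings like these can be reproduced with a small harness. The following is a minimal sketch in plain Python; time_calls and the stand-in workload dummy are illustrative names, and the real measurement would pass the compiled floyd and a real input instead.

```python
import time

def time_calls(func, args, repeats=50):
    """Return total wall-clock seconds for `repeats` calls of func(*args)."""
    start = time.perf_counter()
    for _ in range(repeats):
        func(*args)
    return time.perf_counter() - start

def dummy(values):
    """Stand-in workload; replace with the function being measured."""
    return sum(values)

elapsed = time_calls(dummy, (list(range(1000)),))
print("50 calls took %.4f s" % elapsed)
```

Building the input once outside the timed loop keeps the cost of generating it out of the measurement.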

Answer 2

Do you need to use this particular hashing algorithm? Why not use the built-in hashing that dicts already provide? For example:

from collections import Counter
cnt = Counter(inputx)
dupes = [k for k, v in cnt.iteritems() if v > 1]
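The snippet above is Python 2 (iteritems). A sketch of the same idea for Python 3, wrapped in a function whose name find_duplicates is only illustrative, might look like this. Note that it reports duplicate values directly and does not reproduce the hash-walk simulation from the question.

```python
from collections import Counter

def find_duplicates(values):
    """Return the values that occur more than once in `values`."""
    counts = Counter(values)
    # .items() replaces Python 2's .iteritems()
    return [value for value, n in counts.items() if n > 1]

sample = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
dupes = find_duplicates(sample)
print(sorted(dupes))
```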


Answer 1: 10 (accepted), 2012-12-19 23:06:47

Answer 2: 0, 2012-12-19 20:50:41
