Why is a Cython loop slower than the equivalent Python loop?

Question

I am trying to speed up my Python code by using Cython. The Python code consists of the classes py_child and py_parent and the function py_backup:

import random
import numpy as np
# cimport only works in a Cython (.pyx) file
from libc.string cimport memcmp
## python code #################################################
class py_child:
    def __init__(self, move):
        self.move = move
        self.Q = 0
        self.N = 0

class py_parent:
    def __init__(self):
        self.children = []
    def add_children(self, moves):
        for move in moves:
            self.children.append(py_child(move))

def py_backup(parent, white_rave, black_rave):
    for point in white_rave:
        for ch in parent.children:
            if ch.move == point:
                ch.Q += 1
                ch.N += 1

    for point in black_rave:
        for ch in parent.children:
            if ch.move == point:
                ch.Q += 1
                ch.N += 1

Here is the same implementation in Cython, using memoryviews for some of the variables:

## cython ######################################################

cdef class cy_child:
    cdef public:
        int[:] move
        int Q
        int N
    def __init__(self, move):
        self.move = move
        self.Q = 0
        self.N = 0

cdef class cy_parent:
    cdef public:
        list children
        int[:, :] moves
    def __init__(self):
        self.children = []
    def add_children(self, moves):
        cdef int i = 0
        cdef int N = len(moves)
        for i in range(N):
            self.children.append(cy_child(moves[i]))

cpdef cy_backup(cy_parent parent_node, int[:, :] white_rave, int[:, :] black_rave):
    cdef int[:] move
    cdef cy_child ch
    for move in white_rave:
        for ch in parent_node.children:
            if memcmp(&move[0], &ch.move[0], move.nbytes) == 0:
                ch.Q += 1
                ch.N += 1

    for move in black_rave:
        for ch in parent_node.children:
            if memcmp(&move[0], &ch.move[0], move.nbytes) == 0:
                ch.Q += 1
                ch.N += 1

Now I want to compare the speed of cy_backup and py_backup, so I use this code:

### Setup variables #########################################
size = 11
board = np.random.randint(2, size=(size, size), dtype=np.int32)

# Split the board coordinates into black (0) and white (1) points.
black_rave = []
white_rave = []
for x in range(board.shape[0]):
    for y in range(board.shape[1]):
        if board[x, y] == 0:
            black_rave.append((x, y))
        else:
            white_rave.append((x, y))

py_temp = []
for i in range(size):
    for j in range(size):
        py_temp.append((i,j))

#### python arguments #######################################

py = py_parent()
py.add_children(py_temp)
# also py_temp, black_rave, white_rave

#### cython arguments #######################################
cy_temp = np.asarray(py_temp, dtype=np.int32)
cy_black_rave = np.asarray(black_rave, dtype=np.int32)
cy_white_rave = np.asarray(white_rave, dtype=np.int32)
cy = cy_parent()
cy.add_children(cy_temp)

#### Speed test #################################################
%timeit py_backup(py, black_rave, white_rave)
%timeit cy_backup(cy, cy_black_rave, cy_white_rave)

When I ran the program, I was surprised by the results:

1000 loops, best of 3: 759 µs per loop
100 loops, best of 3: 6.38 ms per loop

I was expecting Cython to be much faster than Python, especially when memoryviews are used.
Why does the loop in Cython run slower than the loop in Python?
Any suggestions for speeding up the Cython code would be highly appreciated.
I apologize in advance for including so much code in my question.

Answer 1

Cython memoryviews are really only optimised for one thing: accessing single elements or slices (usually in a loop).

# e.g.
cdef int i
cdef int[:] mview = # something
for i in range(mview.shape[0]):
   mview[i] # do some work with this....

This type of code can be converted directly into efficient C code. For pretty much any other operation the memoryview is treated as a Python object.

Unfortunately almost none of your code takes advantage of the one thing memoryviews are good at, so you get no real speed-up. Instead it's actually worse because you've added an extra layer, and creating a whole load of small length-2 memoryviews is going to be very costly.

My advice is really just to use lists. They're actually pretty good for this kind of thing, and it isn't at all clear to me how to rewrite your code to really speed it up with Cython.
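If you do stay with plain Python objects, most of the cost is the nested scan over all children for every rave point. The sketch below is not part of the original code. It assumes that moves are hashable tuples, and the names Child and backup_with_lookup are made up for illustration. It replaces the inner scan with a dict lookup:

```python
class Child:
    """Same fields as py_child in the question."""
    def __init__(self, move):
        self.move = move
        self.Q = 0
        self.N = 0


def backup_with_lookup(children, white_rave, black_rave):
    """Update Q and N with one dict lookup per rave point instead of a scan."""
    # Map each move to the children holding it (moves need not be unique).
    by_move = {}
    for ch in children:
        by_move.setdefault(ch.move, []).append(ch)

    for rave in (white_rave, black_rave):
        for point in rave:
            for ch in by_move.get(point, ()):
                ch.Q += 1
                ch.N += 1


size = 11
children = [Child((i, j)) for i in range(size) for j in range(size)]
white_rave = [(0, 0), (3, 4)]
black_rave = [(3, 4), (10, 10)]
backup_with_lookup(children, white_rave, black_rave)
```

In a real program the dict would probably be built once when the children are added rather than on every call. Whether this helps in your case is something to measure with %timeit.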

Some small optimizations I've spotted: you can get a pretty good idea of how well optimised your Cython code is by looking at the highlighted HTML file generated by cython -a. You'll see that general iteration of a memoryview is slow (i.e. pure Python). You get an improvement by changing the loop to index the memoryview instead:

# instead of:
# for move in white_rave:
for i in range(white_rave.shape[0]):
    move = white_rave[i,:]

This lets Cython iterate the memoryview in an efficient way.

You can get a bit more speed by turning off some of the safety checks for the memcmp line:

with cython.boundscheck(False), cython.initializedcheck(False):
   if memcmp(&move[0], &ch.move[0], move.nbytes) == 0:

(You need to cimport cython.) If you do this and you haven't initialized ch.move, or either memoryview doesn't have at least one element, then your program may crash.

I realise this isn't a helpful answer, but so long as you want to keep child as a Python class (even a cdef one) there really isn't much you can do to speed it up. You might consider changing it to a C struct (which you could have a C array of), but then you lose all the benefits of working with Python (i.e. you have to manage your own memory and you can't access it easily from Python code).
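A middle ground, in the same spirit as the struct idea but not the same thing, is to drop the per-child objects and keep each field in a NumPy array. The following is only a sketch under that assumption. backup_arrays and the array layout are hypothetical, and the comparison is done with NumPy broadcasting rather than memcmp:

```python
import numpy as np


def backup_arrays(moves, Q, N, white_rave, black_rave):
    """moves: (n_children, 2); Q, N: (n_children,); raves: (k, 2) integer arrays."""
    for rave in (white_rave, black_rave):
        # matches[r, c] is True when rave point r equals the move of child c.
        matches = (rave[:, None, :] == moves[None, :, :]).all(axis=2)
        hits = matches.sum(axis=0)  # number of matching rave points per child
        Q += hits
        N += hits


size = 11
moves = np.array([(i, j) for i in range(size) for j in range(size)], dtype=np.int32)
Q = np.zeros(len(moves), dtype=np.int64)
N = np.zeros(len(moves), dtype=np.int64)
white_rave = np.array([(0, 0), (3, 4)], dtype=np.int32)
black_rave = np.array([(3, 4), (10, 10)], dtype=np.int32)
backup_arrays(moves, Q, N, white_rave, black_rave)
```

The data stays reachable from Python and NumPy manages the memory, at the price of building a temporary boolean array of shape (k, n_children) on each call. Arrays laid out like this are also the kind of input that typed memoryviews and indexed loops handle well, should you move the function into Cython later.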

Answer 1 (score 3, accepted, 2017-08-07 18:50:17)
