How to type a generator function in Cython?

If I have a generator function in Python, say:

def gen(x):
    for i in range(x):
        yield(i ** 2)

How do I declare in Cython that the output data type is int? Is it even worthwhile?

Thanks.

Edit: I saw in the changelog that (async) generators have been implemented: http://cython.readthedocs.io/en/latest/src/changes.html?highlight=generators#id23

However, there is no documentation about how to use them. Is that because they are supported but using them from Cython brings no particular advantage, or because no optimization is possible?

No, there is no way to do this in Cython.

When you look at the Cython-generated code, you will see that gen (like other generator functions) returns a generator, which is basically a __pyx_CoroutineObject object that looks as follows:

typedef PyObject *(*__pyx_coroutine_body_t)(PyObject *, PyThreadState *, PyObject *);
typedef struct {
    PyObject_HEAD
    __pyx_coroutine_body_t body;
    PyObject *closure;
    ...
    int resume_label;
    char is_running;
} __pyx_CoroutineObject;

The most important part is the body member: this is the function that does the actual calculation. As we can see, it returns a PyObject, and there is no way (yet?) to adapt it to int, double or similar.

As for the reasons why this is not done, I can only speculate - but there is probably more than one reason.

If you really care about performance, generators introduce too much overhead anyway (for example, yield is not possible in cdef functions) and should be refactored into something simpler.


To elaborate on possible refactorings: as a baseline, let's assume we would like to sum up all generated values:

%%cython 
def gen(int x):
    cdef int i
    for i in range(x):
        yield(i ** 2)

def sum_it(int n):
    cdef int i
    cdef int res=0
    for i in gen(n):
        res+=i
    return res

Timing it gives:

>>> %timeit sum_it(1000)
28.9 µs ± 1.06 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

The good news: it is about 10 times faster than the pure Python version. But if we are really after speed:

%%cython 
cdef int gen_fast(int i):
    return i ** 2

def sum_it_fast(int n):
    cdef int i
    cdef int res=0
    for i in range(n):
        res+=gen_fast(i)
    return res

Timing it gives:

>>> %timeit sum_it_fast(1000)
661 ns ± 20.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

That is roughly 44 times faster than the generator version (28.9 µs vs. 661 ns).

I understand that this is quite a change and might be pretty hard to do - I would do it only if the generator really is the bottleneck of my program - but then a speed-up of this size would be a real motivation to do it.
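Before committing to such a refactoring, it is worth checking that both variants compute the same result and measuring the gap on your own machine. The following is a minimal pure-Python sketch of that check, not the Cython code itself. The names sum_with_generator and sum_inlined are made up for illustration, and the absolute timings will differ from the Cython numbers above.

```python
import timeit


def gen(x):
    """Generator from the question: yields the squares 0, 1, 4, ..."""
    for i in range(x):
        yield i ** 2


def sum_with_generator(n):
    """Sum the squares by iterating over the generator."""
    res = 0
    for value in gen(n):
        res += value
    return res


def sum_inlined(n):
    """Same sum with the generator body inlined into the loop."""
    res = 0
    for i in range(n):
        res += i ** 2
    return res


if __name__ == "__main__":
    # Both variants must agree before any timing is meaningful.
    assert sum_with_generator(1000) == sum_inlined(1000)
    for func in (sum_with_generator, sum_inlined):
        seconds = timeit.timeit(lambda: func(1000), number=1000)
        print(f"{func.__name__}: {seconds / 1000 * 1e6:.1f} µs per call")
```

The same pattern (assert equality first, then time) carries over to the %%cython cells, using %timeit instead of the timeit module.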

Obviously there are a lot of other approaches: using numpy arrays or array.array instead of generators, or writing a custom generator (a cdef class) that offers an additional fast path for getting the int values rather than PyObjects - but this all depends on your scenario. I just wanted to show that there is potential to improve performance by ditching the generators.
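The custom-generator idea could look roughly like the following pure-Python sketch. The class name SquareIter and the methods has_next and next_int are hypothetical. In Cython one would presumably declare this as a cdef class with a typed cdef int method, so that Cython callers can skip the PyObject boxing while __iter__/__next__ keep the object usable from ordinary Python code. That variant is not shown here and has not been benchmarked.

```python
class SquareIter:
    """Hand-written replacement for gen(x) (hypothetical name)."""

    def __init__(self, x):
        self._i = 0
        self._stop = x

    def has_next(self):
        """True while there are values left to produce."""
        return self._i < self._stop

    def next_int(self):
        """Fast path: return the next square directly.

        In Cython this would be a typed `cdef int` method (assumption).
        """
        value = self._i ** 2
        self._i += 1
        return value

    # Regular iterator protocol, so `for v in SquareIter(n)` still works.
    def __iter__(self):
        return self

    def __next__(self):
        if not self.has_next():
            raise StopIteration
        return self.next_int()


def sum_it_custom(n):
    """Sum the squares through the fast path instead of the iterator protocol."""
    it = SquareIter(n)
    res = 0
    while it.has_next():
        res += it.next_int()
    return res
```

Callers that need speed use has_next/next_int directly, while everything else can keep treating the object as an ordinary iterable.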
