Why do Cython embedded plugins have higher performance in the CPython interpreter than Rust C-interface versions?

I'd like to ask about the underlying principles of Python interpreters, because my own searching didn't turn up much useful information.

I've been using Rust to write Python plugins lately. This gives a significant speedup for Python's CPU-intensive tasks, and it's also faster to write than C. However, it has one disadvantage: compared to my old approach of using Cython for acceleration, the call overhead of Rust (I'm using PyO3) seems to be greater than that of C (I'm using Cython).

For example, take this empty Python function:

def empty_function():
    return 0

Calling it a million times in Python via a for loop and timing it shows that each call takes about 70 nanoseconds (on my PC).

If we compile it as a Cython plugin, with essentially the same source code:
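The question does not show the timing loop itself. A minimal sketch of such a measurement, assuming the standard-library timeit module is used (the helper name ns_per_call is made up for illustration, and the absolute numbers will differ per machine):

```python
import timeit


def empty_function():
    return 0


def ns_per_call(func, number=1_000_000, repeat=5):
    """Return the best observed time per call, in nanoseconds.

    Hypothetical helper: runs `func` `number` times per round and keeps
    the fastest of `repeat` rounds to reduce noise from other processes.
    """
    timer = timeit.Timer(func)
    best_total_seconds = min(timer.repeat(repeat=repeat, number=number))
    return best_total_seconds / number * 1e9


if __name__ == "__main__":
    print(f"{ns_per_call(empty_function):.1f} ns per call")
```

The same helper could be pointed at the compiled Cython or Rust version of empty_function once that module is importable, so that all three variants are timed the same way.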

# test.pyx
cpdef unsigned int empty_function():
    return 0

The execution time drops to 40 nanoseconds. This means we can use Cython for fine-grained embedding and expect it to always execute faster than native Python.

However, when it comes to Rust (honestly speaking, I now prefer Rust to Cython for plugin development, because there's no need for weird syntax hacks), the call time increases to 140 nanoseconds, about twice that of native Python. The source code is as follows:

use pyo3::prelude::*;
use pyo3::wrap_pyfunction;

#[pyfunction]
fn empty_function() -> usize {
    0
}

#[pymodule]
fn testlib(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(empty_function, m)?)?;
    Ok(())
}

This suggests that Rust is not suitable for fine-grained embedded replacement of Python. If a task is called only a few times and each call takes a long time, Rust is a perfect fit. However, if a task is called very frequently in the code, Rust seems unsuitable, because the overhead of type conversion eats up most of the time saved.

I want to know whether this can be solved and, more importantly, what the underlying reason for this discrepancy is. Does the CPython interpreter behave differently when calling the two kinds of plugin, like the difference between CPython and PyPy when calling C plugins? Where can I find further information? Thanks.
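To make the trade-off concrete, here is a rough sketch of how much of each call would be spent on overhead. It assumes the times measured for the empty function (70, 40 and 140 ns) are pure call overhead, and the per-call work values are made-up illustrations, not measurements:

```python
def overhead_share(call_overhead_ns, work_ns):
    """Fraction of the total time per call that is spent on call overhead."""
    return call_overhead_ns / (call_overhead_ns + work_ns)


# Per-call times from the question, treated here as pure overhead (assumption).
OVERHEAD_NS = {"python": 70, "cython": 40, "rust_pyo3": 140}

# Hypothetical amounts of useful work done per call, in nanoseconds.
for work_ns in (100, 10_000, 1_000_000):
    shares = {
        name: f"{overhead_share(ns, work_ns):.1%}"
        for name, ns in OVERHEAD_NS.items()
    }
    print(f"work = {work_ns} ns per call: {shares}")
```

With only a little work per call the overhead dominates, while for long-running calls it becomes negligible for all three variants.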

=== ===

Update:

Sorry, everyone. I didn't anticipate that my question would be ambiguous; after all, the source code for all three versions was given, and using timeit to test function runtimes is almost a convention in Python development.

My test code is nearly the same as @Jmb's code in the comments, with the subtle difference that I build with python setup.py build_ext --inplace instead of bare gcc, but that should not make any difference. Anyway, thanks for the supplement.

It's also worth noting here that compiling Rust extensions with python setup.py build_ext --inplace builds them in unoptimised mode (the same goes for python setup.py develop or pip install -e .).

Here's an excerpt from the build output:

Finished dev [unoptimized + debuginfo] target(s) in 0.02s

To build in "release" mode with an optimised binary, use:

pip install .

With pip install . --verbose you can see the difference:

Finished release [optimized] target(s) in 1.02s

This can make a massive difference: in my case the unoptimised build is 9x slower than the optimised build.

As suggested in the comments, this is a self-answer.

Since the discussion in the comments section did not lead to a clear conclusion, I raised an issue in PyO3's repo and got a response from its main maintainer.

In short, the conclusion is that there is no fundamental difference in how CPython calls plugins compiled by PyO3 and by Cython. The current speed difference comes from the different depth of optimization.

Here is the link to the issue: https://github.com/PyO3/pyo3/issues/1470
