
Magic functions to C functions in CPython

Originally posted 2022-09-17 11:25:11. Tags: python, compilation, interpreter, cpython

I am looking into the CPython implementation and have learned how Python handles operator overloading (for example, the comparison operators) through the richcmpfunc tp_richcompare; field in the _typeobject struct, where the type is defined as typedef PyObject *(*richcmpfunc) (PyObject *, PyObject *, int);. Whenever a PyObject has to be operated on by one of these operators, the interpreter calls that tp_richcompare function. My doubt is that in Python we use magic methods such as __gt__ to override these operators. So how does the Python code get converted into C code as a tp_richcompare, and how is it used everywhere the interpreter evaluates a comparison operator on a PyObject?
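A minimal sketch in plain Python of the behaviour being asked about (the class Version is hypothetical): > on instances of a user-defined class ends up in the class's __gt__, and reassigning __gt__ on the class after it was created changes what > does. That suggests the dispatch happens at run time, each time the comparison executes, rather than the method being translated into C once.

```python
class Version:
    """A hypothetical user-defined class whose > operator is overridden in Python."""

    def __init__(self, number):
        self.number = number

    def __gt__(self, other):
        print("Version.__gt__ called")
        return self.number > other.number


a, b = Version(2), Version(1)
result = a > b        # prints "Version.__gt__ called"
print(result)         # True

# Replacing the method on the class after creation changes what > does,
# so the choice of method is made when the comparison runs.
Version.__gt__ = lambda self, other: "replaced"
print(a > b)          # replaced
```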

My second doubt is a more general version of the first: when code in one language (here Python) overrides things such as operators or hashing that are interpreted in another language (C, in the case of CPython), how does the interpreter end up calling the function defined in the first language? As far as I know, the generated bytecode is a low-level, instruction-based representation (essentially an array of uint8_t).
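To check the bytecode half of the question, the standard dis module shows what a comparison compiles to. In this sketch (the function greater is hypothetical), co_code really is a bytes object, and since CPython 3.6 each code unit is two bytes (opcode plus argument). The comparison becomes a generic COMPARE_OP instruction; nothing in it names __gt__, so which method runs has to be decided when the interpreter executes that instruction and looks at the operands' types.

```python
import dis


def greater(a, b):
    return a > b


code = greater.__code__.co_code
print(type(code))           # <class 'bytes'>
print(len(code) % 2 == 0)   # True: the code is a sequence of 2-byte units

# List the instruction names; the comparison is a single generic opcode.
ops = [ins.opname for ins in dis.get_instructions(greater)]
print("COMPARE_OP" in ops)  # True

dis.dis(greater)            # human-readable disassembly
```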

Another example is __hash__, which is defined in Python but needed by the C implementation of the dictionary during lookdict. Again, CPython uses the C function type typedef Py_hash_t (*hashfunc)(PyObject *); everywhere a hash is needed for a PyObject, but how __hash__ is translated into this C function is a mystery to me.
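A small experiment for the __hash__ case (the Key class is hypothetical): counting calls shows that both storing a key in a dict and looking it up again invoke the Python-level __hash__, even though the dictionary itself is implemented in C.

```python
class Key:
    """A hypothetical key type that counts how often its __hash__ is called."""

    hash_calls = 0

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        Key.hash_calls += 1
        return hash(self.name)

    def __eq__(self, other):
        return isinstance(other, Key) and self.name == other.name


d = {}
d[Key("x")] = 1          # storing the key calls __hash__ once
print(d[Key("x")])       # 1; looking it up calls __hash__ again
print(Key.hash_calls)    # 2
```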

1 answer

Python code is not transformed into C code. It is interpreted by C code (in CPython), but that's a completely different concept.
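To make "interpreted by C code" concrete for the slot question: as far as I know, when a class body defines __gt__ or __hash__, CPython's type-creation code fills the tp_richcompare or tp_hash slot with a generic C function (slot_tp_richcompare and slot_tp_hash in Objects/typeobject.c) that looks the Python method up by name and calls it. The toy model below imitates that pattern in Python, with Python standing in for C. Every name in it (ToyType, ToyObject, generic_richcompare, generic_hash, rich_compare) is hypothetical, and it skips details such as trying the reflected method on the other operand.

```python
# Toy model only: Python stands in for C; none of these names are CPython APIs.

# Comparison opcodes, numbered like CPython's Py_LT ... Py_GE constants.
PY_LT, PY_LE, PY_EQ, PY_NE, PY_GT, PY_GE = range(6)
OP_NAMES = {PY_LT: "__lt__", PY_LE: "__le__", PY_EQ: "__eq__",
            PY_NE: "__ne__", PY_GT: "__gt__", PY_GE: "__ge__"}


def generic_richcompare(self, other, op):
    """Trampoline stored in the slot: find the Python method by name, call it."""
    method = self.ob_type.namespace.get(OP_NAMES[op])
    if method is None:
        return NotImplemented
    return method(self, other)


def generic_hash(self):
    """Trampoline for the hash slot: delegate to the class's __hash__."""
    return self.ob_type.namespace["__hash__"](self)


class ToyType:
    """Stands in for a type object: a name, the class body and C-style slots."""

    def __init__(self, name, namespace):
        self.name = name
        self.namespace = namespace
        # Fill the slots once, at class-creation time, only if the class
        # body defines the corresponding dunder methods.
        defines_cmp = any(n in namespace for n in OP_NAMES.values())
        self.tp_richcompare = generic_richcompare if defines_cmp else None
        self.tp_hash = generic_hash if "__hash__" in namespace else None


class ToyObject:
    """Stands in for a PyObject: every instance points at its type."""

    def __init__(self, ob_type, value):
        self.ob_type = ob_type
        self.value = value


def rich_compare(a, b, op):
    """The interpreter's side: it only ever calls the slot, never __gt__ directly."""
    slot = a.ob_type.tp_richcompare
    if slot is None:
        raise TypeError(f"{a.ob_type.name} does not support comparison")
    return slot(a, b, op)


Point = ToyType("Point", {
    "__gt__": lambda self, other: self.value > other.value,
    "__hash__": lambda self: hash(self.value),
})
a, b = ToyObject(Point, 3), ToyObject(Point, 1)
print(rich_compare(a, b, PY_GT))    # True
print(rich_compare(a, b, PY_LT))    # NotImplemented
print(Point.tp_hash(a) == hash(3))  # True
```

The point of the design is that the C side needs only one generic function per slot; the Python function itself is never translated, it is just looked up and called.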

There are many ways to interpret a Python program, and the language reference does not specify any particular mechanism. CPython does it by transforming each Python function into a list of virtual machine instructions, which can then be interpreted by a virtual machine emulator. That's one approach. Another would be to just build the AST and then define a (recursive) evaluate method on each AST node.
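The second approach can be sketched in a few lines for a tiny expression subset. This is an illustration under my own assumptions, not how CPython works: it borrows Python's own ast module to build the tree, and evaluate is a hypothetical recursive function rather than a method on each node class. Because it applies operators through operator.gt and friends, which perform the same dispatch as the > operator, a user-defined __gt__ would still be honoured.

```python
import ast
import operator

# Map AST operator node types to the functions that implement them.
BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
CMP_OPS = {ast.Gt: operator.gt, ast.Lt: operator.lt, ast.Eq: operator.eq}


def evaluate(node, env):
    """Recursively evaluate an expression AST node with variables from env."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body, env)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.BinOp):
        op = BIN_OPS[type(node.op)]
        return op(evaluate(node.left, env), evaluate(node.right, env))
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        op = CMP_OPS[type(node.ops[0])]
        return op(evaluate(node.left, env), evaluate(node.comparators[0], env))
    raise NotImplementedError(type(node).__name__)


tree = ast.parse("x * 2 > y + 1", mode="eval")
print(evaluate(tree, {"x": 5, "y": 3}))   # True
```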

Of course, it would also be possible to transform the program into C code and compile the C code for future execution. (Here, "C" is not important. It could be any compiled language which seems convenient.) However, there's not much benefit to doing that, and lots of disadvantages. One problem, which I guess is the one behind your question, is that Python types don't correspond to any C primitive type. The only way to represent a Python object in C is to use a structure, such as CPython's PyObject, which is effectively a low-level mechanism for defining classes (a concept foreign to C): each object includes a pointer to a type object, which contains a virtual method table of pointers to the functions used to implement the various operations on objects of that type. In effect, the compiled C code would end up calling the same functions the interpreter calls to implement each operation; its only purpose would be to sequence the calls without having to walk through an interpretable structure (VM list or AST or whatever). That might be slightly faster, since it avoids a switch statement on each AST node or VM operation, but it's also a lot bulkier, because a function call occupies a lot more space in memory than a single opcode byte.
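The "sequence the calls without walking an interpretable structure" idea can be imitated without C by a technique usually called closure compilation: walk the AST once and turn each node into a Python closure, so that running the program just calls closures, which in turn call the same operator functions a tree-walking evaluator would use. This is a sketch of that general technique under my own naming (compile_node is hypothetical), not of what any real Python compiler does.

```python
import ast
import operator

BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
CMP_OPS = {ast.Gt: operator.gt, ast.Lt: operator.lt, ast.Eq: operator.eq}


def compile_node(node):
    """Translate an AST node, once, into a closure that takes the variable env."""
    if isinstance(node, ast.Expression):
        return compile_node(node.body)
    if isinstance(node, ast.Constant):
        value = node.value
        return lambda env: value
    if isinstance(node, ast.Name):
        name = node.id
        return lambda env: env[name]
    if isinstance(node, ast.BinOp):
        op = BIN_OPS[type(node.op)]
        left, right = compile_node(node.left), compile_node(node.right)
        return lambda env: op(left(env), right(env))
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        op = CMP_OPS[type(node.ops[0])]
        left = compile_node(node.left)
        right = compile_node(node.comparators[0])
        return lambda env: op(left(env), right(env))
    raise NotImplementedError(type(node).__name__)


# The tree is inspected only here; afterwards, runs only chain function calls.
program = compile_node(ast.parse("x * 2 > y + 1", mode="eval"))
print(program({"x": 5, "y": 3}))   # True
print(program({"x": 1, "y": 3}))   # False
```

Note that the "compiled" program still calls operator.gt and so on at run time, which mirrors the point above: compiling removes the dispatch over node types, not the calls that implement each operation.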

An intermediate possibility, in common use these days, is to dynamically compile descriptions of programs (ASTs or VM lists or whatever) into actual machine code at runtime, taking into account what can be discovered about the actual dynamic types and values of the referenced variables and functions. That's called "just-in-time (JIT) compilation", and it can produce huge speedups at runtime if it's implemented well. On the other hand, it's very hard to get right, and discussing how to do it is well beyond the scope of an SO answer.

As a postscript, I understand from a different question that you are reading Robert Nystrom's book, Crafting Interpreters. That's probably a good way of learning these concepts, although I'm personally partial to a much older but still very current textbook, also freely available on the internet: Structure and Interpretation of Computer Programs, by Gerald Sussman, Hal Abelson, and Julie Sussman. The books are not really comparable, but both attempt to explain what it means to "interpret a program", and that's an extremely important concept, which probably cannot be communicated in four paragraphs (the size of this answer).

Whichever textbook you use, it's important not to just read the words. You must do the exercises, which is the only way to actually understand the underlying concepts. That's a lot more time-consuming, but it's also a lot more rewarding. One of the weaknesses of Nystrom's book (although I would still recommend it) is that it lays out a complete implementation for you. That's great if you understand the concepts and are looking for something you can tweak into a rapid prototype, but it leaves open the temptation of skipping over the didactic material, which is the most important part for someone interested in learning how computer languages work.