
Are Python modules compiled?

I'm trying to understand whether Python libraries are compiled, because I want to know whether the interpreted code I write will perform the same or worse.

For example, I saw it mentioned somewhere that numpy and scipy are efficient because they are compiled. I don't think this means compiled to bytecode, so how was this done? Was it compiled to C using something like Cython? Or was it written in a language like C and compiled in a compatible way?

Does this apply to all modules, or is it decided case by case?

NumPy and several other libraries are partly wrappers around code written in C and other languages such as Fortran, which, once compiled, runs faster than equivalent Python. This avoids the cost of Python loops, pointer indirection and per-element dynamic type checking, as explained in the answer to a related question, quoted here:

Numpy arrays are densely packed arrays of homogeneous type. Python lists, by contrast, are arrays of pointers to objects, even when all of them are of the same type. So, you get the benefits of locality of reference.

Also, many Numpy operations are implemented in C, avoiding the general cost of loops in Python, pointer indirection and per-element dynamic type checking. The speed boost depends on which operations you're performing, but a few orders of magnitude isn't uncommon in number crunching programs.

Python code that is compiled to bytecode (.pyc files) is a separate topic: there, Python scripts are compiled to improve startup performance, not execution speed (see this question).

Python can execute both functions written in Python (interpreted) and compiled functions. There are whole API docs about writing code for integration with Python. Cython is one of the easier tools for doing this.
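As a rough illustration of that distinction, the sketch below uses the standard `inspect` module to tell a compiled callable from one written in Python. The helper name `implementation_kind` is made up for this example, and the results shown assume CPython, where `math.sqrt` and `len` are implemented in C and `json.dumps` is a plain Python function.

```python
import inspect
import json
import math


def implementation_kind(func):
    """Classify a callable as compiled (C-implemented) or written in Python.

    Hypothetical helper for illustration; assumes CPython.
    """
    if inspect.isbuiltin(func):
        return "compiled"
    if inspect.isfunction(func):
        return "python"
    return "other"


for func in (math.sqrt, len, json.dumps):
    print(func.__name__, "->", implementation_kind(func))
```

Both kinds are called in exactly the same way, which is why a library can move a function from Python to C without its users noticing.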

Libraries can be any combination: pure Python, Python plus interfaces to compiled code, or all compiled. The interpreted files end with .py; the compiled parts usually end with .so (Linux, macOS) or .pyd/.dll (Windows). It's easy to install pure Python code: just download it, unzip if needed, and put it in the right directory. Mixed code requires a compilation step (and hence a C compiler, etc.), or downloading a version with prebuilt binaries.
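One way to check which category a given module falls into is to look at where the import system finds it. This is a minimal sketch using only `importlib`; the function name `classify_module` and its return labels are invented for this example, and the exact result for a module such as `math` depends on how your Python was built.

```python
import importlib.util
from importlib.machinery import EXTENSION_SUFFIXES, SOURCE_SUFFIXES


def classify_module(name):
    """Report how a top-level module is provided (hypothetical helper)."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "not found"
    origin = spec.origin
    if origin is None:
        return "namespace package"
    if origin in ("built-in", "frozen"):
        # Compiled directly into the interpreter binary.
        return origin
    if origin.endswith(tuple(EXTENSION_SUFFIXES)):
        # A shared library such as .so or .pyd.
        return "compiled extension"
    if origin.endswith(tuple(SOURCE_SUFFIXES)):
        return "python source"
    return "other"


for name in ("json", "math", "sys"):
    print(name, "->", classify_module(name))
```

Running it against `numpy` (if installed) reports only the top-level package, which is Python source; the compiled parts live in submodules inside the package.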

Typically developers get the code working in Python, and then rewrite speed-sensitive portions in C. Or they find some external library of working C or Fortran code, and link to that.

numpy and scipy are mixed. They have lots of Python code, core compiled portions, and use external libraries. And the C code can be extraordinarily hard to read.

As a numpy user, you should first try to get as much clarity and performance as you can with Python code. Most of the optimization questions on SO discuss ways of making use of the compiled functionality of numpy, that is, all the operations that work on whole arrays. It's only when you can't express your operations in efficient numpy code that you need to resort to using a tool like cython or numba.

In general, if you have to iterate extensively then you are using low-level operations. Either replace the loops with array operations, or rewrite the loop in cython.
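To make "replace the loops with array operations" concrete, here is a small sketch that computes a sum of squares twice: once with an explicit Python loop and once with a single numpy call that does the iteration in compiled code. It assumes numpy is installed; the function names and the sample data are made up for illustration.

```python
import numpy as np


def sum_of_squares_loop(values):
    """Iterate in Python: one interpreted step per element."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_vectorized(values):
    """Hand the whole array to numpy: the loop runs in compiled code."""
    arr = np.asarray(values, dtype=np.float64)
    return float(np.dot(arr, arr))


data = [0.5 * i for i in range(1000)]
print(sum_of_squares_loop(data))
print(sum_of_squares_vectorized(data))
```

The two results agree up to floating-point rounding. How much faster the array version is depends on the array size and the operation, so it is worth measuring with `timeit` on your own data.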

Low-Level Compiled Languages and Performance

The answers by @hpaulj and @jeevcat are correct.

But the story of whether Python is compiled is more complex.

First, it is true that well-written C++ code is generally far faster than well-written Python code, and that compiled code generally allows for faster calculations.

But the reason is not that the code is compiled, per se. It's because these compiled languages are typically also lower-level languages that let you manipulate memory directly, avoid garbage collection, etc. Moreover, to allow for Python's dynamism and simplicity, everything in Python is an object. So a Python list, for instance, is an object holding a list of references to other objects "scattered" throughout memory. This is less computationally efficient than a memory block with all the values in the list next to each other.
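The difference between "a list of references" and "a block of adjacent values" can be seen with the standard library alone. The sketch below compares a list of floats with an `array.array` of C doubles. The variable names are illustrative, and the byte counts come from `sys.getsizeof`, which is CPython-specific and only approximates real memory use.

```python
import sys
from array import array

values = [float(i) for i in range(1000)]

# A contiguous block of 8-byte C doubles, with no per-element Python object.
packed = array("d", values)

# The list stores one pointer per element; each float is a separate object.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
packed_bytes = sys.getsizeof(packed)

print("bytes per packed element:", packed.itemsize)
print("list plus float objects:", list_bytes)
print("packed array:", packed_bytes)
```

numpy arrays use the same packed layout, which is what lets compiled loops walk over them efficiently.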

And, as the others mentioned, the Python code in these libraries just calls (talks to) this other, more efficient C code.

Is Python Compiled?

But there is a more interesting question: is Python compiled or not? Some people claim that it is not compiled. This is not strictly true. Any time you import a package or module, it is invisibly compiled to bytecode and the result is saved, unless an up-to-date compiled copy already exists. (You will likely not even notice any compilation happening.)
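The compilation step can also be triggered by hand. This sketch uses the built-in `compile()` to turn source text into a code object and `dis` to print its bytecode; the source string and the `<example>` file name are placeholders. The exact instructions printed vary between Python versions, so no output is shown.

```python
import dis

source = "def add(a, b):\n    return a + b\n"

# Compile source text to a code object, as an import would do for a module.
code = compile(source, "<example>", "exec")
print(type(code).__name__)

# Run the compiled module body, which defines add() in the namespace.
namespace = {}
exec(code, namespace)

# Show the bytecode the interpreter executes for add().
dis.dis(namespace["add"])
```

Note that this is bytecode for the Python virtual machine, not machine code, which is why it does not bring the speed of C.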

You can see this happen: any .pyc file (a file ending in .pyc instead of .py) is a compiled Python file. Try to open a .pyc file in an editor or via cat. You'll see that it is a binary file that looks like gibberish.
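The "gibberish" has a simple structure that can be read back. As a sketch, the code below compiles a throwaway module with `py_compile`, checks the magic number at the start of the .pyc file, and loads the code object that follows. The module name `hello_mod` is made up, and the 16-byte header size is an assumption that holds for CPython 3.7 and later.

```python
import importlib.util
import marshal
import py_compile
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "hello_mod.py"
    src.write_text("GREETING = 'hello'\n")

    # Write the .pyc explicitly; a normal import does this behind the scenes.
    pyc_path = Path(py_compile.compile(str(src), doraise=True))
    raw = pyc_path.read_bytes()

# The first 4 bytes identify the bytecode version of this interpreter.
magic_ok = raw[:4] == importlib.util.MAGIC_NUMBER

# After the header comes the marshalled code object (CPython 3.7+ layout).
code = marshal.loads(raw[16:])
namespace = {}
exec(code, namespace)

print(pyc_path.name, magic_ok, namespace["GREETING"])
```

Because the magic number changes between Python versions, a .pyc written by one version is not loaded by another.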

Looking at the Invisible Creation of Compiled Python Code

How to create compiled Python code?

Let's say that you have the following folder structure:

❯ tree -L 1
.
├── __pypackages__ # This is a folder, the rest are files
├── addressbook.proto
├── addressbook_pb2.py
├── pdm.lock
├── protobuf-python-3.17.3.tar.gz
├── pyproject.toml
└── readme.txt

(The structure above contains the Python Google Protocol Buffer example, using the modern PDM package manager structure.)

We can see that the only Python module (file) is addressbook_pb2. So, let's import that file:

❯ python
Python 3.9.7 (default, Oct 13 2021, 06:45:31) 
[Clang 13.0.0 (clang-1300.0.29.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import addressbook_pb2
>>>   [exit out of Python]
❯ 

I did nothing except quickly import the file (module) addressbook_pb2.py. But just that simple import created an entire "compiled code folder" called __pycache__ with the compiled module in it:

❯ tree -L 1
.
├── __pypackages__
├── __pycache__ # This is the folder that was auto-generated
├── addressbook.proto
├── addressbook_pb2.py
├── pdm.lock
├── protobuf-python-3.17.3.tar.gz
├── pyproject.toml
└── readme.txt

Now we'll look at what is in that __pycache__ folder:

❯ ll __pycache__ # `ll` is my shortcut for `ls -al`, it's a common shortcut
total 8
drwxr-xr-x   3 mikewilliamson  staff    96B Oct 30 21:43 .
drwxr-xr-x  34 mikewilliamson  staff   1.1K Oct 30 21:43 ..
-rw-r--r--   1 mikewilliamson  staff   3.2K Oct 30 21:43 addressbook_pb2.cpython-39.pyc
❯ 

Notice that the file addressbook_pb2.cpython-39.pyc is in there. The stem is the name of the module (addressbook_pb2). But it also has the .cpython-39.pyc extension. This tells us a few things:

  1. It is compiled code... that's what the .pyc on the end means.
  2. It was compiled using cpython-39, meaning the CPython "flavor" of Python (the most ubiquitous), version 3.9.
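The same naming scheme can be queried from Python itself. In this short sketch, `sys.implementation.cache_tag` gives the interpreter tag and `importlib.util.cache_from_source` gives the path the import system would use for a source file. The file does not need to exist. On the CPython 3.9 setup shown above the tag would be cpython-39; other interpreters and versions print their own tag.

```python
import importlib.util
import sys

# Interpreter "flavor" and version, as embedded in .pyc file names.
tag = sys.implementation.cache_tag

# Where the compiled form of addressbook_pb2.py would be cached.
cached = importlib.util.cache_from_source("addressbook_pb2.py")

print(tag)
print(cached)
```

Embedding the tag in the file name is what lets several Python versions share one source tree without overwriting each other's compiled files.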

