What is the best practice for imports when developing a Python package?

I am trying to build a Python package that contains sub-modules and sub-packages ("libraries"). I have looked everywhere for the right way to do it, but to my surprise I find it very complicated. I have also gone through multiple threads on Stack Overflow, of course.

The problem is as follows:

  1. In order to import a module or a package from another directory, it seems to me that there are 2 options:
     a. Adding the absolute path to sys.path.
     b. Installing the package with the setuptools.setup function in a setup.py file in the main directory of the package, which installs the package into the site-packages directory of the specific Python version in use.

  2. Option a seems too clumsy to me. Option b is great, but I find it impractical because I am currently working on and editing the package's source code, and of course the changes are not reflected in the installed copy of the package. In addition, the installed directory of the package is not tracked by Git, and needless to say I use Git in the original directory.

To conclude the question: What is the best practice for importing modules and sub-packages freely and cleanly from within sub-directories of a Python package that is currently under construction?

I feel I am missing something, but I couldn't find a decent solution so far.

Thanks!

This is a great question, and I wish more people would think along these lines. Making a module importable and ultimately installable is absolutely necessary before it can be easily used by others.

On sys.path munging

Before I answer, I will say I do use sys.path munging when I do initial development on a file outside of an existing package structure. I have an editor snippet that constructs code like this:

import sys, os
sys.path.append(os.path.expanduser('~/path/to/parent'))
from module_of_interest import *  # NOQA

Given the path to the current file, I use:

import ubelt as ub
fpath = ub.Path('/home/username/path/to/parent/module_of_interest.py')
modpath, modname = ub.split_modpath(fpath, check=False)
modpath = ub.Path(modpath).shrinkuser()  # abstract home directory

This constructs the necessary parts that the snippet will insert into the file, so I can interact with it from within IPython. I find that taking a little extra time to remove the reference to my explicit home folder makes this slightly more portable: the code still works as long as users have the same path structure relative to their home directory.
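For readers without ubelt, the same idea can be approximated with the standard library. This is only a rough sketch: `split_modpath_sketch` is a hypothetical helper name, not ubelt's implementation, and it assumes a directory counts as a package exactly when it contains an `__init__.py`.

```python
from pathlib import Path


def split_modpath_sketch(fpath):
    """Return (directory to put on sys.path, dotted module name) for a file.

    Hypothetical helper: walks upward while the parent directory contains
    an __init__.py, so files nested in packages get their full dotted name.
    """
    fpath = Path(fpath)
    parts = [fpath.stem]
    dpath = fpath.parent
    # Stop at the first directory that is not a package (or at the filesystem root).
    while (dpath / '__init__.py').exists() and dpath != dpath.parent:
        parts.append(dpath.name)
        dpath = dpath.parent
    if parts[0] == '__init__' and len(parts) > 1:
        parts = parts[1:]  # a package's __init__.py is imported by the package name
    return dpath, '.'.join(reversed(parts))


# Usage with the (non-existent) example path from above.
dpath, modname = split_modpath_sketch(
    '/home/username/path/to/parent/module_of_interest.py')
print(dpath, modname)
```

The returned directory is what would be appended to sys.path, and the dotted name is what would appear in the import statement.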

Proper Python Package Management

That being said, sys.path munging is not a sustainable solution. Ultimately you want your package to be managed by a Python package manager. I know a lot of people use poetry, but I like plain old pip, so I will describe that process, but know this isn't the only way to do it.

To do this we need to go over some basics.

Basics

  1. You must know what Python environment you are working in. Ideally this is a virtual environment managed with pyenv (or conda or mamba or poetry...). It is also possible to do this in your global system Python environment, although that is not recommended. I like working in a single default Python virtual environment that is always activated in my .bashrc. It's always easy to switch to a new one or blow it away and start fresh.

  2. You need to consider two root paths: the root of your repository, which I will call your repo path, and the root of your package, the package path or module path, which should be a folder with the name of the top-level Python package. You will use this name to import it. This package path must live inside the repo path. Some repos, like xdoctest, like to put the module path in a src directory. Others, like ubelt, like to have the module path at the top level of the repository. I think the second case is conceptually easier for new package creators / maintainers, so let's go with that.
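To check which environment you are in (the first point above), a small sketch like the following can help. It assumes a venv/virtualenv-style environment on Python 3; a conda environment would not necessarily be detected this way, and `in_virtualenv` is just an illustrative name.

```python
import sys


def in_virtualenv():
    """True when running inside a venv/virtualenv-style environment."""
    # In a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base interpreter.
    return sys.prefix != sys.base_prefix


print('executable:', sys.executable)
print('in virtualenv:', in_virtualenv())
```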

Setting up the repo path

So now you are in an activated Python virtual environment, and we have designated a path where we will check out the repo. I like to clone repos in $HOME/code, so perhaps the repo path is $HOME/code/my_project.

In this repo path you should have your root package path. Let's say your package is named mypymod. Any directory that contains an __init__.py file is conceptually a Python module, where the contents of __init__.py are what you get when you import that directory name. The only difference between a directory module and a normal file module is that a directory module/package can have submodules or subpackages.

For example, if you are in the my_project repo, i.e. when you ls you see mypymod, and you have a file structure that looks something like this...

+ my_project
    + mypymod
        + __init__.py
        + submod1.py
        + subpkg
            + __init__.py
            + submod2.py

you can import the following modules:

import mypymod
import mypymod.submod1
import mypymod.subpkg
import mypymod.subpkg.submod2

If you ensured that your current working directory was always the repo root, or you put the repo root into sys.path, then this would be all you need. Being visible in sys.path or the CWD is all that is needed for another module to see your module.
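The following sketch recreates the example layout in a temporary directory (a stand-in for my_project) and shows that putting the repo root on sys.path is enough to make the imports work. The `VALUE` constants are made up purely so there is something to print.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Recreate the example layout in a throwaway directory (stand-in for my_project).
repo_root = Path(tempfile.mkdtemp())
pkg = repo_root / 'mypymod'
(pkg / 'subpkg').mkdir(parents=True)
(pkg / '__init__.py').write_text('')
(pkg / 'submod1.py').write_text('VALUE = 1\n')
(pkg / 'subpkg' / '__init__.py').write_text('')
(pkg / 'subpkg' / 'submod2.py').write_text('VALUE = 2\n')

# Making the repo root visible on sys.path is what makes mypymod importable.
sys.path.insert(0, str(repo_root))
importlib.invalidate_caches()

submod1 = importlib.import_module('mypymod.submod1')
submod2 = importlib.import_module('mypymod.subpkg.submod2')
print(submod1.VALUE, submod2.VALUE)
```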

The package manifest: setup.py / pyproject.toml

Now the trick is: how do you ensure your other packages / scripts can always see this module? That is where the package manager comes in. For this we will need a setup.py or the newer pyproject.toml variant. I'll describe the older setup.py way of doing things.

All you need to do is put the setup.py in your repo root. Note: it does not go in your package directory. There are plenty of resources for how to write a setup.py, so I won't describe it in much detail, but basically all you need is to populate it with enough information so it knows about the name of the package, its location, and its version.

from setuptools import find_packages, setup
setup(
    name='mypymod',
    version='0.1.0',
    packages=find_packages(include=['mypymod', 'mypymod.*']),
    install_requires=[],
)

So your package structure will look like this:

+ my_project
    + setup.py
    + mypymod
        + __init__.py
        + submod1.py
        + subpkg
            + __init__.py
            + submod2.py
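To get a feel for what the packages argument in setup.py needs to contain, here is a standard-library-only approximation of package discovery. It is not setuptools' implementation (no include/exclude filtering, for one), and `discover_packages_sketch` is a hypothetical name; it simply lists directories that contain an `__init__.py`.

```python
import os


def discover_packages_sketch(repo_root):
    """List dotted names of directories under repo_root that hold an __init__.py."""
    repo_root = os.fspath(repo_root)
    found = []
    for dirpath, dirnames, filenames in os.walk(repo_root):
        if dirpath == repo_root:
            continue  # the repo root itself is not a package
        if '__init__.py' not in filenames:
            dirnames[:] = []  # not a package, so nothing below it counts either
            continue
        rel = os.path.relpath(dirpath, repo_root)
        found.append(rel.replace(os.sep, '.'))
    return sorted(found)
```

For the layout above, calling this on the my_project directory would be expected to list mypymod and mypymod.subpkg, which are the names the setup.py has to declare.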

There are plenty of other things you can specify; I recommend looking at ubelt and xdoctest as examples. I'll note they contain a non-standard way of parsing requirements out of requirements.txt or requirements/*.txt files, which I think is generally better than the standard way people handle requirements. But I digress.

Given something that pip or some other package manager (e.g. pipx, poetry) recognizes as a package manifest, i.e. a file that describes the contents of your package, you can now install it. If you are still developing it you can install it in editable mode, so instead of the package being copied into your site-packages, only a link back to your source directory is created, and any changes in your code are reflected each time you invoke Python (or immediately if you have autoreload on with IPython).

With pip it is as simple as running pip install -e <path-to-repo-root>, which is typically done by navigating into the repo and running pip install -e . (the trailing dot refers to the current directory).
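One way to confirm that an editable install points at your working copy is to ask Python where it would import the module from. A small sketch (`module_origin` is an illustrative helper name):

```python
import importlib.util


def module_origin(modname):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(modname)
    return None if spec is None else spec.origin


# A standard-library module resolves to the interpreter's lib directory.
print(module_origin('json'))
# After an editable install, module_origin('mypymod') should instead point
# inside the repo checkout rather than at a copy in site-packages.
```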

Congrats, you now have a package you can reference from anywhere.

Making the most of your package

The python -m invocation

You now have a package you can reference as if it were installed via pip from PyPI. There are a few tricks for using it effectively. The first is running scripts.

You don't need to specify a path to a file to run it as a script in Python. It is possible to run a script as __main__ using only its module name. This is done with the -m argument to Python. For instance, you can run python -m mypymod.submod1, which will invoke $HOME/code/my_project/mypymod/submod1.py as the main module (i.e. its __name__ attribute will be set to "__main__").
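The difference between importing a module and running it with -m can be seen with the standard-library runpy module, which provides the machinery behind the -m switch. The sketch below builds a throwaway copy of the package in a temporary directory; the `SEEN_NAME` variable is made up for illustration.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

repo_root = Path(tempfile.mkdtemp())
pkg = repo_root / 'mypymod'
pkg.mkdir()
(pkg / '__init__.py').write_text('')
(pkg / 'submod1.py').write_text(
    'SEEN_NAME = __name__\n'
    'if __name__ == "__main__":\n'
    '    print("submod1 running as a script")\n'
)

# Drop any previously imported copy so this sketch is repeatable.
for name in [n for n in sys.modules if n == 'mypymod' or n.startswith('mypymod.')]:
    del sys.modules[name]
sys.path.insert(0, str(repo_root))
importlib.invalidate_caches()

# Roughly what `python -m mypymod.submod1` does: execute the module as __main__.
as_script = runpy.run_module('mypymod.submod1', run_name='__main__')
# A regular import gives the module its dotted name instead.
as_import = importlib.import_module('mypymod.submod1')

print(as_script['SEEN_NAME'])
print(as_import.SEEN_NAME)
```

Only the first execution enters the `if __name__ == "__main__":` branch, which is why that guard lets one file serve as both an importable module and a script.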

Furthermore, if you want to do this with a directory module, you can make a special file called __main__.py in that directory, and that is the script that will be executed. For instance, if we modify our package structure:

+ my_project
    + setup.py
    + mypymod
        + __init__.py
        + __main__.py
        + submod1.py
        + subpkg
            + __init__.py
            + __main__.py
            + submod2.py

Now python -m mypymod will execute $HOME/code/my_project/mypymod/__main__.py and python -m mypymod.subpkg will execute $HOME/code/my_project/mypymod/subpkg/__main__.py. This is a very handy way to make a module double as both an importable package and a command line executable (e.g. xdoctest does this).
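A self-contained sketch of this behaviour: it writes the two __main__.py files into a temporary stand-in for the repo and invokes them through the current interpreter. The printed messages and the `run_as_module` helper are made up for the example; running from the repo root works here because -m puts the current working directory on sys.path.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

repo_root = Path(tempfile.mkdtemp())
pkg = repo_root / 'mypymod'
(pkg / 'subpkg').mkdir(parents=True)
(pkg / '__init__.py').write_text('')
(pkg / '__main__.py').write_text('print("top-level entry point")\n')
(pkg / 'subpkg' / '__init__.py').write_text('')
(pkg / 'subpkg' / '__main__.py').write_text('print("subpkg entry point")\n')


def run_as_module(modname):
    """Run `python -m modname` from the repo root and return its stdout."""
    proc = subprocess.run(
        [sys.executable, '-m', modname],
        cwd=repo_root, capture_output=True, text=True, check=True,
    )
    return proc.stdout.strip()


print(run_as_module('mypymod'))
print(run_as_module('mypymod.subpkg'))
```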

Easier imports

One pain point you might notice is that in the above code if you run:

import mypymod
mypymod.submod1

You will get an error because by default a package doesn't know about its submodules until they are imported. You need to populate the __init__.py to expose any attributes you want to be accessible at the top level. You could populate mypymod/__init__.py with:

from mypymod import submod1

And now the above code would work.
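The effect is easy to see side by side. This sketch creates two throwaway packages with hypothetical names, `pkg_bare` and `pkg_exposed`, that differ only in their __init__.py:

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
# Two hypothetical packages that differ only in their __init__.py.
for name, init_body in [
    ('pkg_bare', ''),
    ('pkg_exposed', 'from pkg_exposed import submod1\n'),
]:
    (root / name).mkdir()
    (root / name / '__init__.py').write_text(init_body)
    (root / name / 'submod1.py').write_text('VALUE = 1\n')

sys.path.insert(0, str(root))
importlib.invalidate_caches()

pkg_bare = importlib.import_module('pkg_bare')
pkg_exposed = importlib.import_module('pkg_exposed')

print(hasattr(pkg_bare, 'submod1'))     # False: submodule not imported yet
print(hasattr(pkg_exposed, 'submod1'))  # True: imported by __init__.py
```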

This has a tradeoff though. The more things you make accessible immediately, the more time it takes to import the module, and with big packages it can get fairly cumbersome. Also, you have to manually write the code to expose what you want, so that is a pain if you want everything.

If you take a look at ubelt's __init__.py, you will see it has a good deal of code to explicitly make every function in every submodule accessible at the top level. I've written yet another library called mkinit that actually automates this process, and it also has the option of using the lazy_loader library to mitigate the performance impact of exposing all attributes at the top level. I find the mkinit tool very helpful when writing large nested packages.
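To illustrate the general idea behind lazy exposure, here is a hand-written sketch using a module-level `__getattr__` (PEP 562, Python 3.7+). This is not the code that mkinit or lazy_loader generate; the package name `lazypkg` and the `_SUBMODULES` set are assumptions made for the example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

INIT_SOURCE = '''
import importlib

_SUBMODULES = {"submod1"}


def __getattr__(name):
    # PEP 562: called only when normal attribute lookup on the module fails.
    if name in _SUBMODULES:
        return importlib.import_module(__name__ + "." + name)
    raise AttributeError(name)
'''

root = Path(tempfile.mkdtemp())
(root / 'lazypkg').mkdir()
(root / 'lazypkg' / '__init__.py').write_text(INIT_SOURCE)
(root / 'lazypkg' / 'submod1.py').write_text('VALUE = 1\n')

sys.path.insert(0, str(root))
importlib.invalidate_caches()

lazypkg = importlib.import_module('lazypkg')
loaded_before = 'lazypkg.submod1' in sys.modules  # importing the package is cheap
value = lazypkg.submod1.VALUE                     # first access triggers the import
loaded_after = 'lazypkg.submod1' in sys.modules
print(loaded_before, value, loaded_after)
```

The submodule is only imported on first attribute access, so the cost of exposing many names at the top level is deferred until they are actually used.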

Summary

To summarize the above content:

  1. Make sure you are working in a Python virtualenv (I recommend pyenv).
  2. Identify your "package path" inside of your "repo path".
  3. Put an __init__.py in every directory you want to be a Python package or subpackage.
  4. Optionally, use mkinit to autogenerate the content of your __init__.py files.
  5. Put a setup.py / pyproject.toml in the root of your "repo path".
  6. Use pip install -e . to install the package in editable mode while you develop it.
  7. Use python -m to invoke module names as scripts.

Hope this helps.

