Avoiding module namespace pollution in Python

TL;DR: What's the cleanest way to keep implementation details out of a module's namespace?

There are a number of similar questions on this topic already, but none seems to have a satisfactory answer relative to modern tools and language features.

I'm designing a Python package, and I'd like to keep each module's public interface clean, exposing only what's intended and keeping implementation details (especially imports) hidden.

Over the years, I've seen a number of techniques:

Don't worry about it. Just document how to use your package and let its consumers ignore the implementation details.

This is just horrible, in my opinion. A well-designed interface should be easily discoverable. Having the implementation details publicly visible makes the interface much more confusing. Even as the author of a package, I don't want to use it when it exposes too much, as it makes autocompletion less useful.

Add an underscore to the beginning of all implementation details.

This is a well-understood convention, and most development tools are smart enough to at least sort underscore-prefixed names to the bottom of autocomplete lists. It works fine if you have a small number of names to treat this way, but as the number of names grows, it becomes more and more tedious and ugly.

Take for example this relatively simple list of imports:

import struct

from abc    import abstractmethod, ABC
from enum   import Enum
from typing import BinaryIO, Dict, Iterator, List, Optional, Type, Union

Applying the underscore technique, this relatively small list of imports becomes this monstrosity:

import struct as _struct

from abc    import abstractmethod as _abstractmethod, ABC as _ABC
from enum   import Enum as _Enum
from typing import (
    BinaryIO as _BinaryIO,
    Dict     as _Dict,
    Iterator as _Iterator,
    List     as _List,
    Optional as _Optional,
    Type     as _Type,
    Union    as _Union
)

Now, I know this problem can be partially mitigated by never using from imports, importing the entire package instead, and package-qualifying everything. While that does help this situation, and I realize that some people prefer to do this anyway, it doesn't eliminate the problem, and it's not my preference. There are some packages I prefer to import directly, but I usually prefer to import type names and decorators explicitly so that I can use them unqualified.

There's an additional small problem with the underscore prefix. Take the following publicly exposed class:

class Widget(_ABC):
    @_abstractmethod
    def implement_me(self, input: _List[int]) -> _Dict[str, object]:
        ...

A consumer of this package writing their own Widget implementation will see that they need to implement the implement_me method, and that it needs to take a _List and return a _Dict. Those aren't actual type names, and now the implementation-hiding mechanism has leaked into my public interface. It's not a big problem, but it does contribute to the ugliness of this solution.
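For what it's worth, the leak described here appears to be limited to the source text. A minimal sketch, assuming the same aliasing style as above, suggests that runtime introspection still sees the real typing objects, since an alias is just a second name for the same object:

```python
import typing
from typing import Dict as _Dict, List as _List


class Widget:
    def implement_me(self, input: _List[int]) -> _Dict[str, object]:
        ...


# The alias is only another name bound to the same object.
same_object = _List is typing.List

# The evaluated annotations hold the real typing objects, not the "_List" spelling.
hints = typing.get_type_hints(Widget.implement_me)
```

Anything that reads the source itself (a code viewer, or a person opening the file) will still see _List and _Dict, which is the ugliness the question is pointing at.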

Hide the implementation details inside a function.

This one's definitely hacky, and it doesn't play well with most development tools.

Here's an example:

def module():
    import struct

    from abc    import abstractmethod, ABC
    from typing import BinaryIO, Dict, List

    def fill_list(r: BinaryIO, count: int, lst: List[int]) -> None:
        while count > 16:
            lst.extend(struct.unpack("<16i", r.read(16 * 4)))
            count -= 16
        while count > 4:
            lst.extend(struct.unpack("<4i", r.read(4 * 4)))
            count -= 4
        for _ in range(count):
            lst.append(struct.unpack("<i", r.read(4))[0])

    def parse_ints(r: BinaryIO) -> List[int]:
        count = struct.unpack("<i", r.read(4))[0]
        rtn: List[int] = []
        fill_list(r, count, rtn)
        return rtn

    class Widget(ABC):
        @abstractmethod
        def implement_me(self, input: List[int]) -> Dict[str, object]:
            ...

    return (parse_ints, Widget)

parse_ints, Widget = module()
del module

This works, but it's super hacky, and I don't expect it to operate cleanly in all development environments. ptpython, for example, fails to provide method signature information for the parse_ints function. Also, the type of Widget becomes my_package.module.<locals>.Widget instead of my_package.Widget, which is weird and confusing to consumers.
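The odd type name comes from the class's `__qualname__`, which records the enclosing function. A stripped-down sketch of the same pattern (the method body is omitted for brevity) shows the effect:

```python
from abc import ABC, abstractmethod


def module():
    class Widget(ABC):
        @abstractmethod
        def implement_me(self, input):
            ...

    return Widget


Widget = module()
del module

# The qualified name still points at the (now deleted) wrapper function.
qualname = Widget.__qualname__
```

Here `qualname` is `"module.<locals>.Widget"`, which is what ends up in the class's repr and in tracebacks.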

Use __all__.

This is a commonly given solution to this problem: list the "public" members in the global __all__ variable:

import struct

from abc    import abstractmethod, ABC
from typing import BinaryIO, Dict, List

__all__ = ["parse_ints", "Widget"]

def fill_list(r: BinaryIO, count: int, lst: List[int]) -> None:
    ...  # You've seen this.

def parse_ints(r: BinaryIO) -> List[int]:
    ...  # This, too.

class Widget(ABC):
    ...  # And this.

This looks nice and clean, but unfortunately, the only thing __all__ affects is what happens when you use a wildcard import (from my_package import *), which most people don't do anyway.
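That limitation can be checked with a small self-contained sketch. The module name `demo_all` and its contents are made up for the illustration, and the module is built in memory rather than loaded from a file:

```python
import sys
import types

# Hypothetical stand-in for a real module file; the name "demo_all" is made up.
source = """
import struct

__all__ = ["parse_ints"]

def fill_list(): ...
def parse_ints(): ...
"""

demo = types.ModuleType("demo_all")
exec(source, demo.__dict__)
sys.modules["demo_all"] = demo

# A wildcard import honours __all__ ...
star_namespace = {}
exec("from demo_all import *", star_namespace)
star_names = sorted(n for n in star_namespace if not n.startswith("__"))

# ... but dir() and attribute access still expose everything.
leaked = [n for n in ("struct", "fill_list") if n in dir(demo)]
```

Only `parse_ints` arrives through the wildcard import, while `struct` and `fill_list` remain visible through `dir(demo)` and ordinary attribute access.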

Convert the module to a subpackage, and expose the public interface in __init__.py.

This is what I'm currently doing, and it's pretty clean for most cases, but it can get ugly if I'm exposing multiple modules instead of flattening everything:

my_package/
+--__init__.py
+--_widget.py
+--shapes/
   +--__init__.py
   +--circle/
   |  +--__init__.py
   |  +--_circle.py
   +--square/
   |  +--__init__.py
   |  +--_square.py
   +--triangle/
      +--__init__.py
      +--_triangle.py

Then my __init__.py files look kind of like this:

# my_package/__init__.py

from my_package._widget import parse_ints, Widget

# my_package/shapes/circle/__init__.py

from my_package.shapes.circle._circle import Circle, Sphere

# my_package/shapes/square/__init__.py

from my_package.shapes.square._square import Square, Cube

# my_package/shapes/triangle/__init__.py

from my_package.shapes.triangle._triangle import Triangle, Pyramid

This makes my interface clean, and works well with development tools, but it makes my directory structure pretty messy if my package isn't completely flat.

Is there a better technique?
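A throwaway sketch of this layout, built in a temporary directory so it can be run as-is (the file contents are trimmed-down placeholders, not the real implementation), shows what ends up in the package namespace:

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "my_package"
    pkg.mkdir()
    # Private implementation module: its imports stay in its own namespace.
    (pkg / "_widget.py").write_text(
        "import struct\n"
        "from abc import ABC\n"
        "def parse_ints(r): ...\n"
        "class Widget(ABC): ...\n"
    )
    # The package __init__ re-exports only the intended names.
    (pkg / "__init__.py").write_text(
        "from my_package._widget import parse_ints, Widget\n"
    )
    sys.path.insert(0, tmp)
    try:
        my_package = importlib.import_module("my_package")
    finally:
        sys.path.remove(tmp)

    public_names = sorted(n for n in dir(my_package) if not n.startswith("_"))
    widget_module = my_package.Widget.__module__
```

The only non-underscore names are `Widget` and `parse_ints`. One side effect worth knowing: `Widget.__module__` reports `my_package._widget`, so the private module name can still surface in reprs and tracebacks.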

Convert to subpackages to limit the number of classes in one place and to separate concerns. If a class or constant is not needed outside of its module, prefix it with a double underscore. Import the module name if you do not want to explicitly import many classes from it. You have laid out all the solutions.

Not sure if this breaks anything, but one can do

"""Module Docstring"""

__all__ = [
    # Classes
    "Foo",
    # Functions
    "bar",
]
__ALL__ = dir() + __all__  # catch default module attributes.

# Imports go here

def __dir__() -> list[str]:
    return __ALL__

Explanation: dir(obj) tries to call obj.__dir__(). Modules are objects as well, and we can add a custom __dir__ function to them. Using this setup, you should get

dir(module) = ['__all__', '__builtins__', '__cached__', '__doc__',
'__file__', '__loader__', '__name__', '__package__', '__spec__']

Plus whatever is specified in __all__.
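A minimal way to try this out without creating files is sketched below. The module name `demo_dir` and its contents are hypothetical, and the sketch relies on module-level `__dir__` support, which exists since Python 3.7 (PEP 562):

```python
import types

# Hypothetical stand-in for a module file that uses the pattern above.
source = '''
__all__ = ["Foo", "bar"]
__ALL__ = dir() + __all__  # names present so far, plus the public ones

import struct  # imported after the snapshot, so it is not listed

class Foo: ...
def bar(): ...

def __dir__():
    return __ALL__
'''

demo = types.ModuleType("demo_dir")
exec(source, demo.__dict__)

listed = dir(demo)                      # goes through the custom __dir__
still_reachable = hasattr(demo, "struct")  # attribute access is unaffected
```

Note that this only changes what dir() reports, and therefore what tools relying on dir() will suggest. The hidden names are still ordinary module attributes, so `demo.struct` keeps working.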
