
How can dataclasses be made to work better with __slots__?

It was decided to remove direct support for __slots__ from dataclasses for Python 3.7.

Despite this, __slots__ can still be used with dataclasses:

from dataclasses import dataclass

@dataclass
class C():
    __slots__ = "x"
    x: int

However, because of the way __slots__ works, it isn't possible to assign a default value to a dataclass field:

from dataclasses import dataclass

@dataclass
class C():
    __slots__ = "x"
    x: int = 1

This results in an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: 'x' in __slots__ conflicts with class variable

How can __slots__ and default dataclass fields be made to work together?

2021 UPDATE: direct support for __slots__ was added in Python 3.10. I am leaving this answer for posterity and won't be updating it.

The problem is not unique to dataclasses. ANY conflicting class attribute will stomp all over a slot:

>>> class Failure:
...     __slots__ = tuple("xyz")
...     x=1
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: 'x' in __slots__ conflicts with class variable

This is simply how slots work. The error happens because __slots__ creates a class-level descriptor object for each slot name:

>>> class Success:
...     __slots__ = tuple("xyz")
...
>>>
>>> type(Success.x)
<class 'member_descriptor'>

In order to prevent this conflicting variable name error, the class namespace must be altered before the class object is instantiated, so that there are not two objects competing for the same member name in the class:

  • the specified (default) value*
  • the slot descriptor (created by the slots machinery)

For this reason, an __init_subclass__ method on a parent class will not be sufficient, nor will a class decorator, because in both cases the class object has already been created by the time these functions have received the class to alter it.
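The timing can be checked directly. In the small sketch below (the decorator name never_called is made up for illustration), the ValueError is raised while the class object is being created, so the decorator never receives a class that it could fix:

```python
decorator_calls = []

def never_called(cls):
    # Hypothetical decorator: it only records that it was invoked.
    decorator_calls.append(cls)
    return cls

try:
    @never_called
    class Failure:
        __slots__ = ("x",)
        x = 1  # conflicts with the slot named "x"
except ValueError as exc:
    error_message = str(exc)

print(error_message)
print(decorator_calls)  # stays empty: the decorator never ran
```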

Current option: write a metaclass

Until such time as the slots machinery is altered to allow more flexibility, or the language itself provides an opportunity to alter the class namespace before the class object is instantiated, our only choice is to use a metaclass.

Any metaclass written to solve this problem must, at minimum:

  • remove the conflicting class attributes/members from the namespace
  • instantiate the class object to create the slot descriptors
  • save references to the slot descriptors
  • put the previously removed members and their values back in the class __dict__ (so the dataclass machinery can find them)
  • pass the class object to the dataclass decorator
  • restore the slots descriptors to their respective places
  • also take into account plenty of corner cases (such as what to do if there is a __dict__ slot)
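As a rough illustration only, the first six steps could be sketched as follows. The name SlottedDataclassMeta is hypothetical, and the sketch deliberately ignores the corner cases from the last bullet (a __dict__ slot, inheritance between slotted dataclasses, arguments to the dataclass decorator), so it should not be read as a complete solution:

```python
from dataclasses import dataclass

class SlottedDataclassMeta(type):
    """Hypothetical metaclass: lets defaults coexist with __slots__."""

    def __new__(mcls, name, bases, namespace):
        slots = namespace.get("__slots__", ())
        if isinstance(slots, str):
            slots = (slots,)
        # 1. Remove the conflicting members from the namespace.
        defaults = {k: namespace.pop(k) for k in slots if k in namespace}
        # 2. Create the class object, which creates the slot descriptors.
        cls = super().__new__(mcls, name, bases, namespace)
        # 3. Save references to the slot descriptors.
        descriptors = {k: cls.__dict__[k] for k in defaults}
        # 4. Put the defaults back so the dataclass machinery can find them.
        for k, value in defaults.items():
            setattr(cls, k, value)
        # 5. Pass the class object to the dataclass decorator.
        cls = dataclass(cls)
        # 6. Restore the slot descriptors to their places.
        for k, descriptor in descriptors.items():
            setattr(cls, k, descriptor)
        return cls

class C(metaclass=SlottedDataclassMeta):
    __slots__ = ("x",)
    x: int = 1

print(C().x, C(2).x)
```

Even this stripped-down version has to juggle the namespace and the class __dict__ in a fixed order, which gives an idea of how much more a robust implementation would need.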

To say the least, this is an extremely complicated endeavor. It would be easier to define the class like the following - without a default value, so that the conflict doesn't occur at all - and then add a default value afterward.

Current option: make alterations after class object instantiation

The unaltered dataclass would look like this:

@dataclass
class C:
    __slots__ = "x"
    x: int

The alteration is straightforward. Change the __init__ signature to reflect the desired default value, and then change the __dataclass_fields__ to reflect the presence of a default value.

from functools import wraps

def change_init_signature(init):
    @wraps(init)
    def __init__(self, x=1):
        init(self,x)
    return __init__

C.__init__ = change_init_signature(C.__init__)

C.__dataclass_fields__["x"].default = 1

Test:

>>> C()
C(x=1)
>>> C(2)
C(x=2)
>>> C.x
<member 'x' of 'C' objects>
>>> vars(C())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: vars() argument must have __dict__ attribute

It works!

Current option: a setmember decorator

With some effort, a so-called setmember decorator could be employed to automatically alter the class in the manner above. This would require deviating from the dataclasses API in order to define the default value in a location other than inside the class body, perhaps something like:

@setmember(x=field(default=1))
@dataclass
class C:
    __slots__="x"
    x: int

The same thing could also be accomplished through an __init_subclass__ method on a parent class:

from dataclasses import field

class SlottedDataclass:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__()
        # make the class changes here

class C(SlottedDataclass, x=field(default=1)):
    __slots__ = "x"
    x: int
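A minimal sketch of what "make the class changes here" might look like. It assumes the defaults are delivered by temporarily setting them as class attributes, running dataclass (which, without slots=True, modifies the class in place), and then restoring the slot descriptors. It also gives the parent class an empty __slots__, an addition not shown above, because otherwise instances of the subclass would still get a __dict__:

```python
from dataclasses import dataclass, field

class SlottedDataclass:
    __slots__ = ()  # assumption: keeps instances of subclasses free of __dict__

    def __init_subclass__(cls, **defaults):
        super().__init_subclass__()
        # Back up the slot descriptors that the defaults will overwrite.
        descriptors = {name: cls.__dict__[name] for name in defaults}
        for name, value in defaults.items():
            setattr(cls, name, value)
        dataclass(cls)  # generates __init__, __repr__, __eq__ on cls itself
        # Put the slot descriptors back where the defaults were.
        for name, descriptor in descriptors.items():
            setattr(cls, name, descriptor)

class C(SlottedDataclass, x=field(default=1)):
    __slots__ = ("x",)
    x: int

print(C().x, C(5).x)
```

Note that the subclass is not decorated with @dataclass here; the parent's hook applies it.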

Future possibility: change the slots machinery

Another possibility, as mentioned above, would be for the Python language to alter the slots machinery to allow more flexibility. One way of doing this might be to change the slots descriptor itself to store class level data at the time of class definition.

This could be done, perhaps, by supplying a dict as the __slots__ argument (see below). The class-level data (1 for x, 2 for y) could just be stored on the descriptor itself for retrieval later:

class C:
    __slots__ = {"x": 1, "y": 2}

assert C.x.value == 1
assert C.y.value == 2
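Part of this idea can be approximated with today's Python, because __slots__ already accepts a dict: the keys become the slot names and the dict itself stays available as C.__slots__. The sketch below is only an approximation of the proposal, not the proposed machinery. The defaults live in the __slots__ dict rather than on the descriptors (there is no C.x.value), and the hand-written __init__ is an assumption made for the illustration:

```python
class C:
    # Keys become the slot names; the values are reused here as defaults.
    __slots__ = {"x": 1, "y": 2}

    def __init__(self, **overrides):
        # Fill every slot from the override if given, else from the default.
        for name, default in self.__slots__.items():
            setattr(self, name, overrides.get(name, default))

c = C(y=5)
print(c.x, c.y)
```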

One difficulty: it may be desired to only have a slot_member.value present on some slots and not others. This could be accommodated by importing a null-slot factory from a new slottools library:

from slottools import nullslot

class C:
    __slots__ = {"x": 1, "y": 2, "z": nullslot()}

assert not hasattr(C.z, "value")

The style of code suggested above would be a deviation from the dataclasses API. However, the slots machinery itself could even be altered to allow for this style of code, with accommodation of the dataclasses API specifically in mind:

class C:
    __slots__ = "x", "y", "z"
    x = 1  # 1 is stored on C.x.value
    y = 2  # 2 is stored on C.y.value

assert C.x.value == 1
assert C.y.value == 2
assert not hasattr(C.z, "value")

Future possibility: "prepare" the class namespace inside the class body

The other possibility is altering/preparing (synonymous with the __prepare__ method of a metaclass) the class namespace.

Currently, there is no opportunity (other than writing a metaclass) to write code that alters the class namespace before the class object is instantiated and the slots machinery goes to work. This could be changed by creating a hook for preparing the class namespace beforehand, and making it so that an error complaining about the conflicting names is only produced after that hook has been run.

This so-called __prepare_slots__ hook could look something like this, which I think is not too bad:

# prepare_slots is hypothetical; it does not exist in the dataclasses module
from dataclasses import dataclass, field, prepare_slots

@dataclass
class C:
    __slots__ = ('x',)
    __prepare_slots__ = prepare_slots
    x: int = field(default=1)

The dataclasses.prepare_slots function would simply be a function, similar to the __prepare__ method, that receives the class namespace and alters it before the class is created. For this case in particular, the default dataclass field values would be stored in some other convenient place so that they can be retrieved after the slot descriptor objects have been created.


* Note that the default field value conflicting with the slot might also be created by the dataclass machinery if dataclasses.field is being used.

As noted already in the answers, data classes from dataclasses cannot generate slots for the simple reason that slots must be defined before a class is created.

In fact, the PEP for data classes explicitly mentions this:

At least for the initial release, __slots__ will not be supported. __slots__ needs to be added at class creation time. The Data Class decorator is called after the class is created, so in order to add __slots__ the decorator would have to create a new class, set __slots__, and return it. Because this behavior is somewhat surprising, the initial version of Data Classes will not support automatically setting __slots__.

I wanted to use slots because I needed to initialise many, many data class instances in another project. I ended up writing my own alternative implementation of data classes which supports this, among a few extra features: dataclassy.

dataclassy uses a metaclass approach which has numerous advantages - it enables decorator inheritance, considerably reduced code complexity and, of course, the generation of slots. With dataclassy the following is possible:

from dataclassy import dataclass

@dataclass(slots=True)
class Pet:
    name: str
    age: int
    species: str
    fluffy: bool = True

Printing Pet.__slots__ outputs the expected {'name', 'age', 'species', 'fluffy'}, instances have no __dict__ attribute and the overall memory footprint of the object is therefore lower. These observations indicate that __slots__ has been successfully generated and is effective. Plus, as evidenced, default values work just fine.

The least involved solution I've found for this problem is to specify a custom __init__ using object.__setattr__ to assign values.

from dataclasses import dataclass
from typing import Optional

@dataclass(init=False, frozen=True)
class MyDataClass(object):
    __slots__ = (
        "required",
        "defaulted",
    )
    required: object
    defaulted: Optional[object]

    def __init__(
        self,
        required: object,
        defaulted: Optional[object] = None,
    ) -> None:
        super().__init__()
        object.__setattr__(self, "required", required)
        object.__setattr__(self, "defaulted", defaulted)
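The reason for going through object.__setattr__ is that frozen=True installs a __setattr__ that always raises, so a plain self.x = ... inside __init__ would fail. A small self-contained check of that behaviour, using a hypothetical Pair class rather than the one above:

```python
from dataclasses import FrozenInstanceError, dataclass

@dataclass(init=False, frozen=True)
class Pair:
    __slots__ = ("left", "right")
    left: int
    right: int

    def __init__(self, left: int, right: int = 0) -> None:
        # Bypass the frozen __setattr__ installed by @dataclass.
        object.__setattr__(self, "left", left)
        object.__setattr__(self, "right", right)

pair = Pair(1)  # "right" falls back to the default from __init__

try:
    pair.right = 2  # ordinary assignment is rejected on a frozen dataclass
    mutated = True
except FrozenInstanceError:
    mutated = False

print(pair.left, pair.right, mutated)
```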

Following Rick Teachey's suggestion, I created a slotted_dataclass decorator. It can take, in keyword arguments, anything that you would specify after [field]: [type] = in a dataclass without __slots__: both default values for fields and field(...). Specifying arguments that should go to the old @dataclass constructor is also possible, but in a dictionary object passed as the first positional argument. So this:

@dataclass(frozen=True)
class Test:
    a: dict = field(repr=False)
    b: int = 42
    c: list = field(default_factory=list)

would become:

@slotted_dataclass({'frozen': True}, a=field(repr=False), b=42, c=field(default_factory=list))
class Test:
    __slots__ = ('a', 'b', 'c')
    a: dict
    b: int
    c: list

And here is the source code of this new decorator:

from dataclasses import dataclass, field

def slotted_dataclass(dataclass_arguments=None, **kwargs):
    if dataclass_arguments is None:
        dataclass_arguments = {}

    def decorator(cls):
        old_attrs = {}

        for key, value in kwargs.items():
            old_attrs[key] = getattr(cls, key)
            setattr(cls, key, value)

        cls = dataclass(cls, **dataclass_arguments)
        for key, value in old_attrs.items():
            setattr(cls, key, value)
        return cls

    return decorator

Code explanation

The code above takes advantage of the fact that the dataclasses module gets default field values by calling getattr on the class. That makes it possible to deliver our default values by replacing the appropriate attributes in the __dict__ of the class (which is done in the code by using the setattr function). The class generated by the @dataclass decorator will then be completely identical to the class generated by specifying those values after =, as we would if the class didn't contain __slots__.

But since the __dict__ of the class with __slots__ contains member_descriptor objects:

>>> class C:
...     __slots__ = ('a', 'b', 'c')
...
>>> C.__dict__['a']
<member 'a' of 'C' objects>
>>> type(C.__dict__['a'])
<class 'member_descriptor'>

a nice thing to do is to back up those objects and restore them after the @dataclass decorator does its job, which is done in the code by using the old_attrs dictionary.

Another solution is to generate the slots parameter inside the class body, from the typed annotations. This can look like:

@dataclass
class Client:
    first: str
    last: str
    age_of_signup: int
    
    __slots__ = slots(__annotations__)

where the slots function is:

from typing import Dict, FrozenSet

def slots(anotes: Dict[str, object]) -> FrozenSet[str]:
    return frozenset(anotes.keys())

Running that would generate a slots parameter that looks like: frozenset({'first', 'last', 'age_of_signup'})

This takes the annotations above it and makes a set of the specified names. The limitation here is that you must re-type the __slots__ = slots(__annotations__) line for every class, it must be positioned below all the annotations, and it does not work for annotations with default arguments. This also has the advantage that the slots parameter will never conflict with the specified annotations, so you can feel free to add or remove members and not worry about maintaining separate lists.

In Python 3.10+ you can use slots=True with a dataclass to make it more memory-efficient:

from dataclasses import dataclass

@dataclass(frozen=True, slots=True)
class Point:
    x: int = 0
    y: int = 0

This way you can set default field values as well.
