How can dataclasses be made to work better with __slots__?
It was decided to remove direct support for __slots__ from dataclasses for Python 3.7.
Despite this, __slots__ can still be used with dataclasses:
from dataclasses import dataclass

@dataclass
class C():
    __slots__ = "x"
    x: int
However, because of the way __slots__ works, it isn't possible to assign a default value to a dataclass field:
from dataclasses import dataclass

@dataclass
class C():
    __slots__ = "x"
    x: int = 1
This results in an error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: 'x' in __slots__ conflicts with class variable
How can __slots__ and default dataclass fields be made to work together?
2021 UPDATE: direct support for __slots__ was added in Python 3.10. I am leaving this answer for posterity and won't be updating it.
The problem is not unique to dataclasses. ANY conflicting class attribute will stomp all over a slot:
>>> class Failure:
...     __slots__ = tuple("xyz")
...     x = 1
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: 'x' in __slots__ conflicts with class variable
This is simply how slots work. The error happens because __slots__ creates a class-level descriptor object for each slot name:
>>> class Success:
...     __slots__ = tuple("xyz")
...
>>>
>>> type(Success.x)
<class 'member_descriptor'>
In order to prevent this conflicting variable name error, the class namespace must be altered before the class object is created, such that there are not two objects (the specified default value and the slot descriptor created by the slots machinery) competing for the same member name in the class.
For this reason, an __init_subclass__ method on a parent class will not be sufficient, nor will a class decorator, because in both cases the class object has already been created by the time these functions have received the class to alter it.
Until such time as the slots machinery is altered to allow more flexibility, or the language itself provides an opportunity to alter the class namespace before the class object is created, our only choice is to use a metaclass.
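As a rough illustration of what such a metaclass involves, here is a minimal sketch. SlottedDataclassMeta is a hypothetical name, not part of any library, and the sketch deliberately ignores inherited slots, dataclass options, and other corner cases:

```python
from dataclasses import dataclass


class SlottedDataclassMeta(type):
    """Hypothetical metaclass: lets a class body combine __slots__ with defaults."""

    def __new__(mcls, name, bases, namespace, **kwargs):
        slots = namespace.get("__slots__", ())
        if isinstance(slots, str):
            slots = (slots,)
        # Pull conflicting class attributes out of the namespace first.
        defaults = {s: namespace.pop(s) for s in tuple(slots) if s in namespace}
        # Creating the class now succeeds and builds the slot descriptors.
        cls = super().__new__(mcls, name, bases, namespace, **kwargs)
        # Remember the descriptors, then put the defaults back for dataclass.
        descriptors = {s: cls.__dict__[s] for s in defaults}
        for s, value in defaults.items():
            setattr(cls, s, value)
        cls = dataclass(cls)
        # Restore the descriptors so instances keep using the slots.
        for s, descriptor in descriptors.items():
            setattr(cls, s, descriptor)
        return cls


class C(metaclass=SlottedDataclassMeta):
    __slots__ = ("x", "y")
    x: int
    y: int = 2


c = C(1)
```

Here C(1) picks up the default for y, and instances still have no __dict__. The defaults survive the final restore step because the generated __init__ has already captured them.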
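Any metaclass written to solve this problem must, at minimum:
- remove the conflicting class attributes/members from the namespace
- create the class object so that the slot descriptors exist
- save references to the slot descriptors
- put the previously removed members and their values back in the class __dict__ (so the dataclass machinery can find them)
- pass the class object to the dataclass decorator
- restore the slot descriptors to their respective places
- take into account plenty of corner cases (such as what to do if there is a __dict__ slot)
To say the least, this is an extremely complicated endeavor. It would be easier to define the class like the following, without a default value so that the conflict doesn't occur at all, and then add a default value afterward.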
The unaltered dataclass would look like this:
@dataclass
class C:
    __slots__ = "x"
    x: int
The alteration is straightforward. Change the __init__ signature to reflect the desired default value, and then change the __dataclass_fields__ to reflect the presence of a default value.
from functools import wraps

def change_init_signature(init):
    @wraps(init)
    def __init__(self, x=1):
        init(self, x)
    return __init__

C.__init__ = change_init_signature(C.__init__)
C.__dataclass_fields__["x"].default = 1
Test:
>>> C()
C(x=1)
>>> C(2)
C(x=2)
>>> C.x
<member 'x' of 'C' objects>
>>> vars(C())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: vars() argument must have __dict__ attribute
It works!
setmember decorator
With some effort, a so-called setmember decorator could be employed to automatically alter the class in the manner above. This would require deviating from the dataclasses API in order to define the default value in a location other than inside the class body, perhaps something like:
@setmember(x=field(default=1))
@dataclass
class C:
    __slots__ = "x"
    x: int
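One way such a setmember decorator might be written is sketched below. The name and behavior are hypothetical; this version only handles plain defaults and field(default=...), not default_factory, and it leaves the visible __init__ signature unchanged:

```python
import inspect
from dataclasses import Field, dataclass, field
from functools import wraps


def setmember(**defaults):
    """Hypothetical decorator: add defaults to an already-built slotted dataclass."""

    def decorator(cls):
        original_init = cls.__init__
        signature = inspect.signature(original_init)
        plain = {}
        for name, value in defaults.items():
            # Accept either a bare value or field(default=...).
            plain[name] = value.default if isinstance(value, Field) else value
            # Record the default so dataclasses.fields() reports it.
            cls.__dataclass_fields__[name].default = plain[name]

        @wraps(original_init)
        def __init__(self, *args, **kwargs):
            # Bind what the caller supplied, then fill in any missing defaults.
            bound = signature.bind_partial(self, *args, **kwargs)
            for name, default in plain.items():
                bound.arguments.setdefault(name, default)
            original_init(*bound.args, **bound.kwargs)

        cls.__init__ = __init__
        return cls

    return decorator


@setmember(x=field(default=1))
@dataclass
class C:
    __slots__ = ("x",)
    x: int
```

Because the class body never assigns x, the slot descriptor is never displaced, so nothing needs to be restored afterward.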
The same thing could also be accomplished through an __init_subclass__ method on a parent class:
class SlottedDataclass:
    def __init_subclass__(cls, **kwargs):
        # call the parent hook; cls.__init_subclass__() would recurse forever
        super().__init_subclass__()
        # make the class changes here

class C(SlottedDataclass, x=field(default=1)):
    __slots__ = "x"
    x: int
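The "make the class changes here" step could be filled in along the following lines. This is only a sketch under the assumption that every class keyword argument names a slot and carries its default. The empty __slots__ on the parent is my addition, needed so that instances do not gain a __dict__:

```python
from dataclasses import dataclass, field


class SlottedDataclass:
    __slots__ = ()  # assumption: keep the parent slotted too

    def __init_subclass__(cls, **defaults):
        super().__init_subclass__()
        # The slot descriptors already exist at this point; set them aside.
        descriptors = {name: cls.__dict__[name] for name in defaults}
        # Temporarily expose the defaults where the dataclass machinery looks.
        for name, value in defaults.items():
            setattr(cls, name, value)
        dataclass(cls)  # modifies cls in place
        # Put the descriptors back so attribute storage uses the slots again.
        for name, descriptor in descriptors.items():
            setattr(cls, name, descriptor)


class C(SlottedDataclass, x=1, tags=field(default_factory=list)):
    __slots__ = ("x", "tags")
    x: int
    tags: list


c = C()
```

The defaults still live outside the class body, which is the same deviation from the dataclasses API noted for the decorator version.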
Another possibility, as mentioned above, would be for the Python language to alter the slots machinery to allow more flexibility. One way of doing this might be to change the slots descriptor itself to store class-level data at the time of class definition.
This could be done, perhaps, by supplying a dict as the __slots__ argument (see below). The class-level data (1 for x, 2 for y) could just be stored on the descriptor itself for retrieval later:
class C:
    __slots__ = {"x": 1, "y": 2}

assert C.x.value == 1
assert C.y.value == 2
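The proposed behavior does not exist in Python, but it can be approximated today for illustration. The sketch below uses two hypothetical helpers, ValueSlot and ValueSlotsMeta, and assumes __slots__ is always given as a dict:

```python
class ValueSlot:
    """Hypothetical wrapper: a slot descriptor that also carries class-level data."""

    def __init__(self, member, value):
        self.member = member  # the real member_descriptor
        self.value = value    # the class-level data

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # class access exposes .value
        return self.member.__get__(instance, owner)

    def __set__(self, instance, value):
        self.member.__set__(instance, value)

    def __delete__(self, instance):
        self.member.__delete__(instance)


class ValueSlotsMeta(type):
    def __new__(mcls, name, bases, namespace, **kwargs):
        slot_values = dict(namespace.get("__slots__", {}))  # assumes a dict
        namespace["__slots__"] = tuple(slot_values)
        cls = super().__new__(mcls, name, bases, namespace, **kwargs)
        # Wrap each real slot descriptor so the data rides along with it.
        for slot, value in slot_values.items():
            setattr(cls, slot, ValueSlot(cls.__dict__[slot], value))
        return cls


class C(metaclass=ValueSlotsMeta):
    __slots__ = {"x": 1, "y": 2}


c = C()
c.x = 10
```

Class access such as C.x.value returns the stored data, while instance access still goes through the underlying slot.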
One difficulty: it may be desired to only have a slot_member.value present on some slots and not others. This could be accommodated by importing a null-slot factory from a new slottools library:
from slottools import nullslot

class C:
    __slots__ = {"x": 1, "y": 2, "z": nullslot()}

assert not hasattr(C.z, "value")
The style of code suggested above would be a deviation from the dataclasses API. However, the slots machinery itself could even be altered to allow for this style of code, with accommodation of the dataclasses API specifically in mind:
class C:
    __slots__ = "x", "y", "z"
    x = 1  # 1 is stored on C.x.value
    y = 2  # 2 is stored on C.y.value

assert C.x.value == 1
assert C.y.value == 2
assert not hasattr(C.z, "value")
The other possibility is altering/preparing (synonymous with the __prepare__ method of a metaclass) the class namespace.
Currently, there is no opportunity (other than writing a metaclass) to write code that alters the class namespace before the class object is created and the slots machinery goes to work. This could be changed by creating a hook for preparing the class namespace beforehand, and making it so that an error complaining about the conflicting names is only produced after that hook has been run.
This so-called __prepare_slots__ hook could look something like this, which I think is not too bad:
# prepare_slots is the proposed hook; it does not exist in dataclasses today
from dataclasses import dataclass, field, prepare_slots

@dataclass
class C:
    __slots__ = ('x',)
    __prepare_slots__ = prepare_slots
    x: int = field(default=1)
The dataclasses.prepare_slots function would simply be a function, similar to the __prepare__ method, that receives the class namespace and alters it before the class is created. For this case in particular, the default dataclass field values would be stored in some other convenient place so that they can be retrieved after the slot descriptor objects have been created.
* Note that the default field value conflicting with the slot might also be created by the dataclass machinery if dataclasses.field is being used.
As noted already in the answers, data classes from dataclasses cannot generate slots for the simple reason that slots must be defined before a class is created.
In fact, the PEP for data classes explicitly mentions this:
"At least for the initial release, __slots__ will not be supported. __slots__ needs to be added at class creation time. The Data Class decorator is called after the class is created, so in order to add __slots__ the decorator would have to create a new class, set __slots__, and return it. Because this behavior is somewhat surprising, the initial version of Data Classes will not support automatically setting __slots__."
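To make the "create a new class, set __slots__, and return it" step concrete, here is a minimal sketch of a decorator that does exactly that. add_slots is a hypothetical helper written for illustration; it does not handle inheritance from other slotted dataclasses or methods that rely on zero-argument super():

```python
import dataclasses
from dataclasses import dataclass


def add_slots(cls):
    """Hypothetical helper: rebuild an existing dataclass with __slots__ set."""
    if "__slots__" in cls.__dict__:
        raise TypeError(f"{cls.__name__} already defines __slots__")
    namespace = dict(cls.__dict__)
    field_names = tuple(f.name for f in dataclasses.fields(cls))
    namespace["__slots__"] = field_names
    for name in field_names:
        # Class-level defaults would conflict with the slot descriptors;
        # the generated __init__ has already captured them.
        namespace.pop(name, None)
    # These descriptors belong to the old, unslotted class.
    namespace.pop("__dict__", None)
    namespace.pop("__weakref__", None)
    new_cls = type(cls)(cls.__name__, cls.__bases__, namespace)
    new_cls.__qualname__ = cls.__qualname__
    return new_cls


@add_slots
@dataclass
class Point:
    x: int = 0
    y: int = 0


p = Point(y=3)
```

The class returned by add_slots is a different object from the one @dataclass produced, which is the "somewhat surprising" behavior the PEP refers to.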
I wanted to use slots because I needed to initialise many, many data class instances in another project. I ended up writing my own alternative implementation of data classes which supports this, among a few extra features: dataclassy.
dataclassy uses a metaclass approach which has numerous advantages: it enables decorator inheritance, considerably reduced code complexity and, of course, the generation of slots. With dataclassy the following is possible:
from dataclassy import dataclass

@dataclass(slots=True)
class Pet:
    name: str
    age: int
    species: str
    fluffy: bool = True
Printing Pet.__slots__ outputs the expected {'name', 'age', 'species', 'fluffy'}, instances have no __dict__ attribute and the overall memory footprint of the object is therefore lower. These observations indicate that __slots__ has been successfully generated and is effective. Plus, as evidenced, default values work just fine.
The least involved solution I've found for this problem is to specify a custom __init__ using object.__setattr__ to assign values.
from dataclasses import dataclass
from typing import Optional

@dataclass(init=False, frozen=True)
class MyDataClass(object):
    __slots__ = (
        "required",
        "defaulted",
    )
    required: object
    defaulted: Optional[object]

    def __init__(
        self,
        required: object,
        defaulted: Optional[object] = None,
    ) -> None:
        super().__init__()
        # frozen=True blocks normal assignment, so bypass it explicitly
        object.__setattr__(self, "required", required)
        object.__setattr__(self, "defaulted", defaulted)
Following Rick Teachey's suggestion, I created a slotted_dataclass decorator. It can take, in keyword arguments, anything that you would specify after [field]: [type] = in a dataclass without __slots__: both default values for fields and field(...). Specifying arguments that should go to the old @dataclass constructor is also possible, but in a dictionary object as the first positional argument. So this:
@dataclass(frozen=True)
class Test:
    a: dict = field(repr=False)
    b: int = 42
    c: list = field(default_factory=list)
would become:
@slotted_dataclass({'frozen': True}, a=field(repr=False), b=42, c=field(default_factory=list))
class Test:
    __slots__ = ('a', 'b', 'c')
    a: dict
    b: int
    c: list
And here is the source code of this new decorator:
from dataclasses import dataclass

def slotted_dataclass(dataclass_arguments=None, **kwargs):
    if dataclass_arguments is None:
        dataclass_arguments = {}

    def decorator(cls):
        # back up the slot descriptors, then expose the defaults to @dataclass
        old_attrs = {}
        for key, value in kwargs.items():
            old_attrs[key] = getattr(cls, key)
            setattr(cls, key, value)

        cls = dataclass(cls, **dataclass_arguments)
        # restore the slot descriptors
        for key, value in old_attrs.items():
            setattr(cls, key, value)
        return cls

    return decorator
The code above takes advantage of the fact that the dataclasses module gets default field values by calling getattr on the class. That makes it possible to deliver our default values by replacing the appropriate fields in the __dict__ of the class (which is done in the code by using the setattr function). The class generated by the @dataclass decorator will then be completely identical to the class generated by specifying those after =, as we would if the class didn't contain __slots__.
But since the __dict__ of the class with __slots__ contains member_descriptor objects:
>>> class C:
...     __slots__ = ('a', 'b', 'c')
...
>>> C.__dict__['a']
<member 'a' of 'C' objects>
>>> type(C.__dict__['a'])
<class 'member_descriptor'>
a nice thing to do is back up those objects and restore them after the @dataclass decorator does its job, which is done in the code by using the old_attrs dictionary.
Another solution is to generate the slots parameter inside the class body, from the typed annotations. This can look like:
@dataclass
class Client:
    first: str
    last: str
    age_of_signup: int
    __slots__ = slots(__annotations__)
where the slots function is:
from typing import Dict, FrozenSet

def slots(anotes: Dict[str, object]) -> FrozenSet[str]:
    return frozenset(anotes.keys())
Running that would generate a slots parameter that looks like: frozenset({'first', 'last', 'age_of_signup'})
This takes the annotations above it and makes a set of the specified names. The limitation here is that you must re-type the __slots__ = slots(__annotations__) line for every class, it must be positioned below all the annotations, and it does not work for annotations with default arguments. This also has the advantage that the slots parameter will never conflict with the specified annotations, so you can feel free to add or remove members and not worry about maintaining separate lists.
In Python 3.10+ you can use slots=True with a dataclass to make it more memory-efficient:
from dataclasses import dataclass

@dataclass(frozen=True, slots=True)
class Point:
    x: int = 0
    y: int = 0
This way you can set default field values as well.