How can I make a Python dataclass hashable without making it immutable?
Say I have a dataclass in Python 3. I want to be able to hash and order these objects. I do not want them to be immutable.
I only want them ordered/hashed on id.
I see in the docs that I can just implement __hash__ and all that, but I'd like to get dataclasses to do the work for me because they are intended to handle this.
    from dataclasses import dataclass, field

    @dataclass(eq=True, order=True)
    class Category:
        id: str = field(compare=True)
        name: str = field(default="set this in post_init", compare=False)

    a = sorted(list(set([Category(id='x'), Category(id='y')])))

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unhashable type: 'Category'
From the docs:
"Here are the rules governing implicit creation of a __hash__() method: [...]
If eq and frozen are both true, by default dataclass() will generate a __hash__() method for you. If eq is true and frozen is false, __hash__() will be set to None, marking it unhashable (which it is, since it is mutable). If eq is false, __hash__() will be left untouched meaning the __hash__() method of the superclass will be used (if the superclass is object, this means it will fall back to id-based hashing)."
Since you set eq=True and left frozen at the default (False), your dataclass is unhashable.
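The three rules quoted above can be checked directly. This is a minimal sketch; the class names FrozenPoint, MutablePoint and IdentityPoint are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(eq=True, frozen=True)
class FrozenPoint:  # hypothetical example class
    x: int


@dataclass(eq=True)  # frozen is left at its default, False
class MutablePoint:  # hypothetical example class
    x: int


@dataclass(eq=False)
class IdentityPoint:  # hypothetical example class
    x: int


# eq=True, frozen=True: a field-based __hash__ is generated.
print(hash(FrozenPoint(1)) == hash(FrozenPoint(1)))

# eq=True, frozen=False: __hash__ is set to None, so instances are unhashable.
print(MutablePoint.__hash__ is None)

# eq=False: __hash__ is left untouched, so object's identity-based hash is used.
print(IdentityPoint.__hash__ is object.__hash__)
```

All three checks should print True. The middle case is the one the question runs into.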
You have 3 options:
1. Set frozen=True (in addition to eq=True), which will make your class immutable and hashable.
2. Set unsafe_hash=True, which will create a __hash__ method but leave your class mutable, thus risking problems if an instance of your class is modified while stored in a dict or set:
    cat = Category('foo', 'bar')
    categories = {cat}
    cat.id = 'baz'
    print(cat in categories)  # False
3. Manually implement a __hash__ method.
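For the manual route, one possible sketch is to hash on id only, reusing the Category class from the question. dataclass() leaves an explicitly defined __hash__ alone, so the generated __eq__ and ordering methods still work alongside it. Treat this as an illustration under those assumptions rather than the one recommended approach.

```python
from dataclasses import dataclass, field


@dataclass(eq=True, order=True)
class Category:
    id: str = field(compare=True)
    name: str = field(default="set this in post_init", compare=False)

    def __hash__(self) -> int:
        # Hash only on id, the single field that the generated __eq__ compares.
        return hash(self.id)


# Duplicates by id collapse in the set; sorting uses the generated __lt__.
cats = sorted(set([Category(id="y"), Category(id="x"), Category(id="x")]))
print([c.id for c in cats])  # ['x', 'y']

# name is excluded from both __eq__ and __hash__, so changing it is safe.
cats[0].name = "renamed"
```

The same caveat as with unsafe_hash=True applies: changing id while an instance sits in a set or dict will still lose it.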
TL;DR
Use frozen=True in conjunction with eq=True (which will make the instances immutable).
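As a rough sketch of what the frozen route looks like in practice (the default value "unnamed" is an assumption chosen for illustration), the instances become hashable and sortable, and "changes" are made by building a copy with dataclasses.replace:

```python
from dataclasses import FrozenInstanceError, dataclass, field, replace


@dataclass(eq=True, order=True, frozen=True)
class Category:
    id: str
    name: str = field(default="unnamed", compare=False)  # assumed default


cats = sorted({Category(id="y"), Category(id="x")})
print([c.id for c in cats])  # ['x', 'y']

try:
    cats[0].name = "renamed"  # assignment is blocked on frozen instances
except FrozenInstanceError:
    # Build a modified copy instead of mutating in place.
    renamed = replace(cats[0], name="renamed")
```

Because name is declared with compare=False, the copy compares equal to the original and hashes the same.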
Long Answer
From the docs:
"__hash__() is used by built-in hash(), and when objects are added to hashed collections such as dictionaries and sets. Having a __hash__() implies that instances of the class are immutable. Mutability is a complicated property that depends on the programmer's intent, the existence and behavior of __eq__(), and the values of the eq and frozen flags in the dataclass() decorator.
By default, dataclass() will not implicitly add a __hash__() method unless it is safe to do so. Neither will it add or change an existing explicitly defined __hash__() method. Setting the class attribute __hash__ = None has a specific meaning to Python, as described in the __hash__() documentation.
If __hash__() is not explicitly defined, or if it is set to None, then dataclass() may add an implicit __hash__() method. Although not recommended, you can force dataclass() to create a __hash__() method with unsafe_hash=True. This might be the case if your class is logically immutable but can nonetheless be mutated. This is a specialized use case and should be considered carefully.
Here are the rules governing implicit creation of a __hash__() method. Note that you cannot both have an explicit __hash__() method in your dataclass and set unsafe_hash=True; this will result in a TypeError.
If eq and frozen are both true, by default dataclass() will generate a __hash__() method for you. If eq is true and frozen is false, __hash__() will be set to None, marking it unhashable (which it is, since it is mutable). If eq is false, __hash__() will be left untouched meaning the __hash__() method of the superclass will be used (if the superclass is object, this means it will fall back to id-based hashing)."
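The note about combining an explicit __hash__() with unsafe_hash=True can be reproduced with a short sketch. The class name Broken and the variable error_message are hypothetical; the error is raised when the decorator runs, at class definition time.

```python
from dataclasses import dataclass

error_message = None
try:
    @dataclass(unsafe_hash=True)
    class Broken:  # hypothetical example class
        id: str

        def __hash__(self) -> int:
            return hash(self.id)
except TypeError as exc:
    # dataclass() refuses to overwrite the explicitly defined __hash__.
    error_message = str(exc)

print(error_message)
```

Dropping either the explicit method or the unsafe_hash=True flag lets the class definition succeed.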
I'd like to add a special note on the use of unsafe_hash.
You can exclude fields from the hash by setting compare=False or hash=False (hash inherits from compare by default).
This might be useful if you store nodes in a graph but want to mark them visited without breaking their hashing (e.g. if they're in a set of unvisited nodes).
    from dataclasses import dataclass, field

    @dataclass(unsafe_hash=True)
    class node:
        x: int
        # hash inherits the compare setting, so this field is excluded from the hash.
        visit_count: int = field(default=10, compare=False)
        # Also valid, and arguably easier to read, but can break some compare code:
        # visit_count: int = field(default=False, hash=False)
        # If mutated, hashing breaks (3* is printed):
        # visit_count: int = False

    s = set()
    n = node(1)
    s.add(n)
    if n in s:
        print("1* n in s")
    n.visit_count = 11
    if n in s:
        print("2* n still in s")
    else:
        print("3* n is lost to the void because hashing broke.")
This took me hours to figure out... The most useful further reading I found is the Python doc on dataclasses. Specifically, see the field documentation and the dataclass argument documentation. https://docs.python.org/3/library/dataclasses.html