
dataclass hash() of field with type annotation and default value = None is always nondeterministic

I am running into some unexpected behavior when trying to hash a dataclass, and I'm wondering if anyone can explain it.

The script below reproduces the problem. First, we need to run export PYTHONHASHSEED='0' to disable hash randomization so we can compare the hashes across runs.

import os
from dataclasses import dataclass
from typing import Optional

assert os.getenv("PYTHONHASHSEED", None) == "0"


@dataclass(frozen=True)
class Foo:
    x = 1
    y = None


@dataclass(frozen=True)
class Bar:
    x: Optional[int] = 1
    y = None


@dataclass(frozen=True)
class Foobar:
    x = 1
    y: Optional[int] = None


print("hash(Foo()):", hash(Foo()))
print("hash(Bar()):", hash(Bar()))
print("hash(Foobar()):", hash(Foobar()))

Here's the result of running the script twice:

>>> py temp.py 
hash(Foo()): 5740354900026072187
hash(Bar()): -6644214454873602895
hash(Foobar()): 582415153292506125
>>> py temp.py 
hash(Foo()): 5740354900026072187
hash(Bar()): -6644214454873602895
hash(Foobar()): -8226650923609135754

Note that the hash for the first two classes is the same across runs, but the hash of the last class is different each time. It seems to be the combination of the type annotation with the value None in the class Foobar that causes the hash to change. (Incidentally, if I replace Optional[int] with int, I get the same behavior.)

I tried with both Python 3.9 and 3.10 and got similar results each time.

Can anyone explain what is going on?

Dataclass fields must be annotated. The annotation is how the dataclass machinery determines that something is a field. All 3 of your dataclasses are broken due to missing annotations.
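One way to check which attributes the dataclass machinery actually treats as fields is dataclasses.fields(). The sketch below reuses the three classes from the question; the helper name field_names is made up for this illustration.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class Foo:
    x = 1      # no annotation: plain class attribute, not a field
    y = None   # no annotation: plain class attribute, not a field


@dataclass(frozen=True)
class Bar:
    x: Optional[int] = 1   # annotated: a field
    y = None               # not a field


@dataclass(frozen=True)
class Foobar:
    x = 1                     # not a field
    y: Optional[int] = None   # annotated: a field


def field_names(cls):
    """Hypothetical helper: names of the attributes registered as fields."""
    return [f.name for f in fields(cls)]


for cls in (Foo, Bar, Foobar):
    print(cls.__name__, field_names(cls))
```

Foo ends up with no fields at all, Bar only has x, and Foobar only has y, so Foobar is the only class whose hash involves None.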


Disabling hash randomization isn't supposed to make hashes deterministic. It just disables one specific security feature that deliberately randomizes some types' hashes to mitigate hash collision-based denial-of-service attacks.

The default CPython object.__hash__ is nondeterministic. It's based on an object's address, which is not consistent from run to run. None uses this default hash, so hash(None) is nondeterministic, and your dataclass hashes are based on their fields' hashes, so the hash of a dataclass with a None field value is also nondeterministic. However, since your dataclasses are broken, Foobar is the only one where y is actually a field.

Bar()'s hash seems to be deterministic because it only depends on the hashes of ints and tuples (the frozen dataclass __hash__ implementation builds a tuple of field values and hashes that), and the int and tuple hash algorithms happen to be close to deterministic. They're not actually deterministic, though; they depend on whether you're on a 32-bit or 64-bit Python build, and while the int hashing algorithm is mostly specified, the tuple hash algorithm is all implementation details.

hash is not designed to be deterministic, no matter what settings you use. If you need deterministic hashing, do not use hash.
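As a rough check of the "tuple of field values" description, the sketch below compares each dataclass hash with the hash of the corresponding tuple inside a single run. It assumes CPython's generated __hash__ behaves as described above; that is an implementation detail, not a documented guarantee.

```python
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bar:
    x: Optional[int] = 1   # the only field
    y = None


@dataclass(frozen=True)
class Foobar:
    x = 1
    y: Optional[int] = None   # the only field


# Within one run, the dataclass hash matches the hash of a tuple
# holding just the field values.
print(hash(Bar()) == hash((1,)))
print(hash(Foobar()) == hash((None,)))

# The hash width depends on the build (32-bit vs 64-bit).
print(sys.hash_info.width)
```

So Bar()'s hash is only as stable as hash((1,)), and Foobar()'s is only as stable as hash((None,)).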
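If a value that is stable across runs is what you actually need, one possible approach (not the only one) is to serialize the fields and feed them to hashlib. This is a minimal sketch: the function name stable_digest and the choice of JSON plus SHA-256 are assumptions made for illustration, and it only works when the field values are JSON-serializable.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class Foobar:
    x: int = 1                # both attributes annotated, so both are fields
    y: Optional[int] = None


def stable_digest(obj) -> str:
    """Hypothetical helper: SHA-256 over a canonical JSON dump of the fields."""
    payload = json.dumps(asdict(obj), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


print(stable_digest(Foobar()))
```

Unlike hash(), the digest depends only on the serialized bytes, so it does not change with PYTHONHASHSEED, object addresses, or the pointer width of the build.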
