Why can't mutable built-in objects be hashable in Python? What is the benefit of this?

I come from Java, where even mutable objects can be "hashable", and these days I am playing with Python 3.x just for fun.

Here is the definition of hashable in Python (from the Python glossary):

hashable

An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). Hashable objects which compare equal must have the same hash value.

Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.

All of Python's immutable built-in objects are hashable; mutable containers (such as lists or dictionaries) are not. Objects which are instances of user-defined classes are hashable by default. They all compare unequal (except with themselves), and their hash value is derived from their id().
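The glossary's claims are easy to check. The sketch below assumes CPython 3; `is_hashable` and `Plain` are hypothetical names introduced only for this illustration:

```python
from collections.abc import Hashable


class Plain:
    """Hypothetical user-defined class relying on the default __eq__ and __hash__."""


def is_hashable(obj):
    """Hypothetical helper: True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


# Immutable built-ins are hashable, and equal values hash equally.
print(is_hashable((1, 2, 3)), hash((1, 2, 3)) == hash((1, 2, 3)))  # True True
print(is_hashable("abc"), is_hashable(frozenset({1, 2})))          # True True

# Mutable built-in containers are not hashable.
print(is_hashable([1, 2]), is_hashable({"k": 1}), is_hashable({1, 2}))  # False False False

# Instances of user-defined classes are hashable by default
# and compare equal only to themselves.
p, q = Plain(), Plain()
print(is_hashable(p), p == p, p == q)                         # True True False
print(isinstance(p, Hashable), isinstance([1, 2], Hashable))  # True False
```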

I read it and I keep thinking... Why didn't they make even the mutable built-in objects hashable in Python? E.g., using the same default hashing mechanism as for user-defined objects, i.e., the one described by the last two sentences above:

Objects which are instances of user-defined classes are hashable by default. They all compare unequal (except with themselves), and their hash value is derived from their id().

This feels somewhat weird... so user-defined mutable objects are hashable (via this default hashing mechanism), but built-in mutable objects are not. Doesn't this just complicate things? I don't see what benefits it brings; could someone explain?

In Python, mutable objects can be hashable, but it is generally not a good idea, because equality is usually defined in terms of those mutable attributes, and this can lead to all sorts of crazy behavior.

If built-in mutable objects were hashed based on identity, like the default hashing mechanism for user-defined objects, their hash would be inconsistent with their equality (two distinct lists with the same contents compare equal, but would hash differently). And that is absolutely a problem. However, user-defined objects by default both compare and hash based on identity, so the situation isn't as bad, although this state of affairs isn't very useful.
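A minimal sketch of why the default is "not as bad", assuming a hypothetical `Point` class that overrides neither `__eq__` nor `__hash__`: because equality and hashing are both identity-based, they stay consistent with each other even after the object is mutated.

```python
class Point:
    """Hypothetical class that overrides neither __eq__ nor __hash__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


a = Point(1, 2)
b = Point(1, 2)   # same attribute values, but a different object
s = {a}

print(a == b)          # False: default equality is identity
print(a in s, b in s)  # True False: consistent with a == b being False

a.x = 99               # mutate a; its identity, and so its default hash, is unchanged
print(a in s)          # True: a is still found after mutation
```

The price is the one the answer mentions: two `Point` objects with the same coordinates are never considered equal, which is rarely what you want for a value-like type.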

Note that if you implement __eq__ in a user-defined class without also defining __hash__, __hash__ is set to None, making the class unhashable.

So, from the Python 3 data model documentation:

User-defined classes have __eq__() and __hash__() methods by default; with them, all objects compare unequal (except with themselves) and x.__hash__() returns an appropriate value such that x == y implies both that x is y and hash(x) == hash(y).

A class that overrides __eq__() and does not define __hash__() will have its __hash__() implicitly set to None. When the __hash__() method of a class is None, instances of the class will raise an appropriate TypeError when a program attempts to retrieve their hash value, and will also be correctly identified as unhashable when checking isinstance(obj, collections.abc.Hashable).
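A short sketch of the documented rule; `ValuePoint` is a hypothetical class that overrides `__eq__` but does not define `__hash__`:

```python
from collections.abc import Hashable


class ValuePoint:
    """Hypothetical class that overrides __eq__ but does not define __hash__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, ValuePoint):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


print(ValuePoint.__hash__ is None)             # True: set implicitly at class creation
print(isinstance(ValuePoint(1, 2), Hashable))  # False

try:
    hash(ValuePoint(1, 2))  # also what a set or dict would need to do
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```

Value equality still works; only hashing is switched off, which is exactly the trade-off built-in lists and dicts make.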

Calculating a hash value is like giving an object an identity, which simplifies the comparison of objects. Comparing hash values is generally faster than comparing by value: for an object you compare its attributes, and for a collection you compare its items, recursively… Equal hash values do not prove equality, so a match still has to be confirmed by value, but different hash values rule it out immediately.
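To make this concrete, here is a sketch assuming CPython 3, whose set lookups compare the stored hash values before calling `__eq__` (an implementation detail, not a language guarantee). `Key` is a hypothetical class that counts its `__eq__` calls:

```python
class Key:
    """Hypothetical key type that counts how often __eq__ is called."""

    eq_calls = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        Key.eq_calls += 1
        return isinstance(other, Key) and self.value == other.value


keys = {Key(i) for i in range(1000)}

Key.eq_calls = 0
found = Key(500) in keys     # new object; only one stored entry has the same hash
found_calls = Key.eq_calls

Key.eq_calls = 0
missing = Key(5000) in keys  # no stored entry has this hash
missing_calls = Key.eq_calls

print(found, found_calls)      # True 1: __eq__ runs only to confirm the hash match
print(missing, missing_calls)  # False 0: differing hashes rule out every entry
```

Among 1000 stored keys, the value comparison runs at most once per lookup here; everything else is decided by comparing hashes.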

If an object is mutable, you need to calculate its hash value again after each change. If this object compared equal to another one, after a change it may no longer be equal. So mutable objects must be compared by value, not by hash: comparing mutable objects by their hash values is nonsense.
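A tiny sketch of the staleness problem, using plain lists; `snapshot` is a hypothetical stand-in for the hash a container would have recorded when the object was inserted:

```python
a = [1, 2, 3]
b = [1, 2, 3]

# Hypothetical stand-in for a hash a container would record at insertion time.
snapshot = hash(tuple(a))

equal_before = a == b
a.append(4)              # mutate a
equal_after = a == b

print(equal_before, equal_after)   # True False
print(hash(tuple(a)) == snapshot)  # False: the recorded hash is now stale
```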

Edit: Java HashCode

If you don't override it, hashCode() typically returns an identity-based value, traditionally derived from the object's address in memory.

See the Javadoc reference for the hashCode method:

As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)

So, the Java hashCode function works the same as the default Python __hash__ function.

In Java, if you put a mutable object whose hashCode depends on its state into a HashSet, for instance, and then mutate it, the HashSet no longer works properly: because the hash code has changed, the object can no longer be retrieved, so the containment check fails.

From reading other comments/answers, it seems like what you're not buying is that you have to change the hash of a mutable entity when it mutates; you suggest that you could just hash by id instead, so I'll try to elaborate on this point.

To quote you:

@kindall Hm... Who says that the hash value has to come from the values in the list? And that if you eg add a new value you have to rehash the list, get a new hash value, etc.. In other languages that's not how it is... this is my point. In other languages the hash value just comes from the id (or is the id itself, just like for user-defined mutable Python objects)... And OK... I just feel it makes things a bit too complicated in Python (especially for beginners... not for me).

This isn't exactly false (although I do not know which "other" languages you are referencing): you could do that, but there are some pretty dire consequences:

class HashableList(list):
    def __hash__(self):
        return id(self)

x = HashableList([1,2,3])
y = HashableList([1,2,3])

our_set = {x}

print("Is x in our_set? ", x in our_set)
print("Is y in our_set? ", y in our_set)
print("Are x and y equal? ", x == y)

This (unexpectedly) outputs:

Is x in our_set?  True
Is y in our_set?  False <-- potentially confusing
Are x and y equal?  True

This means that the hash is not consistent with equality, which is just downright confusing.

You might counter with "well, just hash by the contents then", but I think you already understand that if the contents change, then you get other undesirable behavior (for example):

class HashableListByContents(list):
    def __hash__(self):
        return sum(hash(x) for x in self)

a = HashableListByContents([1,2,3])
b = HashableListByContents([1,2,3])

our_set = {a}

print('Is a in our_set? ', a in our_set)
print('Is b in our_set? ', b in our_set)
print('Are a and b equal? ', a == b)

This outputs:

Is a in our_set?  True
Is b in our_set?  True
Are a and b equal?  True

So far so good! But...

a.append(2)
print('Is a still in our set? ', a in our_set)

This outputs:

Is a still in our set?  False <-- potentially confusing
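Continuing that example, a sketch of what actually happened, assuming CPython 3 (the exact behavior is an implementation detail): the mutated list never left the set, it is simply filed under its old hash, so adding it again stores the very same object a second time.

```python
class HashableListByContents(list):
    def __hash__(self):
        # Same content-based hash as in the answer above.
        return sum(hash(x) for x in self)


a = HashableListByContents([1, 2, 3])
our_set = {a}            # a is filed under hash 1 + 2 + 3 == 6

a.append(2)              # its contents now hash to 8
found_after_mutation = a in our_set                # the lookup probes for hash 8
still_stored = any(item is a for item in our_set)  # iteration ignores hashes
print(found_after_mutation, still_stored)          # False True

our_set.add(a)           # not found under hash 8, so it is inserted again
print(len(our_set))      # 2: the very same object is now stored twice
```

A "set" holding one object twice breaks the most basic invariant of the data structure, which is the kind of trouble the built-in mutable types avoid by refusing to be hashed at all.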

I am not a Python beginner, so I would not presume to know what would or would not confuse a Python beginner, but either way this seems confusing to me (at best). My two cents is that it's simply incorrect to hash mutable objects. I mean, we have functional purists who claim mutable objects are just incorrect, period! Python won't stop you from doing any of what you described, because it would never force a paradigm like that, but it's really asking for trouble no matter which route you go down.

HTH!
