
In Python, why is a tuple hashable but not a list?

When I try to use a list as a dictionary key, I get an error, but the same thing works with a tuple. I guess it has something to do with immutability. Can someone explain this in detail?

List

>>> x = [1, 2, 3]
>>> y = {x: 9}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

Tuple

>>> z = (5, 6)
>>> y = {z: 89}
>>> print(y)
{(5, 6): 89}

Dicts and other objects use hashes to store and retrieve items really quickly. The mechanics of this all happen "under the covers" - you as the programmer don't need to do anything and Python handles it all internally. The basic idea is that when you create a dictionary with {key: value}, Python needs to be able to hash whatever you used for key so it can store and look up the value quickly.
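To make the "under the covers" part visible, here is a minimal sketch with a hypothetical key class, TracedKey (not part of Python), that counts how often it gets hashed. It assumes CPython's dict behaviour, where the key is hashed on insertion and again on every lookup:

```python
class TracedKey:
    """Hypothetical key type that counts how often it is hashed."""

    hash_calls = 0

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, TracedKey) and self.name == other.name

    def __hash__(self):
        TracedKey.hash_calls += 1
        return hash(self.name)


key = TracedKey("k")

d = {key: "value"}            # storing the key hashes it once
print(TracedKey.hash_calls)   # 1

d[key]                        # a lookup hashes the key again
print(TracedKey.hash_calls)   # 2

key in d                      # a membership test does too
print(TracedKey.hash_calls)   # 3
```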

Python's built-in immutable objects, or objects that can't be altered, are hashable. Their value never changes, so Python can "hash" that value and use it to look up dictionary values efficiently. Objects that fall into this category include strings, integers, tuples (provided all of their elements are hashable too) and so on. You may think, "But I can change a string! I just go mystr = mystr + 'foo'," but in fact what this does is create a new string instance and assign it to mystr. It doesn't modify the existing instance. Immutable objects never change, so you can always be sure that when you generate a hash for an immutable object, looking up the object by its hash will always return the same object you started with, and not a modified version.
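The point about strings can be checked with a short sketch. The variable names are only illustrative; the idea is to keep a second reference to the original string and see that rebinding mystr leaves it untouched:

```python
mystr = "foo"
original = mystr                 # second reference to the same string object
hash_before = hash(original)

mystr = mystr + "bar"            # builds a new string and rebinds the name

print(original)                  # foo (the first object was not modified)
print(mystr)                     # foobar
print(mystr is original)         # False: two distinct objects
print(hash(original) == hash_before)  # True: the hash is stable
```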

You can try this for yourself: hash("mystring"), hash(('foo', 'bar')), hash(1)
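One caveat is worth trying as well: a tuple hashes its elements, so a tuple that contains a list is itself unhashable. A small sketch (the variable names are made up for the example):

```python
plain = (1, "a", (2, 3))
print(isinstance(hash(plain), int))   # True: every element is hashable

mixed = (1, [2, 3])                   # immutable tuple, mutable element
try:
    hash(mixed)
except TypeError as exc:
    print(exc)                        # unhashable type: 'list'
```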

Mutable built-in objects, or objects that can be modified, aren't hashable. A list can be modified in-place: mylist.append('bar') or mylist.pop(0). You can't safely hash a mutable object because you can't guarantee that the object hasn't changed since you last saw it. You'll find that list, set, and other built-in mutable types have no usable __hash__() method (their __hash__ attribute is set to None). Because of this, you can't use these mutable objects as dictionary keys:

>>> hash([1,2,3])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
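Strictly speaking, the built-in mutable containers do carry a __hash__ attribute, but it is set to None, which is how Python marks a type as unhashable. A quick check, using collections.abc.Hashable from the standard library:

```python
from collections.abc import Hashable

# Built-in mutable containers disable hashing by setting __hash__ to None.
print(list.__hash__ is None)          # True
print(set.__hash__ is None)           # True
print(dict.__hash__ is None)          # True

# Immutable built-ins keep a real __hash__ method.
print(tuple.__hash__ is None)         # False

# The abstract base class reports the same thing.
print(isinstance([1, 2], Hashable))   # False
print(isinstance((1, 2), Hashable))   # True
```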

Eric Duminil's answer provides a great example of the unexpected behaviour that arises from using mutable objects as dictionary keys.

Here are examples of why it might not be a good idea to allow mutable types as keys. This behaviour might be useful in some cases (e.g. using the state of the object as a key rather than the object itself) but it also might lead to surprising results or bugs.

Python

It's possible to use a numeric list as a key by defining __hash__ on a subclass of list:

class MyList(list):
    def __hash__(self):
        return sum(self)

my_list = MyList([1, 2, 3])

my_dict = {my_list: 'a'}

print(my_dict.get(my_list))
# a

my_list[2] = 4  # __hash__() becomes 7
print(next(iter(my_dict)))
# [1, 2, 4]
print(my_dict.get(my_list))
# None
print(my_dict.get(MyList([1,2,3])))
# None

my_list[0] = 0  # __hash__() is 6 again, but for different elements
print(next(iter(my_dict)))
# [0, 2, 4]
print(my_dict.get(my_list))
# a

Ruby

Ruby allows a list to be used as a key. A Ruby list is called an Array and a dict is a Hash, but the syntax is very similar to Python's:

my_list = [1]
my_hash = { my_list => 'a'}
puts my_hash[my_list]
#=> 'a'

But if this list is modified, the dict doesn't find the corresponding value any more, even if the key is still in the dict:

my_list << 2

p my_list
#=> [1, 2]

p my_hash.keys.first
#=> [1, 2]

p my_hash[my_list]
#=> nil

It's possible to force the dict to calculate the key hashes again:

my_hash.rehash
puts my_hash[my_list]
#=> 'a'

A hash-based collection (a set or a dictionary) calculates the hash of an object and, based on that hash, stores the object in the structure for fast lookup. As a result, by contract, once an object is added to the collection, its hash is not allowed to change. Most good hash functions will depend on the number of elements and the elements themselves.

A tuple is immutable, so after construction, the values cannot change and therefore the hash cannot change either (or at least a good implementation should not let the hash change).

A list, on the other hand, is mutable: one can later add, remove or alter elements. As a result, the hash can change, violating the contract.

So all objects that cannot guarantee a hash that remains stable after the object is added violate the contract and thus are not good candidates. For a lookup, the dictionary will first calculate the hash of the key and determine the correct bucket. If the key has changed in the meantime, this can result in false negatives: the object is in the dictionary, but it can no longer be retrieved because the hash is different, so a different bucket will be searched than the one the object was originally added to.
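The false negative described above can be reproduced with a toy table. This is only a sketch of the bucket idea, not how CPython's dict is actually implemented; ToyHashTable and SumList are hypothetical names invented for the illustration, and put() does not handle overwriting an existing key:

```python
class ToyHashTable:
    """Fixed-size table with separate chaining (illustration only)."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # The hash alone decides which bucket is searched.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        self._bucket(key).append((key, value))

    def get(self, key):
        for stored_key, value in self._bucket(key):
            if stored_key == key:
                return value
        return None


class SumList(list):
    """Hypothetical mutable key whose hash depends on its contents."""

    def __hash__(self):
        return sum(self)


table = ToyHashTable()
key = SumList([1, 2, 3])      # hash 6, so the entry lands in bucket 6
table.put(key, "a")
print(table.get(key))         # a

key.append(1)                 # hash is now 7, so bucket 7 gets searched
print(table.get(key))         # None, although the entry still exists
print(table.buckets[6])       # [([1, 2, 3, 1], 'a')]
```

The entry never left bucket 6; only the place where the lookup searches has moved.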

I would like to add the following aspect as it's not covered by other answers already.

There's nothing wrong with making mutable objects hashable. It is just ambiguous what the hash should mean, which is why it needs to be defined and implemented consistently by the programmer (not by the programming language).

Note that you can implement the __hash__ method for any custom class, which allows its instances to be stored in contexts where hashable types are required (such as dict keys or sets).

Hash values, together with equality, are used to decide whether two objects represent the same thing. So consider the following example. You have a list with two items: l = [1, 2]. Now you add an item to the list: l.append(3). And now you must answer the following question: Is it still the same thing? Both - yes and no - are valid answers. "Yes", it is still the same list, and "no", it no longer has the same content.

So the answer to this question depends on you as the programmer, and it is up to you to manually implement hash methods for your mutable types.
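The two answers correspond to two possible implementations. The sketch below uses hypothetical classes, IdentityBasket and ContentBasket, to show both: the first keeps Python's default identity-based __eq__ and __hash__ for user-defined classes ("yes, still the same list"), the second hashes the content ("no, the content differs"):

```python
class IdentityBasket:
    """Mutable container using the default identity-based __eq__/__hash__."""

    def __init__(self, items):
        self.items = list(items)


class ContentBasket:
    """Mutable container whose equality and hash follow its content."""

    def __init__(self, items):
        self.items = list(items)

    def __eq__(self, other):
        return isinstance(other, ContentBasket) and self.items == other.items

    def __hash__(self):
        return hash(tuple(self.items))


# Identity semantics: mutation does not affect lookup.
b = IdentityBasket([1, 2])
d = {b: "x"}
b.items.append(3)
print(d[b])                                # x
print(b == IdentityBasket([1, 2, 3]))      # False: a different object

# Content semantics: an equal object finds the entry...
c = ContentBasket([1, 2])
lookup = {c: "y"}
print(lookup[ContentBasket([1, 2])])       # y
# ...but c must not be mutated while it is used as a key.
```

Neither choice is wrong; the second simply puts the burden on the programmer to leave keys unmodified while they are stored.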

Based on the Python Glossary:

An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). Hashable objects which compare equal must have the same hash value.

All of Python's immutable built-in objects are hashable; mutable containers (such as lists or dictionaries) are not.
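The rule that equal objects must have equal hashes can be observed with built-in numbers. A brief sketch:

```python
# 1, 1.0 and True compare equal, so their hashes must agree.
print(1 == 1.0 == True)                      # True
print(hash(1) == hash(1.0) == hash(True))    # True

# A dict therefore treats them as the same key.
d = {1: "int"}
d[1.0] = "float"                             # updates the existing entry
print(d)                                     # {1: 'float'}

# The same holds for tuples built from equal elements.
print(hash((1, 2)) == hash((1.0, 2.0)))      # True
```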

Because a list is mutable, while a tuple is not. When a dict stores an object as a key, it also stores the object's hash. If the object later changes, the stored hash is not updated; it remains the same. The next time you look up the object, its freshly computed hash no longer matches the stored one, so the dictionary cannot find the entry.

To prevent that, Python does not allow you to hash mutable items.
