`object in list` behaves differently from `object in dict`?
I've got an iterator with some objects in it, and I wanted to create a collection `uniqueUsers` in which every user is listed only once. So, playing around a bit, I tried it with both a list and a dict:
>>> for m in ms: print m.to_user # let's first look what's inside ms
...
Pete Kramer
Pete Kramer
Pete Kramer
>>>
>>> uniqueUsers = [] # Create an empty list
>>> for m in ms:
...     if m.to_user not in uniqueUsers:
...         uniqueUsers.append(m.to_user)
...
>>> uniqueUsers
[Pete Kramer] # This is what I would expect
>>>
>>> uniqueUsers = {} # Now let's create a dict
>>> for m in ms:
...     if m.to_user not in uniqueUsers:
...         uniqueUsers[m.to_user] = 1
...
>>> uniqueUsers
{Pete Kramer: 1, Pete Kramer: 1, Pete Kramer: 1}
So I tested it by converting the dict to a list in the if statement, and that works as I would expect it to:
>>> uniqueUsers = {}
>>> for m in ms:
...     if m.to_user not in list(uniqueUsers):
...         uniqueUsers[m.to_user] = 1
...
>>> uniqueUsers
{Pete Kramer: 1}
I can get a similar result by testing against `uniqueUsers.keys()`.
The thing is that I don't understand why this difference occurs. I always thought that if you do `if object in dict`, it simply creates a list of the dict's keys and tests against that, but that's obviously not the case.
Can anybody explain how `object in dict` works internally and why it doesn't behave like `object in list` (as I would expect it to)?
In order to understand what's going on, you have to understand how the `in` operator, the membership test, behaves for the different types.
For lists, this is pretty simple because of what lists fundamentally are: ordered arrays that do not care about duplicates. The only possible way to perform a membership test here is to iterate over the list and check every item for equality. Something like this:
# x in lst
def list_contains(lst, x):
    for item in lst:
        if x == item:
            return True
    return False
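To make the linear scan visible, here is a minimal sketch that counts how often `__eq__` runs during a list membership test. The `Counted` class and its `calls` counter are made up for this illustration; they are not part of the question's code.

```python
class Counted(object):
    """Hypothetical helper: counts how often __eq__ is invoked."""
    calls = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Counted.calls += 1
        return isinstance(other, Counted) and self.value == other.value

    def __hash__(self):
        # Kept consistent with __eq__ so the class stays hashable in Python 3.
        return hash(self.value)


items = [Counted(n) for n in range(5)]

Counted.calls = 0
found_first = Counted(0) in items     # matches the very first element
calls_for_first = Counted.calls

Counted.calls = 0
found_missing = Counted(99) in items  # has to be compared with every element
calls_for_missing = Counted.calls

print(found_first, calls_for_first)
print(found_missing, calls_for_missing)
```

A hit on the first element needs one comparison, while a miss needs one comparison per element, so the cost grows with the length of the list.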
Dictionaries are a bit different: they are hash tables in which keys are meant to be unique. Hash tables require the keys to be hashable, which essentially means that there needs to be an explicit function that converts the object into an integer. This hash value is then used to put the key/value mapping somewhere into the hash table.
Since the hash value determines where in the hash table an item is placed, it's critical that objects which are meant to be equal produce the same hash value. So the following implication has to be true: `x == y => hash(x) == hash(y)`. The reverse does not need to be true, though; it's perfectly valid for different objects to produce the same hash value.
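As a rough illustration of that rule, the sketch below uses a deliberately silly key type whose hash is the same for every instance. `AlwaysCollides` is a hypothetical class invented for this example. Equal objects still get equal hashes, so the dict keeps working; unequal objects merely share a hash, which costs speed but not correctness.

```python
class AlwaysCollides(object):
    """Hypothetical key type: every instance produces the same hash."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, AlwaysCollides) and self.name == other.name

    def __hash__(self):
        # Legal but slow: x == y still implies hash(x) == hash(y).
        return 42


users = {AlwaysCollides("Pete"): 1, AlwaysCollides("Anna"): 2}

same_hash = hash(AlwaysCollides("Pete")) == hash(AlwaysCollides("Anna"))
pete_found = AlwaysCollides("Pete") in users
bob_found = AlwaysCollides("Bob") in users

# Built-in types follow the same rule: equal numbers hash equally.
builtin_ok = (1 == 1.0) and (hash(1) == hash(1.0))

print(same_hash, pete_found, bob_found, builtin_ok)
```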
When a membership test is performed on a dictionary, the dictionary first looks for the hash value. If it finds it, it performs an equality check on all items it found there; if it doesn't find the hash value, it assumes that the object is different from every key it holds:
# x in dct
def dict_contains(dct, x):
    h = hash(x)
    # getItemsForHash is a conceptual helper, not a real function
    items = getItemsForHash(dct, h)
    for item in items:
        if x == item:
            return True
    # items is empty, or no match inside the loop
    return False
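The lookup described above can be sketched as a small runnable toy. This is a simplified model with chained buckets, written only for illustration; `ToyHashTable` and its methods are hypothetical names, and CPython's real `dict` uses open addressing rather than a list of buckets.

```python
class ToyHashTable(object):
    """Simplified, hypothetical model of a hash table with chained buckets."""

    def __init__(self, bucket_count=8):
        self.buckets = [[] for _ in range(bucket_count)]

    def _bucket_for(self, key):
        # The hash value alone decides which bucket is inspected.
        return self.buckets[hash(key) % len(self.buckets)]

    def add(self, key):
        bucket = self._bucket_for(key)
        if not any(key == existing for existing in bucket):
            bucket.append(key)

    def __contains__(self, key):
        # Only keys in the matching bucket are ever compared with ==.
        return any(key == existing for existing in self._bucket_for(key))


table = ToyHashTable()
for name in ["Pete", "Anna", "Pete"]:
    table.add(name)

stored = sum(len(bucket) for bucket in table.buckets)
print("Pete" in table, "Bob" in table, stored)
```

If two equal keys produced different hashes, they would usually end up in different buckets and never be compared, which is exactly the symptom in the question.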
Since you get the desired result when using a membership test against a list, your object implements the equality comparison (`__eq__`) correctly. But since you do not get the correct result when using a dictionary, there seems to be a `__hash__` implementation that is out of sync with the equality comparison implementation:
>>> class SomeType:
...     def __init__(self, x):
...         self.x = x
...     def __eq__(self, other):
...         return self.x == other.x
...     def __hash__(self):
...         # bad hash implementation
...         return hash(id(self))
...
>>> l = [SomeType(1)]
>>> d = { SomeType(1): 'x' }
>>> x = SomeType(1)
>>> x in l
True
>>> x in d
False
Note that for new-style classes in Python 2 (classes that inherit from `object`), this "bad hash implementation" (which is based on the object id) is the default. So when you do not implement your own `__hash__` function, the class still uses that one. This ultimately means that unless your `__eq__` only performs an identity check (the default), the hash function will be out of sync.
So the solution is to implement `__hash__` in a way that aligns with the rules used in `__eq__`. For example, if you compare two members `self.x` and `self.y`, then you should use a compound hash over those two members. The easiest way to do that is to return the hash value of a tuple of those values:
class SomeType(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __eq__(self, other):
        return self.x == other.x and self.y == other.y
    def __hash__(self):
        return hash((self.x, self.y))
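Applied to the original problem, a sketch could look like the following. The `User` class with `first` and `last` attributes is a guess at what the `to_user` objects might look like; the real class in the question is not shown, so treat every name here as an assumption.

```python
class User(object):
    """Hypothetical stand-in for the question's to_user objects."""

    def __init__(self, first, last):
        self.first = first
        self.last = last

    def __eq__(self, other):
        if not isinstance(other, User):
            return NotImplemented
        return (self.first, self.last) == (other.first, other.last)

    def __hash__(self):
        # Hash exactly the fields that __eq__ compares.
        return hash((self.first, self.last))


recipients = [User("Pete", "Kramer") for _ in range(3)]

unique_users = {}
for user in recipients:
    if user not in unique_users:
        unique_users[user] = 1

print(len(unique_users), len(set(recipients)))
```

With `__hash__` and `__eq__` in agreement, the dict loop from the question keeps a single entry, and a plain `set(recipients)` does the same job in one line.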
Note that you should not make an object hashable if it is mutable:
"If a class defines mutable objects and implements an `__eq__()` method, it should not implement `__hash__()`, since the implementation of hashable collections requires that a key's hash value is immutable (if the object's hash value changes, it will be in the wrong hash bucket)."
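To see why that warning matters, here is a small sketch of what happens when a key is mutated after it has been stored. `Point` is a hypothetical class written for this example, with a hash that depends on a mutable attribute on purpose.

```python
class Point(object):
    """Hypothetical mutable class whose hash depends on mutable state."""

    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        return isinstance(other, Point) and self.x == other.x

    def __hash__(self):
        return hash(self.x)


p = Point(1)
seen = {p: "first"}

p.x = 2  # mutate the key while it is stored in the dict

found_by_new_value = Point(2) in seen  # hash no longer matches the stored one
found_by_old_value = Point(1) in seen  # hash matches, but == now fails
found_by_identity = p in seen          # even the key itself is not found

print(found_by_new_value, found_by_old_value, found_by_identity, len(seen))
```

The entry is still inside the dict (its length is unchanged), but no lookup can reach it any more, because the dict remembers the hash from the time of insertion.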
TL;DR: The `in` test calls `__eq__` for lists. For dicts, it first calls `__hash__` and, if the hash matches, then calls `__eq__`.
1. The `in` test only calls `__eq__` for lists.
   - Without a custom `__eq__`, the in-ness comparison is always False unless the list holds that very same object.
2. For dicts, you need a correctly implemented `__hash__` and `__eq__` to be able to compare objects in it correctly:
   - First, it gets the object's hash from `__hash__`. Without a custom `__hash__`, for new-style classes, it uses `id()`, which is unique among all live objects and hence never matches an existing one unless it's the same object. In Python 2, new-style classes (inheriting from `object`) inherit `object`'s `__hash__` implementation, which is based on `id()`, so that's where that comes from.
   - If the hash matches, then `__eq__` is called for that object with the other. The result then depends on what `__eq__` returns.
   - If the hash does not match, `__eq__` is not called.

So the `in` test calls `__eq__` for lists and for dicts, but for dicts only after `__hash__` returns a matching hash. And not having a `__hash__` doesn't return `None`, doesn't throw an error, and doesn't make the object "unhashable", at least in Python 2. To use your `to_user` class correctly as dict keys, you do need a `__hash__` method that is implemented correctly, in sync with `__eq__`.
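The call order summarized above can be checked with a small tracing sketch. `Traced` and its `log` list are hypothetical names used only here; integer ids are used as the hashed field so the hash values are predictable.

```python
class Traced(object):
    """Hypothetical key type that logs which special methods are called."""
    log = []

    def __init__(self, user_id):
        self.user_id = user_id

    def __eq__(self, other):
        Traced.log.append("eq")
        return isinstance(other, Traced) and self.user_id == other.user_id

    def __hash__(self):
        Traced.log.append("hash")
        return hash(self.user_id)


as_list = [Traced(1)]
as_dict = {Traced(1): "x"}  # inserting the key already calls __hash__ once

Traced.log = []
in_list = Traced(1) in as_list
list_calls = list(Traced.log)

Traced.log = []
dict_hit = Traced(1) in as_dict
dict_hit_calls = list(Traced.log)

Traced.log = []
dict_miss = Traced(2) in as_dict
dict_miss_calls = list(Traced.log)

print(list_calls, dict_hit_calls, dict_miss_calls)
```

The list test records only an equality call, the successful dict test records a hash call followed by an equality call, and the unsuccessful dict test stops after the hash call.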
Details:
The check `m.to_user not in uniqueUsers` worked correctly for "object in list" because you have probably implemented an `__eq__` method, as @poke pointed out. (And it appears `to_user` returns an object, not a string.)
The same check doesn't work for "object in dict" either because:
(a) `__hash__` in that class is badly implemented, as @poke also pointed out, or
(b) you have not implemented `__hash__` at all. This doesn't raise an error in Python 2 new-style classes.
Using the class in this answer as a starting point:
>>> class Test2(object):
...     def __init__(self, name):
...         self.name = name
...
...     def __eq__(self, other):
...         return self.name == other.name
...
>>> test_Dict = {}
>>> test_List = []
>>>
>>> obj1 = Test2('a')
>>> obj2 = Test2('a')
>>>
>>> test_Dict[obj1] = 'x'
>>> test_Dict[obj2] = 'y'
>>>
>>> test_List.append(obj1)
>>> test_List.append(obj2)
>>>
>>> test_Dict
{<__main__.Test2 object at 0x0000000002EFC518>: 'x', <__main__.Test2 object at 0x0000000002EFC940>: 'y'}
>>> test_List
[<__main__.Test2 object at 0x0000000002EFC518>, <__main__.Test2 object at 0x0000000002EFC940>]
>>>
>>> Test2('a') in test_Dict
False
>>> Test2('a') in test_List
True
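The session above is Python 2. For comparison, here is a hedged sketch of how the same kind of class behaves on Python 3, where defining `__eq__` without `__hash__` sets `__hash__` to `None`, so the mistake surfaces as a `TypeError` instead of silently creating duplicate keys. `Test3` is a hypothetical copy of `Test2` used only for this illustration.

```python
class Test3(object):
    """Same shape as Test2: __eq__ is defined, __hash__ is left out."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return self.name == other.name


# List membership only needs __eq__, so it works on both versions.
in_list = Test3('a') in [Test3('a')]

try:
    {Test3('a'): 'x'}
    outcome = "hashable (Python 2 behaviour)"
except TypeError:
    outcome = "unhashable (Python 3 behaviour)"

print(in_list, outcome)
```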