
`object in list` behaves differently from `object in dict`?

I've got an iterator with some objects in it, and I wanted to create a collection of uniqueUsers in which every user is listed only once. So, playing around a bit, I tried it with both a list and a dict:

>>> for m in ms: print m.to_user  # let's first look what's inside ms
...
Pete Kramer
Pete Kramer
Pete Kramer
>>> 
>>> uniqueUsers = []  # Create an empty list
>>> for m in ms:
...     if m.to_user not in uniqueUsers:
...         uniqueUsers.append(m.to_user)
...
>>> uniqueUsers
[Pete Kramer]  # This is what I would expect
>>> 
>>> uniqueUsers = {}  # Now let's create a dict
>>> for m in ms:
...     if m.to_user not in uniqueUsers:
...         uniqueUsers[m.to_user] = 1
...
>>> uniqueUsers
{Pete Kramer: 1, Pete Kramer: 1, Pete Kramer: 1}

So I tested it by converting the dict to a list in the if statement, and that works as I would expect:

>>> uniqueUsers = {}
>>> for m in ms:
...     if m.to_user not in list(uniqueUsers):
...         uniqueUsers[m.to_user] = 1
...
>>> uniqueUsers
{Pete Kramer: 1}

and I can get a similar result by testing against uniqueUsers.keys().

The thing is that I don't understand why this difference occurs. I always thought that if you do if object in dict, it simply creates a list of the dict's keys and tests against that, but that's obviously not the case.

Can anybody explain how object in dict works internally and why it doesn't behave like object in list (as I would expect it to)?

To understand what's going on, you have to understand how the in operator, the membership test, behaves for the different types.

For lists, this is pretty simple because of what lists fundamentally are: ordered arrays that do not care about duplicates. The only possible way to perform a membership test here is to iterate over the list and check every item for equality. Something like this:

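# roughly what x in lst does
def list_contains(lst, x):
    for item in lst:
        # an identical object counts as a match before == is even tried
        if x is item or x == item:
            return True
    return False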
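The scan can be watched directly. Here is a minimal Python 3 sketch in which CountingEq, a hypothetical demo class, counts how often __eq__ runs during a list membership test. The scan stops at the first match, and __eq__ is skipped entirely when the very same object is in the list.

```python
class CountingEq:
    """Hypothetical demo class that counts its __eq__ calls."""
    eq_calls = 0

    def __init__(self, key):
        self.key = key

    def __eq__(self, other):
        CountingEq.eq_calls += 1
        if not isinstance(other, CountingEq):
            return NotImplemented
        return self.key == other.key


lst = [CountingEq(1), CountingEq(2), CountingEq(2)]

CountingEq.eq_calls = 0
found = CountingEq(2) in lst      # compares with key 1 (no), then key 2 (yes), stops
calls_for_equal = CountingEq.eq_calls
print(found, calls_for_equal)     # True 2

CountingEq.eq_calls = 0
same = lst[0] in lst              # the identical object: matched without calling __eq__
calls_for_same = CountingEq.eq_calls
print(same, calls_for_same)       # True 0
```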

Dictionaries are a bit different: they are hash tables where keys are meant to be unique. Hash tables require the keys to be hashable, which essentially means that there needs to be an explicit function that converts the object into an integer. This hash value is then used to place the key/value mapping somewhere in the hash table.
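The quickest way to find out whether an object qualifies is to call hash() on it. The helper below is a Python 3 sketch; is_hashable is a hypothetical name, not a standard function.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, i.e. obj could serve as a dict key."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(is_hashable("Pete Kramer"))  # True: str is immutable
print(is_hashable((1, "a")))       # True: tuple of hashable items
print(is_hashable([1, "a"]))       # False: list is mutable
print(is_hashable((1, [2])))       # False: a tuple is only hashable if its items are
```

Calling hash() is more reliable than isinstance(obj, collections.abc.Hashable). That check reports True for (1, [2]), because tuple defines __hash__ even though this particular tuple cannot be hashed.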

Since the hash value determines where in the hash table an item is placed, it's critical that objects which are meant to be identical produce the same hash value. So the following implication has to hold: x == y => hash(x) == hash(y). The reverse does not need to be true, though; it's perfectly valid for different objects to produce the same hash value.
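Both halves of that rule can be checked directly. The Python 3 sketch below first shows equal values of different types sharing a hash. It then defines Collider, a hypothetical class whose instances all collide; they still work correctly as dict keys, because __eq__ tells the colliding keys apart.

```python
# Equal values must hash equally, even across types: 1 == 1.0 == True.
same_value = 1 == 1.0 == True
same_hash = hash(1) == hash(1.0) == hash(True)
print(same_value, same_hash)  # True True


class Collider:
    """Hypothetical class: every instance gets the same hash (legal, but slow)."""

    def __init__(self, key):
        self.key = key

    def __eq__(self, other):
        return isinstance(other, Collider) and self.key == other.key

    def __hash__(self):
        return 0  # all instances collide; __eq__ still tells them apart


d = {Collider("a"): 1, Collider("b"): 2}
print(len(d), Collider("a") in d, Collider("c") in d)  # 2 True False
```

A constant hash is correct, but it turns every lookup into a linear scan of the colliding keys. That is why a good __hash__ spreads its values out.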

When a membership test on a dictionary is performed, the dictionary first looks up the hash value. If it finds it, it performs an equality check on all items stored under that hash. If it doesn't find the hash value, it assumes that the object is a different one:

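# roughly what x in dct does; getItemsForHash is a hypothetical
# stand-in for the dict's internal lookup by hash value
def dict_contains(dct, x):
    h = hash(x)
    items = getItemsForHash(dct, h)
    for item in items:
        if x == item:
            return True
    # items is empty, or no match inside the loop
    return False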
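Because getItemsForHash is only a placeholder, here is a runnable toy version of the same idea in Python 3. ToyHashSet is a hypothetical, deliberately simplified class that uses separate chaining. Real CPython dicts use open addressing, and they also compare the stored hash and object identity before falling back to __eq__. Treat this as an illustration of the principle, not of CPython's code.

```python
class ToyHashSet:
    """Hypothetical, simplified hash set: hash() picks a bucket, == searches it."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # Map the (possibly negative or huge) hash onto a bucket index
        return self.buckets[hash(key) % len(self.buckets)]

    def add(self, key):
        bucket = self._bucket(key)
        if not any(key == item for item in bucket):
            bucket.append(key)

    def __contains__(self, key):
        # Only the single bucket chosen by hash(key) is compared with ==
        return any(key == item for item in self._bucket(key))


users = ToyHashSet()
for name in ["Pete Kramer", "Pete Kramer", "Ann Lee"]:
    users.add(name)

print("Pete Kramer" in users, "Bob" in users)  # True False
print(sum(len(b) for b in users.buckets))      # 2 distinct keys stored
```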

Since you get the desired result when using a membership test against a list, your object evidently implements the equality comparison (__eq__) correctly. But since you do not get the correct result when using a dictionary, there seems to be a __hash__ implementation that is out of sync with the equality comparison:

>>> class SomeType:
        def __init__ (self, x):
            self.x = x
        def __eq__ (self, other):
            return self.x == other.x
        def __hash__ (self):
            # bad hash implementation
            return hash(id(self))

>>> l = [SomeType(1)]
>>> d = { SomeType(1): 'x' }
>>> x = SomeType(1)
>>> x in l
True
>>> x in d
False
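When a dict misbehaves like this, a small check of the invariant x == y => hash(x) == hash(y) can pinpoint the culprit. In this Python 3 sketch, hash_consistent is a hypothetical helper, and BadHash and GoodHash are made-up classes that mirror the SomeType examples.

```python
def hash_consistent(a, b):
    """False when a == b but hash(a) != hash(b), i.e. the invariant is broken."""
    return not (a == b) or hash(a) == hash(b)


class BadHash:
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        return self.x == other.x

    def __hash__(self):
        return hash(id(self))  # ignores self.x, so it disagrees with __eq__


class GoodHash(BadHash):
    def __hash__(self):
        return hash(self.x)  # uses the same field that __eq__ compares


a, b = BadHash(1), BadHash(1)
print(hash_consistent(a, b))                      # False
print(hash_consistent(GoodHash(1), GoodHash(1)))  # True
print(GoodHash(1) in {GoodHash(1): "x"})          # True
```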

Note that for new-style classes in Python 2 (classes that inherit from object), this “bad hash implementation” (which is based on the object id) is the default. So when you do not implement your own __hash__ function, that one is still used. This ultimately means that unless your __eq__ only performs an identity check (the default), the hash function will be out of sync.
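Python 3 changed this default. A class that defines __eq__ without __hash__ gets __hash__ set to None, so its instances are unhashable, and the problem surfaces as a TypeError instead of a silent mismatch. The sketch below uses the hypothetical classes EqOnly and Py2Style to show this behaviour, and to show that explicitly reusing object.__hash__ reproduces the Python 2 behaviour.

```python
class EqOnly:
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        return self.x == other.x


print(EqOnly.__hash__ is None)  # True: defining __eq__ removed the inherited hash

try:
    {EqOnly(1): "x"}
    unhashable = False
except TypeError:
    unhashable = True
print(unhashable)  # True


class Py2Style(EqOnly):
    # Reinstate object's id-based hash, mimicking a Python 2 new-style class
    __hash__ = object.__hash__


d = {Py2Style(1): "x"}
print(Py2Style(1) in d)  # False: equal value, but a different id-based hash
```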

So the solution is to implement __hash__ so that it aligns with the rules used in __eq__. For example, if you compare two members self.x and self.y, then you should use a compound hash over those two members. The easiest way to do that is to return the hash value of a tuple of those values:

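class SomeType(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, SomeType):
            return NotImplemented
        return self.x == other.x and self.y == other.y

    def __hash__(self):
        # hash exactly the fields that __eq__ compares
        return hash((self.x, self.y))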
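Applied to the question, a class shaped like this makes the dict-based deduplication (and a plain set) behave like the list version. User and its first/last fields are hypothetical stand-ins, since the real class behind m.to_user isn't shown. The sketch targets Python 3.

```python
class User(object):
    """Hypothetical stand-in for the class behind m.to_user."""

    def __init__(self, first, last):
        self.first = first
        self.last = last

    def __eq__(self, other):
        if not isinstance(other, User):
            return NotImplemented
        return (self.first, self.last) == (other.first, other.last)

    def __hash__(self):
        # Hash exactly the fields that __eq__ compares
        return hash((self.first, self.last))

    def __repr__(self):
        return "%s %s" % (self.first, self.last)


to_users = [User("Pete", "Kramer") for _ in range(3)]

uniqueUsers = {}
for user in to_users:
    if user not in uniqueUsers:
        uniqueUsers[user] = 1
print(uniqueUsers)    # {Pete Kramer: 1}

print(set(to_users))  # {Pete Kramer}
```

With a consistent __hash__ in place, the list(uniqueUsers) workaround is no longer needed, and set(...) does the deduplication in one step.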

Note that you should not make an object hashable if it is mutable:

If a class defines mutable objects and implements an __eq__() method, it should not implement __hash__() , since the implementation of hashable collections requires that a key's hash value is immutable (if the object's hash value changes, it will be in the wrong hash bucket).
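The Python 3 sketch below shows the "wrong hash bucket" problem in practice. MutablePoint is a hypothetical class that breaks the rule on purpose. Once a key is mutated after insertion, its new hash sends the dict lookup to the wrong place, and the entry can no longer be found.

```python
class MutablePoint:
    """Hypothetical mutable class that (wrongly) hashes a mutable field."""

    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        return isinstance(other, MutablePoint) and self.x == other.x

    def __hash__(self):
        return hash(self.x)


p = MutablePoint(1)
d = {p: "stored"}
p.x = 2                    # mutate the key after it was filed under hash(1)

print(p in d)              # False: the lookup now uses hash(2)
print(p in list(d))        # True: a linear scan still finds the same object
```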

TL;DR: The in test calls __eq__ for lists. For dicts, it first calls __hash__, and if the hash matches, it then calls __eq__.

  1. The in test only calls __eq__ for lists.
    • Without an __eq__, the comparison falls back to identity, so the test is True only if that very object is in the list.
  2. For dicts, you need correctly implemented __hash__ and __eq__ methods to be able to compare the objects in it correctly:

    • It first gets the object's hash from __hash__.

      • Without __hash__, new-style classes use a hash derived from id(), which is unique among all objects alive at the same time. So it never matches an existing key's hash unless it's the same object.
      • And as @poke pointed out in a comment:

        In Python 2, new-style classes (inheriting from object) inherit object's __hash__ implementation, which is based on id(), so that's where that comes from.

    • If the hash matches, then __eq__ is called for that object with the other one.

      • The result then depends on what __eq__ returns.
    • If the hash does not match, then __eq__ is not called.
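A class that logs its own calls makes this order visible. In the Python 3 sketch below, Traced is a hypothetical demo class whose hash value is passed in explicitly, so that a matching and a non-matching hash can be tested side by side.

```python
class Traced:
    """Hypothetical demo class that logs every __hash__ and __eq__ call."""
    log = []

    def __init__(self, name, h):
        self.name = name
        self.h = h  # hash value chosen by the caller for the demo

    def __hash__(self):
        Traced.log.append("hash")
        return self.h

    def __eq__(self, other):
        Traced.log.append("eq")
        return isinstance(other, Traced) and self.name == other.name


d = {Traced("k", 1): "value"}

Traced.log.clear()
hit = Traced("k", 1) in d
hit_log = list(Traced.log)
print(hit, hit_log)       # True ['hash', 'eq']: hash matched, then __eq__ decided

Traced.log.clear()
miss = Traced("k", 2) in d
miss_log = list(Traced.log)
print(miss, miss_log)     # False ['hash']: hash differed, __eq__ never ran

Traced.log.clear()
in_list = Traced("k", 2) in list(d)
list_log = list(Traced.log)
print(in_list, list_log)  # True ['eq']: lists skip hashing entirely
```

The last two cases mirror the question. The probe compares equal to the stored key, but its hash differs, so only the list finds it.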

So the in test calls __eq__ for both lists and dicts, but for dicts only after __hash__ returns a matching hash. And in Python 2, not having a __hash__ doesn't return None, doesn't throw an error and doesn't make the object "unhashable". To use the objects returned by to_user correctly as dict keys, you do need a __hash__ method that is implemented correctly, in sync with __eq__.

Details:

The check m.to_user not in uniqueUsers ("object in list") worked correctly because you have probably implemented an __eq__ method, as @poke pointed out. (And it appears to_user returns an object, not a string.)

The same check doesn't work for "object in dict" because either:
(a) __hash__ in that class is badly implemented, as @poke also pointed out, or
(b) you have not implemented __hash__ at all. This doesn't raise an error for new-style classes in Python 2.

Using the class in this answer as a starting point (in a Python 2 interpreter):

>>> class Test2(object):
...     def __init__(self, name):
...         self.name = name
...
...     def __eq__(self, other):
...         return self.name == other.name
...
>>> test_Dict = {}
>>> test_List = []
>>>
>>> obj1 = Test2('a')
>>> obj2 = Test2('a')
>>>
>>> test_Dict[obj1] = 'x'
>>> test_Dict[obj2] = 'y'
>>>
>>> test_List.append(obj1)
>>> test_List.append(obj2)
>>>
>>> test_Dict
{<__main__.Test2 object at 0x0000000002EFC518>: 'x', <__main__.Test2 object at 0x0000000002EFC940>: 'y'}
>>> test_List
[<__main__.Test2 object at 0x0000000002EFC518>, <__main__.Test2 object at 0x0000000002EFC940>]
>>>
>>> Test2('a') in test_Dict
False
>>> Test2('a') in test_List
True
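For comparison, here is a sketch of the same class with a __hash__ based on the attribute that __eq__ compares. It runs on both Python 2 and 3. The two equal objects now collapse into a single dict key, and the membership test agrees with the list.

```python
class Test2(object):
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return self.name == other.name

    def __hash__(self):
        # Hash the same attribute that __eq__ compares
        return hash(self.name)


test_Dict = {}
obj1 = Test2('a')
obj2 = Test2('a')

test_Dict[obj1] = 'x'
test_Dict[obj2] = 'y'  # equal key: replaces the value, keeps obj1 as the key

print(len(test_Dict))           # 1
print(test_Dict[Test2('a')])    # y
print(Test2('a') in test_Dict)  # True
```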

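Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM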