Python 3.x: How to compare two lists containing dictionaries where order doesn't matter

I have nested dictionaries that may contain other dictionaries or lists. I need to be able to compare a list (or set, really) of these dictionaries to show that they are equal.

The order of the list is not uniform. Typically, I would turn the list into a set, but that is not possible since some of the values are themselves dictionaries.

a = {'color': 'red'}
b = {'shape': 'triangle'}
c = {'children': [{'color': 'red'}, {'age': 8},]}

test_a = [a, b, c] 
test_b = [b, c, a]

print(test_a == test_b)  # False
print(set(test_a) == set(test_b))  # TypeError: unhashable type: 'dict'

Is there a good way to approach this to show that test_a has the same contents as test_b?

You can use a simple loop to check whether each element of one list is in the other:

def areEqual(a, b):
    if len(a) != len(b):
        return False

    for d in a:
        if d not in b:
            return False
    return True
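One caveat with the loop above: it only checks membership, so lists with repeated items can compare equal when they are not (for example [x, x, y] and [x, y, y]). A minimal sketch of a variant that consumes each match is shown below; the name are_equal_multiset is made up for this illustration and is not part of the original answer.

```python
def are_equal_multiset(a, b):
    """Order-insensitive comparison that also respects duplicates."""
    if len(a) != len(b):
        return False

    remaining = list(b)  # shallow copy, so b itself is left untouched
    for item in a:
        try:
            remaining.remove(item)  # removes the first element equal to item
        except ValueError:  # nothing equal to item is left in b
            return False
    return True


x = {'color': 'red'}
y = {'shape': 'triangle'}

print(are_equal_multiset([x, y, x], [y, x, x]))  # True
print(are_equal_multiset([x, x, y], [x, y, y]))  # False
```

This is still quadratic in the list length, like the original loop; it only changes how duplicates are handled.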

I suggest writing a function that turns any Python object into something orderable, with its contents, if it has any, in sorted order. If we call it canonicalize, we can compare nested objects with:

canonicalize(test_a) == canonicalize(test_b)

Here's my attempt at writing a canonicalize function:

import collections.abc


def canonicalize(x):
    if isinstance(x, dict):
        x = sorted((canonicalize(k), canonicalize(v)) for k, v in x.items())
    elif isinstance(x, collections.abc.Iterable) and not isinstance(x, str):
        x = sorted(map(canonicalize, x))
    else:
        try:
            bool(x < x) # test for unorderable types like complex
        except TypeError:
            x = repr(x) # replace with something orderable
    return x

This should work for most Python objects. It won't work for lists of heterogeneous items, containers that contain themselves (which will cause the function to hit the recursion limit), or float('nan') (which has bizarre comparison behavior, and so may mess up the sorting of any container it's in).
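To make that concrete, here is a small run of canonicalize on the lists from the question, plus one heterogeneous list that hits the limitation just mentioned. The function body is repeated only so the snippet runs on its own.

```python
import collections.abc


def canonicalize(x):
    # Same logic as the function above, repeated so this snippet is runnable.
    if isinstance(x, dict):
        x = sorted((canonicalize(k), canonicalize(v)) for k, v in x.items())
    elif isinstance(x, collections.abc.Iterable) and not isinstance(x, str):
        x = sorted(map(canonicalize, x))
    else:
        try:
            bool(x < x)
        except TypeError:
            x = repr(x)
    return x


a = {'color': 'red'}
b = {'shape': 'triangle'}
c = {'children': [{'color': 'red'}, {'age': 8}]}

print(canonicalize([a, b, c]) == canonicalize([b, c, a]))  # True

# Mixed types inside one container cannot be sorted in Python 3.
try:
    canonicalize([1, 'one'])
except TypeError as exc:
    print('TypeError:', exc)
```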

It's possible that this code will do the wrong thing for non-iterable, unorderable objects, if they don't have a repr that describes all the data that makes up their value (e.g. what is tested by ==). I picked repr as it will work on any kind of object and might get it right (it works for complex, for example). It should also work as desired for classes that have a repr that looks like a constructor call. For classes that have inherited object.__repr__ and so have repr output like <Foo object at 0xXXXXXXXX>, it at least won't crash, though the objects will be compared by identity rather than value. I don't think there's any truly universal solution, and you can add some special cases for classes you expect to find in your data if they don't work with repr.

In this case they are the same dicts, so you can compare ids (docs). Note that if you introduced a new dict whose values were identical, it would still be treated as different. I.e. d = {'color': 'red'} would be treated as not equal to a.

sorted(map(id, test_a)) == sorted(map(id, test_b))

As @jsbueno points out, you can do this with the key kwarg.

sorted(test_a, key=id) == sorted(test_b, key=id)
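A short illustration of the identity caveat, using the question's dicts plus the extra dict d described above (test_d is a name introduced just for this example):

```python
a = {'color': 'red'}
b = {'shape': 'triangle'}
d = {'color': 'red'}  # equal to a, but a separate object

test_a = [a, b]
test_b = [b, a]  # same objects, different order
test_d = [b, d]  # equal contents, different objects

print(sorted(map(id, test_a)) == sorted(map(id, test_b)))  # True
print(sorted(map(id, test_a)) == sorted(map(id, test_d)))  # False
print(a == d, a is d)  # True False
```

Note that sorted(..., key=id) only orders the elements by identity; the final == still compares them by value. For equal but distinct dicts, the result of that form therefore depends on where the objects happen to live in memory.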

If the elements in both lists are shallow, the idea of sorting them and then comparing for equality can work. The problem with @Alex's solution is that he is only using id. If, instead of id, one uses a function that sorts dictionaries properly, things should just work:

def sortkey(element):
    if isinstance(element, dict):
        element = sorted(element.items())
    return repr(element)

sorted(test_a, key=sortkey) == sorted(test_b, key=sortkey)

(I use repr to wrap the key because it casts all elements to strings before comparison, which avoids a TypeError if different elements are of unorderable types, which would almost certainly happen if you are using Python 3.x.)

Just to be clear, if your dictionaries and lists have nested dictionaries themselves, you should use the answer by @m_callens. If your inner lists are also unordered, you can fix this to work by just sorting them inside the key function as well.
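As a rough sketch of that last idea, the key function can recurse and sort inner lists too. The names deep_sortkey and unordered_equal are hypothetical, and this version compares the string keys themselves rather than the sorted elements, so it assumes that repr distinguishes values the same way == does (it treats 1 and 1.0 as different, for instance).

```python
def deep_sortkey(element):
    """Build an order-insensitive string key for nested dicts and lists."""
    if isinstance(element, dict):
        # Sort by key; values are converted recursively to strings.
        items = sorted((repr(k), deep_sortkey(v)) for k, v in element.items())
        return '{' + ', '.join(k + ': ' + v for k, v in items) + '}'
    if isinstance(element, list):
        # Inner lists are treated as unordered too.
        return '[' + ', '.join(sorted(deep_sortkey(e) for e in element)) + ']'
    return repr(element)


def unordered_equal(left, right):
    # Compare the sorted keys, so inner ordering no longer matters.
    return sorted(map(deep_sortkey, left)) == sorted(map(deep_sortkey, right))


a = {'color': 'red'}
b = {'shape': 'triangle'}
c1 = {'children': [{'color': 'red'}, {'age': 8}]}
c2 = {'children': [{'age': 8}, {'color': 'red'}]}  # inner order differs

print(unordered_equal([a, b, c1], [c2, a, b]))  # True
print(unordered_equal([a, b], [a, c1]))  # False
```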

An elegant and relatively fast solution:

class QuasiUnorderedList(list):
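    def __eq__(self, other):
        """This method isn't as inefficient as you think! It runs in O(1 + 2 + 3 + ... + n) time,
        i.e. O(n**2) item comparisons, possibly better than recursively freezing/checking all the elements."""
        if len(self) != len(other):
            # without this check, [x] would compare equal to [x, y]
            return False
        for item in self: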
            for otheritem in other:
                if otheritem == item:
                    break
            else:
                # no break was reached, item not found.
                return False
        return True

This runs in O(1 + 2 + 3 + ... + n) flat, which is O(n^2) item comparisons. While slow for dictionaries of low depth, this is faster for dictionaries of high depth.
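To see where the 1 + 2 + 3 + ... + n figure comes from, the sketch below counts == calls for the same nested-loop search. CountingItem and nested_loop_equal are helper names invented for this illustration. For two lists holding the same n distinct items, the total comes out to n(n + 1)/2.

```python
class CountingItem:
    """Wraps a value and counts every == comparison made on any instance."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        CountingItem.comparisons += 1
        return self.value == other.value


def nested_loop_equal(left, right):
    # Same nested-loop search as QuasiUnorderedList.__eq__ above.
    for item in left:
        for other_item in right:
            if other_item == item:
                break
        else:
            return False
    return True


n = 100
left = [CountingItem(i) for i in range(n)]
right = [CountingItem(i) for i in reversed(range(n))]

print(nested_loop_equal(left, right))  # True
print(CountingItem.comparisons)        # 5050
print(n * (n + 1) // 2)                # 5050
```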

Here's a considerably longer snippet which is faster for dictionaries where depth is low and length is high.

import collections.abc


class FrozenDict(collections.abc.Mapping, collections.abc.Hashable):  # Hashable = portability
    """Adapted from http://stackoverflow.com/a/2704866/1459669"""

    def __init__(self, *args, **kwargs):
        self._d = dict(*args, **kwargs)
        self._hash = None

    def __iter__(self):
        return iter(self._d)

    def __len__(self):
        return len(self._d)

    def __getitem__(self, key):
        return self._d[key]

    def __hash__(self):
        # It would have been simpler and maybe more obvious to
        # use hash(tuple(sorted(self._d.items()))) from this discussion
        # so far, but this solution is O(n). I don't know what kind of
        # n we are going to run into, but sometimes it's hard to resist the
        # urge to optimize when it will gain improved algorithmic performance.
        # Now thread safe by CrazyPython
        if self._hash is None:
            _hash = 0
            for pair in self.items():
                _hash ^= hash(pair)
            self._hash = _hash
        return self._hash


def freeze(obj):
    if type(obj) in (str, int, ...):  # other immutable atoms you store in your data structure
        return obj
    elif isinstance(obj, list):
        # frozenset rather than set, so that nested lists stay hashable
        # (note: duplicate items collapse into one)
        return frozenset(freeze(item) for item in obj)
    elif isinstance(obj, dict):  # also covers defaultdict, etc.
        return FrozenDict({key: freeze(value) for key, value in obj.items()})
    else:
        raise NotImplementedError("freeze() doesn't know how to freeze " + type(obj).__name__ + " objects!")


class FreezableList(list, collections.abc.Hashable):
    _stored_freeze = None  # cached result of freeze(self)
    _hashed_self = None    # shallow snapshot of the contents that were frozen

    def _frozen(self):
        # Reuse the cache only while the contents match the snapshot.
        # list.__eq__ is called directly so that we don't recurse into __eq__.
        # The snapshot is shallow: in-place changes to nested items go unnoticed.
        if self._stored_freeze is None or not list.__eq__(self, self._hashed_self):
            self._stored_freeze = freeze(self)
            self._hashed_self = list(self)
        return self._stored_freeze

    def __eq__(self, other):
        return self._frozen() == freeze(other)

    def __hash__(self):
        return hash(self._frozen())


class UncachedFreezableList(list, collections.abc.Hashable):
    def __eq__(self, other):
        """No-caching version of __eq__. May be faster.
        Don't forget to get rid of the declarations at the top of the class!
        Considerably more elegant."""
        return freeze(self) == freeze(other)

    def __hash__(self):
        """No-caching version of __hash__. See the notes in the docstring of __eq__."""
        return hash(freeze(self))

Test all three (QuasiUnorderedList, FreezableList, and UncachedFreezableList) and see which one is faster in your situation. I'll betcha it's faster than the other solutions.
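One possible way to run such a comparison is with timeit from the standard library. This is only a sketch: time_candidates and loop_equal are names invented here, loop_equal stands in as a sample candidate, and the test data is arbitrary. Swap in the comparison functions and data shapes you actually care about.

```python
import timeit


def loop_equal(a, b):
    # Simple membership-based comparison, used here only as a sample candidate.
    return len(a) == len(b) and all(item in b for item in a)


def time_candidates(candidates, a, b, number=1000):
    """Return {name: seconds} for running each comparison `number` times."""
    return {
        name: timeit.timeit(lambda fn=fn: fn(a, b), number=number)
        for name, fn in candidates.items()
    }


test_a = [{'id': i, 'tags': ['x', 'y']} for i in range(50)]
test_b = list(reversed(test_a))

results = time_candidates({'loop_equal': loop_equal}, test_a, test_b)
for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
    print(f'{name}: {seconds:.4f}s')
```

To compare the list subclasses, a candidate can wrap the inputs first, for example lambda a, b: QuasiUnorderedList(a) == b, keeping in mind that the wrapping cost is then part of the measurement.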
