简体   繁体   中英

Python 3.x: How to compare two lists containing dictionaries where order doesn't matter

I have nested dictionaries that may contain other dictionaries or lists. I need to be able to compare a list (or set, really) of these dictionaries to show that they are equal.

The order of the list is not uniform. Typically, I would turn the list into a set, but it is not possible since there are values that are also dictionaries.

a = {'color': 'red'}
b = {'shape': 'triangle'}
c = {'children': [{'color': 'red'}, {'age': 8},]}

test_a = [a, b, c] 
test_b = [b, c, a]

print(test_a == test_b)  # False
print(set(test_a) == set(test_b))  # TypeError: unhashable type: 'dict'

Is there a good way to approach this to show that test_a has the same contents as test_b ?

You can use a simple loop to check if each of one list is in the other:

def areEqual(a, b):
    if len(a) != len(b):
        return False

    for d in a:
        if d not in b:
            return False
    return True

I suggest writing a function that turns any Python object into something orderable, with its contents, if it has any, in sorted order. If we call it canonicalize , we can compare nested objects with:

canonicalize(test_a) == canonicalize(test_b)

Here's my attempt at writing a canonicalize function:

def canonicalize(x):
    if isinstance(x, dict):
        x = sorted((canonicalize(k), canonicalize(v)) for k, v in x.items())
    elif isinstance(x, collections.abc.Iterable) and not isinstance(x, str):
        x = sorted(map(canonicalize, x))
    else:
        try:
            bool(x < x) # test for unorderable types like complex
        except TypeError:
            x = repr(x) # replace with something orderable
    return x

This should work for most Python objects. It won't work for lists of heterogeneous items, containers that contain themselves (which will cause the function to hit the recursion limit), nor float('nan') (which has bizarre comparison behavior, and so may mess up the sorting of any container it's in).

It's possible that this code will do the wrong thing for non-iterable, unorderable objects, if they don't have a repr function that describes all the data that makes up their value (eg what is tested by == ). I picked repr as it will work on any kind of object and might get it right (it works for complex , for example). It should also work as desired for classes that have a repr that looks like a constructor call. For classes that have inherited object.__repr__ and so have repr output like <Foo object at 0xXXXXXXXX> it at least won't crash, though the objects will be compared by identity rather than value. I don't think there's any truly universal solution, and you can add some special cases for classes you expect to find in your data if they don't work with repr .

In this case they are the same dicts so you can compare ids ( docs ). Note that if you introduced a new dict whose values were identical it would still be treated differently. Ie d = {'color': 'red'} would be treated as not equal to a .

sorted(map(id, test_a)) == sorted(map(id, test_b))

As @jsbueno points out, you can do this with the kwarg key .

sorted(test_a, key=id) == sorted(test_b, key=id)

If the elements in both lists are shallow, the idea of sorting them, and then comparing with equality can work. The problem with @Alex's solution is that he is only using "id" - but if instead of id, one uses a function that will sort dictionaries properly, things shuld just work:

def sortkey(element):
   if isinstance(element, dict):
         element = sorted(element.items())
   return repr(element)

sorted(test_a, key=sortkey) == sorted(test_b, key=sotrkey) 

(I use an repr to wrap the key because it will cast all elements to string before comparison, which will avoid typerror if different elements are of unorderable types - which would almost certainly happen if you are using Python 3.x)

Just to be clear, if your dictionaries and lists have nested dictionaries themselves, you should use the answer by @m_callens. If your inner lists are also unorderd, you can fix this to work, jsut sorting them inside the key function as well.

An elegant and relatively fast solution:

class QuasiUnorderedList(list):
    def __eq__(self, other):
        """This method isn't as ineffiecient as you think! It runs in O(1 + 2 + 3 + ... + n) time, 
        possibly better than recursively freezing/checking all the elements."""
        for item in self:
            for otheritem in other:
                if otheritem == item:
                    break
            else:
                # no break was reached, item not found.
                return False
        return True

This runs in O(1 + 2 + 3 + ... + n) flat. While slow for dictionaries of low depth, this is faster for dictionaries of high depth.

Here's a considerably longer snippet which is faster for dictionaries where depth is low and length is high.

class FrozenDict(collections.Mapping, collections.Hashable):  # collections.Hashable = portability
    """Adapated from http://stackoverflow.com/a/2704866/1459669"""

    def __init__(self, *args, **kwargs):
        self._d = dict(*args, **kwargs)
        self._hash = None

    def __iter__(self):
        return iter(self._d)

    def __len__(self):
        return len(self._d)

    def __getitem__(self, key):
        return self._d[key]

    def __hash__(self):
        # It would have been simpler and maybe more obvious to
        # use hash(tuple(sorted(self._d.iteritems()))) from this discussion
        # so far, but this solution is O(n). I don't know what kind of
        # n we are going to run into, but sometimes it's hard to resist the
        # urge to optimize when it will gain improved algorithmic performance.
        # Now thread safe by CrazyPython
        if self._hash is None:
            _hash = 0
            for pair in self.iteritems():
                _hash ^= hash(pair)
        self._hash = _hash
        return _hash


def freeze(obj):
    if type(obj) in (str, int, ...):  # other immutable atoms you store in your data structure
        return obj
    elif issubclass(type(obj), list):  # ugly but needed
        return set(freeze(item) for item in obj)
    elif issubclass(type(obj), dict):  # for defaultdict, etc.
        return FrozenDict({key: freeze(value) for key, value in obj.items()})
    else:
        raise NotImplementedError("freeze() doesn't know how to freeze " + type(obj).__name__ + " objects!")


class FreezableList(list, collections.Hashable):
    _stored_freeze = None
    _hashed_self = None

    def __eq__(self, other):
        if self._stored_freeze and (self._hashed_self == self):
            frozen = self._stored_freeze
        else:
            frozen = freeze(self)
        if frozen is not self._stored_freeze:
            self._stored_hash = frozen
        return frozen == freeze(other)

    def __hash__(self):
        if self._stored_freeze and (self._hashed_self == self):
            frozen = self._stored_freeze
        else:
            frozen = freeze(self)
        if frozen is not self._stored_freeze:
            self._stored_hash = frozen
        return hash(frozen)


class UncachedFreezableList(list, collections.Hashable):
    def __eq__(self, other):
        """No caching version of __eq__. May be faster.
        Don't forget to get rid of the declarations at the top of the class!
        Considerably more elegant."""
        return freeze(self) == freeze(other)

    def __hash__(self):
        """No caching version of __hash__. See the notes in the docstring of __eq__2"""
        return hash(freeze(self))

Test all three ( QuasiUnorderedList , FreezableList , and UncachedFreezableList ) and see which one is faster in your situation. I'll betcha it's faster than the other solutions.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM