
Turn a List of Dictionaries Into a Set of Dictionaries

I have a list of dictionaries like the following:

 a = [{1000976: 975},
 {1000977: 976},
 {1000978: 977},
 {1000979: 978},
 {1000980: 979},
 {1000981: 980},
 {1000982: 981},
 {1000983: 982},
 {1000984: 983},
 {1000985: 984}]

I could be thinking about this wrong, but I'm comparing this list of dicts to another list of dicts and am attempting to remove the elements (dictionaries) in one list that are also in the other. To do this, I want to transform both lists into sets and perform set subtraction. However, I'm getting the following error when attempting the conversion.

set_a = set(a)

TypeError: unhashable type: 'dict'

Am I thinking about this incorrectly?

Try this:

>>> a = [{1000976: 975},
...  {1000977: 976},
...  {1000978: 977},
...  {1000979: 978},
...  {1000980: 979},
...  {1000981: 980},
...  {1000982: 981},
...  {1000983: 982},
...  {1000984: 983},
...  {1000985: 984}]
>>> a.extend(a)  # just to add some duplicates
>>> len(a)
20
>>> dict_set = set(frozenset(d.items()) for d in a)
>>> b = [dict(s) for s in dict_set]
>>> b
[{1000982: 981}, {1000983: 982}, {1000981: 980}, {1000985: 984}, {1000978: 977}, {1000980: 979}, {1000977: 976}, {1000976: 975}, {1000984: 983}, {1000979: 978}]
>>> len(b)
10

If you want to do set subtraction between two lists of dicts, then just use the same conversion to sets as above on both lists, do the subtraction, then convert back.
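A minimal sketch of that round trip, assuming a second list `b` (which the question does not show) and that every value is hashable. The helper names `to_frozen` and `subtract` are made up for illustration:

```python
def to_frozen(dicts):
    """Convert an iterable of dicts into a set of frozensets of (key, value) pairs."""
    return {frozenset(d.items()) for d in dicts}


def subtract(left, right):
    """Return the dicts in `left` that do not appear in `right` (order is not preserved)."""
    return [dict(items) for items in to_frozen(left) - to_frozen(right)]


a = [{1000976: 975}, {1000977: 976}, {1000978: 977}]
b = [{1000977: 976}, {1000999: 1}]  # hypothetical second list

c = subtract(a, b)
# Sort by the single key only to get a stable display order.
print(sorted(c, key=lambda d: next(iter(d))))
```

Sorting is only there for display; the list returned by `subtract` comes back in arbitrary order.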

Note: At the very least all values in your dict should also be hashable (as well as keys but that goes without saying). If not, you need a similar transformation on the values into a hashable, immutable type of some kind.
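One possible shape for such a transformation, as a sketch only: `freeze` is a hypothetical helper, and it assumes the unhashable values are plain dicts, lists, tuples or sets.

```python
def freeze(value):
    """Recursively convert dicts, lists, tuples and sets into hashable equivalents."""
    if isinstance(value, dict):
        return frozenset((key, freeze(val)) for key, val in value.items())
    if isinstance(value, (list, tuple)):
        return tuple(freeze(item) for item in value)
    if isinstance(value, set):
        return frozenset(freeze(item) for item in value)
    return value  # assumed to be hashable already (int, str, ...)


records = [
    {"id": 1, "tags": ["x", "y"]},
    {"id": 1, "tags": ["x", "y"]},  # duplicate of the first record
    {"id": 2, "tags": []},
]

unique = {freeze(record) for record in records}
print(len(unique))  # 2
```

The conversion is lossy (a list and a tuple freeze to the same thing), so turning the frozen form back into the original dicts is not automatic; keeping the original dict alongside its frozen key avoids that problem.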

Note: This also does not preserve the original order; if that's important to you, you need to adapt this to an order-preserving algorithm. The key, though, is converting the dicts to some immutable type.
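One order-preserving variant, sketched under the assumption that all values are hashable. `unique_in_order` is a name invented here and is not necessarily the algorithm the answer originally pointed to:

```python
def unique_in_order(dicts):
    """Drop duplicate dicts while keeping the first occurrence of each, in order."""
    seen = set()
    result = []
    for d in dicts:
        key = frozenset(d.items())  # immutable stand-in for the dict
        if key not in seen:
            seen.add(key)
            result.append(d)
    return result


a = [{3: "c"}, {1: "a"}, {3: "c"}, {2: "b"}, {1: "a"}]
print(unique_in_order(a))  # [{3: 'c'}, {1: 'a'}, {2: 'b'}]
```

The same pattern works for subtraction: build the `seen` set from the other list first and skip the `seen.add` step.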

You could turn the dictionaries into tuples, as each one holds only a single key/value pair, like so:

a_set = set(t for d in a for t in d.items())

And then use set operations to compare two sets from that point. To convert back into a list of dictionaries, you can use:

a_list = [{key: value} for key, value in a_set]
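A short sketch of that round trip, with a hypothetical second list `b`. It also shows why the approach is limited to single-entry dicts: flattening a multi-entry dict into pairs loses the boundary between dicts.

```python
a = [{1000976: 975}, {1000977: 976}]
b = [{1000977: 976}]  # hypothetical second list

a_set = {pair for d in a for pair in d.items()}
b_set = {pair for d in b for pair in d.items()}

remaining = [{key: value} for key, value in a_set - b_set]
print(remaining)  # [{1000976: 975}]

# With more than one entry per dict, the pairs can no longer be regrouped.
multi = [{"x": 1, "y": 2}]
flattened = {pair for d in multi for pair in d.items()}
rebuilt = [{key: value} for key, value in flattened]
print(len(rebuilt))  # 2, not 1: the original dict has been split in two
```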

For filtering there's a one-liner (b is the filter list of dicts). This is by far and away the fastest approach, unless you are using the same filter against multiple sets.

c = [d for d in a if d not in b]

Or, using the built-in filter, another (slower) one-liner:

c = list(filter(lambda i: i not in b, a))

If you are really asking how to convert a list of dicts into a set-operable variable, then you can do this with yet another one-liner:

a_set = set(map(lambda i: frozenset(i.items()), a))

Again, if we have 'b' as a list of dicts as our filter:

b_set = set(map(lambda i: frozenset(i.items()), b))

... and we can now use set operations on them:

c_set = a_set - b_set

The 'frozenset' method of converting a dict to a set is about 25% faster than using a list comprehension, but converting everything to sets and then performing the set operations is much slower than just using a simple list comprehension filter such as the one at the top of my answer. Bear in mind that the 'not in b' test in that filter scans the list b for every element of a, so this comparison holds for short lists and shifts as b grows. If one is going to apply many filters, it may be cost-effective to convert the objects to immutables; but in that case, it may be better to change the underlying data structure of the objects and convert the entire structure to a class.
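Timings like these depend heavily on list sizes and on the machine, so it is worth measuring on your own data. A rough sketch using `timeit`; the list sizes and repeat count are arbitrary assumptions, and no particular result is implied:

```python
import timeit

# Synthetic data: 300 single-entry dicts, half of which are in the filter list.
a = [{1000976 + i: 975 + i} for i in range(300)]
b = a[::2]


def filter_with_list():
    """Plain list comprehension; 'not in b' is a linear scan of b."""
    return [d for d in a if d not in b]


def filter_with_sets():
    """Convert b to a set of frozensets once, then test membership by hash."""
    b_set = {frozenset(d.items()) for d in b}
    return [d for d in a if frozenset(d.items()) not in b_set]


# Both approaches must agree before their timings are worth comparing.
assert filter_with_list() == filter_with_sets()

for func in (filter_with_list, filter_with_sets):
    seconds = timeit.timeit(func, number=5)
    print(f"{func.__name__}: {seconds:.4f}s")
```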

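If you don't want to use frozenset and your dicts are arbitrary, rather than single-entry dicts, you can tuple-ise the dicts. Sort the items so that two equal dicts built in a different insertion order produce the same tuple (this assumes the keys can be compared with each other):

a_set = set(tuple(sorted(d.items())) for d in a)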

You suggest in the question that you don't want ANY nested loop, and so far all the answers (including mine) have a 'for' (or a lambda).

When we want to use a set method for filtering two dictionaries, it's not too shabby to do exactly that as follows:

c = a.items() - b.items()

Of course, if we want c to be a dict, we need to wrap it again:

c = dict(a.items() - b.items())
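A small worked example of the items-view subtraction, assuming Python 3 and hashable values. Note that `a` and `b` are single dicts here, not lists of dicts, and the sample data is made up:

```python
a = {1000976: 975, 1000977: 976, 1000978: 977}
b = {1000977: 976, 1000978: 0}  # same key 1000978, but a different value

# A pair is removed only when both the key and the value match.
diff = a.items() - b.items()
c = dict(diff)

print(c == {1000976: 975, 1000978: 977})  # True
```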

Likewise, for lists of immutable types, we can do the same (by coercing our lists into sets):

x = [3, 4, 5, 6, 7]
y = [3, 2, 1, 7]
z = set(x) - set(y)

or (tuples are immutable)

x = [(3, 1), (4, 1), (5, 1), (6, 2), (7, 5)]
y = [(4, 1), (4, 2), (5, 1)]
z = set(x) - set(y)

but (mutable) lists fail (as do your dicts):

x = [[3, 1], [4, 1], [5, 1], [6, 2], [7, 5]]
y = [[4, 1], [4, 2], [5, 1]]
z = set(x) - set(y)

TypeError: unhashable type: 'list'

This is because lists and dicts are mutable and therefore define no hash: a set needs each element's hash to stay constant for as long as the element is in the set, and a mutable object cannot guarantee that. One can handle it by creating a class - but then that is not using a list of dicts anymore, and your 'for' is just being buried in a class method.
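To illustrate the class-based option, here is a minimal sketch. `HashableDict` is a hypothetical name, and the approach assumes the values are hashable and that instances are never mutated once they are in a set:

```python
class HashableDict(dict):
    """A dict that hashes by its items. Do not mutate it while it is in a set."""

    def __hash__(self):
        return hash(frozenset(self.items()))


# Tuples are hashable; lists are not.
print(isinstance(hash((3, 1)), int))  # True
try:
    hash([3, 1])
except TypeError as exc:
    print(exc)  # unhashable type: 'list'

a = [HashableDict({1000976: 975}), HashableDict({1000977: 976})]
b = [HashableDict({1000977: 976})]

remaining = set(a) - set(b)
print(remaining == {HashableDict({1000976: 975})})  # True
```

The loop over the items has not gone away; it now lives inside `__hash__`.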

So - you will need a nested loop somewhere, even if it is hidden by a lambda or a function.


 