
Unique dictionaries out of a list of lists?

I have a list called matrix which contains some rows. Each row contains some dictionaries, and each dictionary could be contained in more than one row.

I want to generate a list called dictionaries which contains all the dictionaries in the matrix, but without duplicates. I already have a solution, but I would like to use a comprehension.

row1 = [{'NODE':1}, {'NODE':2}, {'NODE':3}]
row2 = [{'NODE':3}, {'NODE':4}, {'NODE':5}]
row3 = [{'NODE':4}, {'NODE':6}, {'NODE':7}]
matrix = [row1, row2, row3]

dictionaries = []
for row in matrix:
    for dictionary in row:
        # Only collect dictionaries that have not been seen yet
        if dictionary not in dictionaries:
            dictionaries.append(dictionary)

print(dictionaries)
[{'NODE':1}, {'NODE':2}, {'NODE':3}, {'NODE':4}, {'NODE':5}, {'NODE':6}, {'NODE':7}]

I would like something like the following, but it doesn't work since I cannot check a list while it is still being created:

dictionaries = [dictionary for row in matrix for dictionary in row if dictionary not in dictionaries]

The dictionary keys and values are primitive immutable objects like strings and integers.

You could use a list comprehension, but depending on your Python version, using a collections.OrderedDict object with a generator expression to flatten the matrix would actually be more efficient.

When your values are not hashable and thus can't be stored in a set or dictionary, you'll first have to create an immutable representation, so that representation can be stored in a set or dictionary to track uniqueness efficiently.

For dictionaries that are flat structures with all keys and values immutable, just use tuple(sorted(d.items())). This produces a tuple of all (key, value) pairs (themselves tuples), in sorted order so that the result does not depend on dictionary ordering.
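As a small illustration of why the pairs are sorted, here is a sketch (the name make_key and the sample dictionaries are chosen here for illustration and are not part of the original answer). Two dictionaries with the same contents but different insertion order yield the same key, and that key can go into a set even though the dictionary itself cannot:

```python
def make_key(d):
    """Return a hashable, order-independent representation of a flat dict."""
    return tuple(sorted(d.items()))

a = {'NODE': 1, 'NAME': 'x'}
b = {'NAME': 'x', 'NODE': 1}   # same contents, different insertion order

key_a = make_key(a)
key_b = make_key(b)
print(key_a == key_b)          # True

# A dict is unhashable, so putting it in a set raises TypeError ...
try:
    set([a])
    dict_is_hashable = True
except TypeError:
    dict_is_hashable = False
print(dict_is_hashable)        # False

# ... but its tuple key can be stored, and equal dicts collapse to one entry.
seen = {key_a, key_b}
print(len(seen))               # 1
```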

On Python 3.5 and up, use an OrderedDict() that maps the immutable keys to original dictionaries:

from collections import OrderedDict

key = lambda d: tuple(sorted(d.items()))

dictionaries = list(OrderedDict((key(v), v) for row in matrix for v in row).values())

On Python 3.4 and earlier, OrderedDict is slow and you'd be better off using a separate set to track the keys already seen:

key = lambda d: tuple(sorted(d.items()))
seen = set()
seen_add = seen.add
dictionaries = [
    v for row in matrix
    for k, v in ((key(v), v) for v in row)
    if not (k in seen or seen_add(k))]
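If the `k in seen or seen_add(k)` trick reads as too terse (it relies on set.add() returning None, so a new key is added and kept in a single expression), the same logic can be spelled out as a plain loop. This is only a sketch of equivalent behaviour; the function name unique_dicts is hypothetical:

```python
def unique_dicts(matrix):
    """Flatten `matrix` and drop duplicate dicts, keeping first-seen order."""
    seen = set()
    result = []
    for row in matrix:
        for d in row:
            k = tuple(sorted(d.items()))  # hashable stand-in for the dict
            if k not in seen:
                seen.add(k)
                result.append(d)
    return result

matrix = [
    [{'NODE': 1}, {'NODE': 2}, {'NODE': 3}],
    [{'NODE': 3}, {'NODE': 4}, {'NODE': 5}],
    [{'NODE': 4}, {'NODE': 6}, {'NODE': 7}],
]
print(unique_dicts(matrix))
```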

Quick demo using your input data and an OrderedDict:

>>> from collections import OrderedDict
>>> row1 = [{'NODE':1}, {'NODE':2}, {'NODE':3}]
>>> row2 = [{'NODE':3}, {'NODE':4}, {'NODE':5}]
>>> row3 = [{'NODE':4}, {'NODE':6}, {'NODE':7}]
>>> matrix = [row1, row2, row3]
>>> key = lambda d: tuple(sorted(d.items()))
>>> list(OrderedDict((key(v), v) for row in matrix for v in row).values())
[{'NODE': 1}, {'NODE': 2}, {'NODE': 3}, {'NODE': 4}, {'NODE': 5}, {'NODE': 6}, {'NODE': 7}]
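On Python 3.7 and later, the built-in dict is guaranteed to preserve insertion order, so a plain dict comprehension should be able to stand in for OrderedDict. The following is a sketch under that assumption, not part of the original answer:

```python
row1 = [{'NODE': 1}, {'NODE': 2}, {'NODE': 3}]
row2 = [{'NODE': 3}, {'NODE': 4}, {'NODE': 5}]
row3 = [{'NODE': 4}, {'NODE': 6}, {'NODE': 7}]
matrix = [row1, row2, row3]

key = lambda d: tuple(sorted(d.items()))

# Assumes Python 3.7+: dict keeps the position of the first insertion of each key
dictionaries = list({key(v): v for row in matrix for v in row}.values())
print(dictionaries)
```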

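If you have NumPy (note that np.unique sorts its input, so this only works where dictionaries can be compared with <, which is the case on Python 2 but not on Python 3):

import numpy as np

np.unique(matrix).tolist()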

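Flatten the list, then use a set to eliminate duplicates. Dictionaries are not hashable, so convert each one to a tuple of its sorted items first; note that a set does not preserve the original order.

unique = set(tuple(sorted(item.items())) for sublist in matrix for item in sublist)
print([dict(t) for t in unique])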
