
Remove duplicates from a list of lists of unordered dictionaries

Consider the following:

[
  [
    {'name': 'fred', 'score': 19},
    {'name': 'frank', 'score': 100},
    {'name': 'bob', 'score': 99}
  ],
  [
    {'name': 'frank', 'score': 100},
    {'name': 'fred', 'score': 19},
    {'name': 'bob', 'score': 99}
  ],
  [
    {'name': 'bob', 'score': 99},
    {'name': 'frank', 'score': 100},
    {'name': 'fred', 'score': 19}
  ],
  [
    {'name': 'fred', 'score': 19},
    {'name': 'frank', 'score': 100},
    {'name': 'stu', 'score': 69}
  ]
]

Ignoring the order of the dictionaries within each list, how can duplicates be removed such that the output would be only two of the lists: one with bob and one with stu?

The desired output is something like:

[
  [
    {'name': 'fred', 'score': 19},
    {'name': 'frank', 'score': 100},
    {'name': 'bob', 'score': 99}
  ],
  [
    {'name': 'fred', 'score': 19},
    {'name': 'frank', 'score': 100},
    {'name': 'stu', 'score': 69}
  ]
]

You could try something like this:

dict_list =   [[{'name': 'fred', 'score': 19},
  {'name': 'frank', 'score': 100},
  {'name': 'bob', 'score': 99}],
 [{'name': 'frank', 'score': 100},
  {'name': 'fred', 'score': 19},
  {'name': 'bob', 'score': 99}],
 [{'name': 'bob', 'score': 99},
  {'name': 'frank', 'score': 100},
  {'name': 'fred', 'score': 19}],
 [{'name': 'fred', 'score': 19},
  {'name': 'frank', 'score': 100},
  {'name': 'stu', 'score': 69}]]

# record the sets of names you've seen before
seen_names = set()
# collect the lists that are unique
unique_lists = []

# loop over each list you have
for L in dict_list:

    # get list of names
    names = [i['name'] for i in L]

    # check if you've seen this set of names before;
    # a frozenset ignores order and is hashable, so it can be stored in a set
    key = frozenset(names)
    if key not in seen_names:
        print(names)
        # save these names
        seen_names.add(key)
        # add this list to your list of unique lists
        unique_lists.append(L)

Printed output:

['fred', 'frank', 'bob']
['fred', 'frank', 'stu']

Contents of unique_lists:

[[{'name': 'fred', 'score': 19},
  {'name': 'frank', 'score': 100},
  {'name': 'bob', 'score': 99}],
 [{'name': 'fred', 'score': 19},
  {'name': 'frank', 'score': 100},
  {'name': 'stu', 'score': 69}]]

Note that this method keeps only the first list seen for each unique set of names; any later list with the same names is discarded even if its scores differ. If the same names may appear with different scores and you want to keep every distinct combination of names and scores, compare (name, score) pairs instead, following the method given by PacketLoss below:

seen_pairs = set()
unique_lists = []


for d in dict_list:

    # get (name, score) tuples, sorted by name (then score) so that
    # the order of the dictionaries within the list doesn't matter;
    # a tuple is hashable, so it can be stored in a set
    r = tuple(sorted((i['name'], i['score']) for i in d))

    # check if these names and scores have been seen before
    if r not in seen_pairs:
        seen_pairs.add(r)
        unique_lists.append(d)
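If the dictionaries may carry keys beyond 'name' and 'score', a more general variant is to turn each dictionary into a sorted tuple of its items and then sort those tuples, which gives a hashable key that ignores order at both levels. This is a minimal sketch: the helper names `canonical_key` and `dedupe_lists` are hypothetical, and it assumes every value is hashable and that values stored under the same key can be compared with each other (sorting needs that).

```python
def canonical_key(dicts):
    """Order-independent, hashable key for a list of flat dictionaries."""
    # Each dict becomes a tuple of its (key, value) pairs sorted by key,
    # then the list of those tuples is sorted so list order is ignored too.
    return tuple(sorted(tuple(sorted(d.items())) for d in dicts))


def dedupe_lists(list_of_lists):
    """Keep the first occurrence of each list, ignoring dictionary order."""
    seen = set()
    unique = []
    for dicts in list_of_lists:
        key = canonical_key(dicts)
        if key not in seen:
            seen.add(key)
            unique.append(dicts)
    return unique


dict_list = [
    [{'name': 'fred', 'score': 19}, {'name': 'frank', 'score': 100}, {'name': 'bob', 'score': 99}],
    [{'name': 'frank', 'score': 100}, {'name': 'fred', 'score': 19}, {'name': 'bob', 'score': 99}],
    [{'name': 'bob', 'score': 99}, {'name': 'frank', 'score': 100}, {'name': 'fred', 'score': 19}],
    [{'name': 'fred', 'score': 19}, {'name': 'frank', 'score': 100}, {'name': 'stu', 'score': 69}],
]

for kept in dedupe_lists(dict_list):
    print([d['name'] for d in kept])
# ['fred', 'frank', 'bob']
# ['fred', 'frank', 'stu']
```

Unlike a set, the sorted tuple keeps repeated entries, so a list containing fred twice is not treated as equal to one containing him once.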

Because the dictionaries appear in a different order in each list, a simple == comparison will not match them. We can work around this by extracting the data as (name, score) tuples, sorting them, and checking whether that sorted list has been seen before.
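A quick check illustrates both the problem and the workaround: list equality is order-sensitive, while sorted (name, score) tuples compare equal regardless of the original order. The lists `a` and `b` and the helper `as_sorted_pairs` are illustrative names only, not part of the answer's code.

```python
def as_sorted_pairs(dicts):
    # Extract (name, score) tuples and sort them by name.
    return sorted(((d['name'], d['score']) for d in dicts), key=lambda tup: tup[0])


a = [{'name': 'fred', 'score': 19}, {'name': 'bob', 'score': 99}]
b = [{'name': 'bob', 'score': 99}, {'name': 'fred', 'score': 19}]

# List equality compares element by element, so order matters.
print(a == b)  # False

# After sorting, the two lists produce identical tuples.
print(as_sorted_pairs(a) == as_sorted_pairs(b))  # True
print(as_sorted_pairs(a))  # [('bob', 99), ('fred', 19)]
```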

data = [[{'name': 'fred', 'score': 19},
  {'name': 'frank', 'score': 100},
  {'name': 'bob', 'score': 99}],
 [{'name': 'frank', 'score': 100},
  {'name': 'fred', 'score': 19},
  {'name': 'bob', 'score': 99}],
 [{'name': 'bob', 'score': 99},
  {'name': 'frank', 'score': 100},
  {'name': 'fred', 'score': 19}],
 [{'name': 'fred', 'score': 19},
  {'name': 'frank', 'score': 100},
  {'name': 'stu', 'score': 69}]]

seen = []
result = []

for d in data:
    # (name, score) tuples sorted by name, so list order doesn't matter
    r = [(i['name'], i['score']) for i in d]
    r.sort(key=lambda tup: tup[0])
    if r not in seen:
        seen.append(r)
        result.append(d)

This method checks that both the names and the scores match completely, so if one score in a duplicate changed (say, bob's 99 became 98), that list would no longer be counted as a duplicate.

Output:

[[{'name': 'fred', 'score': 19}, {'name': 'frank', 'score': 100}, {'name': 'bob', 'score': 99}], [{'name': 'fred', 'score': 19}, {'name': 'frank', 'score': 100}, {'name': 'stu', 'score': 69}]]

Output after changing bob's score in the third list of data to 98:

data = [[{'name': 'fred', 'score': 19},
  {'name': 'frank', 'score': 100},
  {'name': 'bob', 'score': 99}],
 [{'name': 'frank', 'score': 100},
  {'name': 'fred', 'score': 19},
  {'name': 'bob', 'score': 99}],
 [{'name': 'bob', 'score': 98},
  {'name': 'frank', 'score': 100},
  {'name': 'fred', 'score': 19}],
 [{'name': 'fred', 'score': 19},
  {'name': 'frank', 'score': 100},
  {'name': 'stu', 'score': 69}]]

[[{'name': 'fred', 'score': 19}, {'name': 'frank', 'score': 100}, {'name': 'bob', 'score': 99}], [{'name': 'bob', 'score': 98}, {'name': 'frank', 'score': 100}, {'name': 'fred', 'score': 19}], [{'name': 'fred', 'score': 19}, {'name': 'frank', 'score': 100}, {'name': 'stu', 'score': 69}]]
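The two answers differ only in which fields they compare: the first looks at names alone, the second at names and scores. A sketch that makes that choice explicit is shown below; the function name `dedupe_by_fields` and its `fields` parameter are hypothetical, introduced only for illustration, and it is run on the modified data in which bob's score in the third list is 98.

```python
def dedupe_by_fields(list_of_lists, fields):
    """Keep the first list for each distinct combination of the given fields.

    Dictionary order within each list is ignored; repeated entries are kept
    in the key, so each list is treated as a multiset rather than a set.
    """
    seen = set()
    unique = []
    for dicts in list_of_lists:
        # Project each dict onto the chosen fields, then sort for order-independence.
        key = tuple(sorted(tuple(d[f] for f in fields) for d in dicts))
        if key not in seen:
            seen.add(key)
            unique.append(dicts)
    return unique


data = [
    [{'name': 'fred', 'score': 19}, {'name': 'frank', 'score': 100}, {'name': 'bob', 'score': 99}],
    [{'name': 'frank', 'score': 100}, {'name': 'fred', 'score': 19}, {'name': 'bob', 'score': 99}],
    [{'name': 'bob', 'score': 98}, {'name': 'frank', 'score': 100}, {'name': 'fred', 'score': 19}],
    [{'name': 'fred', 'score': 19}, {'name': 'frank', 'score': 100}, {'name': 'stu', 'score': 69}],
]

print(len(dedupe_by_fields(data, fields=('name',))))          # 2
print(len(dedupe_by_fields(data, fields=('name', 'score'))))  # 3
```

Comparing names only keeps two lists, as in the first answer; comparing names and scores keeps three, matching the output above.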
