
Removing duplicates from a list of lists in Python using deep copy

I have a list of dictionaries, list_1 = [{'account': '1234', 'email': 'abc@xyz.com'}, ... , ...]. I want to remove the entries with duplicate emails from the list.

import copy

list_2 = copy.deepcopy(list_1)
for i in mainList:
    for j in range(len(list_2) - 1, -1, -1):
        if list_2[j]["email"] == mainList[i]:
            list_1.remove(list_1[j])

mainList here is the list of emails I am comparing against; it looks like ['abc@xyz.com', 'efg@cvb.com', ..., ...]. The main problem is that list_1 does not come out correctly: if I copy it with list(), slicing, or even a list comprehension, list_1 comes out empty. The end result should be list_1 containing only one element (dictionary) for each email. Using copy or deepcopy at least gives me something. It also seems that I sometimes get an index error. Using

for x in list_2:

instead returns list_1 with only one item. The closest I got to the correct answer was iterating over list_1 itself while removing items, but that was not 100% correct either. Please help.
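For reference, here is a minimal sketch of the in-place, backward-loop idea the question attempts, assuming the goal is to keep the first dictionary for each email (mainList is ignored here). The function name remove_duplicate_emails and the sample data are hypothetical. It first records the index of the first row for each email, then walks the list from the end and deletes by index, so no copy or deepcopy is needed and no index ever points past the shrinking list.

```python
def remove_duplicate_emails(rows):
    """Delete, in place, every dict whose email already appeared earlier in rows."""
    # Record the index of the first row seen for each email.
    first_index = {}
    for i, row in enumerate(rows):
        first_index.setdefault(row['email'], i)

    # Walk backwards, as in the question, so deleting an item never shifts
    # the indices that are still to be visited.
    for j in range(len(rows) - 1, -1, -1):
        if first_index[rows[j]['email']] != j:
            del rows[j]


list_1 = [
    {'account': '1234', 'email': 'abc@xyz.com'},
    {'account': '5678', 'email': 'efg@cvb.com'},
    {'account': '9999', 'email': 'abc@xyz.com'},  # duplicate email
]
remove_duplicate_emails(list_1)
print(list_1)
# [{'account': '1234', 'email': 'abc@xyz.com'}, {'account': '5678', 'email': 'efg@cvb.com'}]
```

Deleting by index avoids list.remove(), which removes the first element that compares equal and can therefore delete an earlier, identical row instead of the later duplicate.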

Iterate over your list of dictionaries and store each dictionary in a new dictionary keyed by its email, but only if that email is not already present.

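temp = dict()  # email -> first dictionary seen with that email
list_1 = [{'account': '1234', 'email': 'abc@xyz.com'}]
for d in list_1:
    if d['email'] in temp:
        continue  # duplicate email: skip it
    else:
        temp[d['email']] = d
final_list = list(temp.values())  # dicts keep insertion order (Python 3.7+)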
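An equivalent way to write the same idea is to keep a set of emails already seen and append the first dictionary for each email to a new list. The sample data below is made up and, unlike the one-element list above, actually contains a duplicate email.

```python
list_1 = [
    {'account': '1234', 'email': 'abc@xyz.com'},
    {'account': '4321', 'email': 'zzz@xyz.com'},
    {'account': '5555', 'email': 'abc@xyz.com'},  # same email as the first row
]

seen = set()      # emails already kept
final_list = []   # first dictionary for each email, in original order
for d in list_1:
    if d['email'] not in seen:
        seen.add(d['email'])
        final_list.append(d)

print(final_list)
# [{'account': '1234', 'email': 'abc@xyz.com'}, {'account': '4321', 'email': 'zzz@xyz.com'}]
```

The set only answers "have I seen this email?", while the list keeps the order explicit, so this version does not rely on dictionary ordering.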

It seems like you want to remove duplicate dictionaries. Please also include examples of the duplicate dictionaries in the question.

di = [{'account': '1234', 'email': 'abc@xyz.com'},
      {'account1': '12345', 'email1': 'abcd@xyz.com'},
      {'account': '1234', 'email': 'abc@xyz.com'}]
# Keep a dict only if no identical dict appears later in the list
s = [i for n, i in enumerate(di) if i not in di[n + 1:]]
print(s)

This gives you the required output:

[{'account1': '12345', 'email1': 'abcd@xyz.com'}, {'account': '1234', 'email': 'abc@xyz.com'}]
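The comprehension above compares every dictionary with all later ones, so its cost grows quadratically, and it keeps the last copy of each duplicate. As an alternative (a different technique from the answer's), here is a minimal sketch that removes exact duplicate dictionaries in a single pass and keeps the first copy. It assumes all values are hashable, which holds for the strings used here; the function name dedupe_dicts is hypothetical.

```python
def dedupe_dicts(rows):
    """Return rows without exact duplicate dicts, keeping the first occurrence.

    Assumes every value in every dict is hashable.
    """
    seen = set()
    result = []
    for row in rows:
        # A frozenset of (key, value) pairs is a hashable, order-independent
        # fingerprint: two dicts are equal exactly when their fingerprints are.
        fingerprint = frozenset(row.items())
        if fingerprint not in seen:
            seen.add(fingerprint)
            result.append(row)
    return result


di = [
    {'account': '1234', 'email': 'abc@xyz.com'},
    {'account1': '12345', 'email1': 'abcd@xyz.com'},
    {'account': '1234', 'email': 'abc@xyz.com'},
]
print(dedupe_dicts(di))
# [{'account': '1234', 'email': 'abc@xyz.com'}, {'account1': '12345', 'email1': 'abcd@xyz.com'}]
```

Note that this only drops dictionaries that are identical in every key; two rows with the same email but different accounts are both kept.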

The easiest way, I feel, to accomplish this is to build an indexed version of list_1 (a dictionary) keyed on the field you care about, here the email.

list_1 = [
    {'account': '1234', 'email' : 'abc@xyz.com'},
    {'account': '1234', 'email' : 'abc@xyz.com'},
    {'account': '4321', 'email' : 'zzz@xyz.com'},
]

list_1_indexed = {}
for row in list_1:
    list_1_indexed.setdefault(row['email'], row)
list_2 = list(list_1_indexed.values())

print(list_2)

This will give you:

[
    {'account': '1234', 'email': 'abc@xyz.com'},
    {'account': '4321', 'email': 'zzz@xyz.com'}
]

I'm not sure I would recommend it, but if you wanted to use a comprehension you might do:

list_2 = list({row['email']: row for row in list_1}.values())

Note that with the first strategy the first row for each key wins, whereas with the comprehension the last row for each key wins.
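The difference only shows up when rows share an email but differ elsewhere. A small made-up example makes it visible:

```python
list_1 = [
    {'account': '1111', 'email': 'abc@xyz.com'},
    {'account': '4321', 'email': 'zzz@xyz.com'},
    {'account': '2222', 'email': 'abc@xyz.com'},  # same email, different account
]

# setdefault only inserts when the key is missing, so the first row wins.
first_wins = {}
for row in list_1:
    first_wins.setdefault(row['email'], row)

# The comprehension reassigns the value on every repeat, so the last row wins.
last_wins = {row['email']: row for row in list_1}

print([r['account'] for r in first_wins.values()])  # ['1111', '4321']
print([r['account'] for r in last_wins.values()])   # ['2222', '4321']
```

In both cases each email keeps the position where it first appeared, because reassigning an existing key does not move it within the dictionary; only the stored row changes.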

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address.

 
© 2020-2024 STACKOOM.COM