
How to remove these duplicates in a list (python)

biglist = [
    {'title':'U2 Band','link':'u2.com'},
    {'title':'ABC Station','link':'abc.com'},
    {'title':'Live Concert by U2','link':'u2.com'}
]

I would like to remove the third element of the list, because its "link" value, "u2.com", duplicates the first element's. I don't want duplicate "link" values. What is the most efficient way to do this, so that the result is:

biglist = [
    {'title':'U2','link':'u2.com'},
    {'title':'ABC','link':'abc.com'}
]

I have tried many approaches, including several nested "for ... in ..." loops, but they are inefficient and verbose.

Probably the fastest approach for a really big list, if you want to preserve the exact order of the remaining items, is the following:

biglist = [ 
    {'title':'U2 Band','link':'u2.com'}, 
    {'title':'ABC Station','link':'abc.com'}, 
    {'title':'Live Concert by U2','link':'u2.com'} 
]

known_links = set()
newlist = []

for d in biglist:
    link = d['link']
    if link in known_links:
        continue
    newlist.append(d)
    known_links.add(link)

# slice-assign so existing references to biglist see the change
biglist[:] = newlist
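The same set-based filter can be wrapped in a small helper that works for any key, not just 'link' (a sketch; the name dedupe_by_key is mine, not from the answer above):

```python
def dedupe_by_key(items, key):
    """Keep only the first item seen for each value of item[key]."""
    seen = set()
    result = []
    for d in items:
        k = d[key]
        if k not in seen:
            seen.add(k)
            result.append(d)
    return result

biglist = [
    {'title': 'U2 Band', 'link': 'u2.com'},
    {'title': 'ABC Station', 'link': 'abc.com'},
    {'title': 'Live Concert by U2', 'link': 'u2.com'},
]
biglist = dedupe_by_key(biglist, 'link')
```

This keeps the first occurrence of each link and preserves the original order of the survivors.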

Make a new dictionary, with 'u2.com' and 'abc.com' as the keys, and your list elements as the values. The dictionary will enforce uniqueness. Something like this:

uniquelist = dict((element['link'], element) for element in reversed(biglist))

(The reversed is there so that the first elements in the list are the ones that remain in the dictionary; without it, you would keep the last occurrence of each link instead.)

Then you can get the elements back into a list like this (in Python 3, dict.values() returns a view, so wrap it in list()):

biglist = list(uniquelist.values())
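On Python 3.7+, where plain dicts preserve insertion order, the whole approach fits in a few lines. One caveat: the survivors come out ordered by where each link was first seen in the reversed list, which in this example happens to match the original order but is not guaranteed in general. A sketch:

```python
biglist = [
    {'title': 'U2 Band', 'link': 'u2.com'},
    {'title': 'ABC Station', 'link': 'abc.com'},
    {'title': 'Live Concert by U2', 'link': 'u2.com'},
]

# Later assignments overwrite earlier ones, so iterating in reverse
# lets the FIRST occurrence of each link supply the surviving value.
uniquelist = dict((d['link'], d) for d in reversed(biglist))

# values() is a view object in Python 3, so wrap it in list()
biglist = list(uniquelist.values())
```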

You can sort the list, using the link field of each dictionary as the sort key, then iterate through the list once and remove duplicates (or rather, create a new list with duplicates removed, as is the Python idiom), like so:

# sort the list using the 'link' item as the sort key
biglist.sort(key=lambda elt: elt['link'])

newbiglist = []
for item in biglist:
    # append only if this link differs from the last appended item's link
    if not newbiglist or item['link'] != newbiglist[-1]['link']:
        newbiglist.append(item)

This code will give you the first element (by relative ordering in the original biglist) for any group of "duplicates". This is true because the .sort() algorithm used by Python is guaranteed to be a stable sort -- it does not change the order of elements determined to be equal to one another (in this case, elements with the same link).
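The sort-then-scan pattern described above is also what itertools.groupby captures: after the stable sort, taking the first item of each group gives the same result. A sketch (not from the original answer):

```python
from itertools import groupby
from operator import itemgetter

biglist = [
    {'title': 'U2 Band', 'link': 'u2.com'},
    {'title': 'ABC Station', 'link': 'abc.com'},
    {'title': 'Live Concert by U2', 'link': 'u2.com'},
]

# Stable sort: items with the same link keep their original relative order.
biglist.sort(key=itemgetter('link'))

# groupby clusters consecutive items sharing a link; next(grp) takes the
# first item of each cluster, i.e. the earliest from the original list.
newbiglist = [next(grp) for _, grp in groupby(biglist, key=itemgetter('link'))]
```

Note that, as with the explicit loop, the result is ordered by link rather than by original position.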

biglist = [
    {'title':'U2 Band','link':'u2.com'},
    {'title':'ABC Station','link':'abc.com'},
    {'title':'Live Concert by U2','link':'u2.com'}
]

def dedupe(lst):
    d = {}
    for x in lst:
        link = x["link"]
        if link in d:
            continue
        d[link] = x
    return list(d.values())  # list() so the result is a list on Python 3 too

lst = dedupe(biglist)

dedupe() keeps the first of any duplicates.
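A quick check of that behavior (dedupe restated here with list() around values() so it also runs on Python 3, where dicts keep insertion order and the original ordering of the survivors is preserved):

```python
def dedupe(lst):
    d = {}
    for x in lst:
        link = x["link"]
        if link in d:
            continue  # skip anything whose link was already seen
        d[link] = x
    return list(d.values())

biglist = [
    {'title': 'U2 Band', 'link': 'u2.com'},
    {'title': 'ABC Station', 'link': 'abc.com'},
    {'title': 'Live Concert by U2', 'link': 'u2.com'},
]
lst = dedupe(biglist)
```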

You can use defaultdict to group items by link, then remove duplicates if you want to.

from collections import defaultdict

nodupes = defaultdict(list)
for d in biglist:
    nodupes[d['link']].append(d['title'])

This will give you:

defaultdict(<class 'list'>, {'u2.com': ['U2 Band', 'Live Concert by U2'],
'abc.com': ['ABC Station']})
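To go from that grouped mapping back to a deduplicated list, one could keep the first title recorded for each link (a sketch; it relies on dict insertion order, so Python 3.7+):

```python
from collections import defaultdict

biglist = [
    {'title': 'U2 Band', 'link': 'u2.com'},
    {'title': 'ABC Station', 'link': 'abc.com'},
    {'title': 'Live Concert by U2', 'link': 'u2.com'},
]

# Group all titles under their link.
nodupes = defaultdict(list)
for d in biglist:
    nodupes[d['link']].append(d['title'])

# titles[0] is the first title seen for each link, so this keeps the
# first of any duplicates, in the original order of first appearance.
deduped = [{'title': titles[0], 'link': link} for link, titles in nodupes.items()]
```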
