OK, so I have a list like the one below. What I need is to remove the duplicate words so that I end up with just Joe Blow, Don Williams, Clark Gordon... I'm trying this code, which does not seem to work. I also tried to convert the list into a set, but no go.
Any ideas? Thanks
dupes = ["Joe Joe Joe Blow","Don Don Williams", "Clark Clark Gordon", "Albert Riddle"]
def remove_duplicates(dupes):
    ulist = []
    [ulist.append(x) for x in dupes if x not in ulist]
    return ulist
a = ' '.join(remove_duplicates(dupes))
print(a)
Split each string into a list of words, convert that list to a set to drop the duplicates, then join the words back together with ' '. A set does not keep order, so restore it by sorting the unique words by their position in the original string.
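for s in dupes:
    words = s.split()
    # sort by position in the word list, so whole words are compared rather than substrings
    print(' '.join(sorted(set(words), key=words.index)))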
output:
Joe Blow
Don Williams
Clark Gordon
Albert Riddle
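One subtlety when sorting by index: str.index searches for a substring, so a word that also occurs inside an earlier, longer word can be sorted too early. A minimal sketch of the difference, assuming whole words are what should be compared (the function names and the sample string are made up for illustration):

```python
def dedupe_by_substring_position(s):
    # key=s.index looks for each word as a substring of the whole string
    return ' '.join(sorted(set(s.split()), key=s.index))

def dedupe_by_word_position(s):
    words = s.split()
    # key=words.index looks for each word's first position in the word list
    return ' '.join(sorted(set(words), key=words.index))

sample = "DeAnn Lee Ann Ann"
print(dedupe_by_substring_position(sample))  # "Ann" is first found inside "DeAnn"
print(dedupe_by_word_position(sample))
```

With this sample, the first call prints DeAnn Ann Lee and the second prints DeAnn Lee Ann.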
Edit: If you want to alter the list in place:
def remove_duplicates(dupes):
    for i in range(len(dupes)):
        words = dupes[i].split()
        dupes[i] = ' '.join(sorted(set(words), key=words.index))
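The same in-place effect can also be had with slice assignment, which replaces the list's contents without rebinding the name. A sketch only; the helper names dedupe_words and remove_duplicates_in_place are made up here and are not part of the answer above:

```python
def dedupe_words(s):
    # keep each word once, ordered by its first position in the word list
    words = s.split()
    return ' '.join(sorted(set(words), key=words.index))

def remove_duplicates_in_place(names):
    # names[:] = ... mutates the existing list object instead of creating a new one
    names[:] = [dedupe_words(s) for s in names]

dupes = ["Joe Joe Joe Blow", "Don Don Williams", "Clark Clark Gordon", "Albert Riddle"]
alias = dupes                      # second reference to the same list
remove_duplicates_in_place(dupes)
print(alias)                       # the change is visible through every reference
```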
The long but stable way:
dupes = ["Joe Joe Joe Blow","Don Don Williams", "Clark Clark Gordon", "Albert Riddle"]
rv = [[]]
for d in dupes:
    seen = set()
    for e in d.split():        # split each string into its name, add the name to the
        if e not in seen:      # last list in rv and to the set 'seen' that remembers
            rv[-1].append(e)   # the seen ones.
            seen.add(e)
    rv[-1] = ' '.join(rv[-1])  # done with one name, replace the list with joined values
    rv.append([])              # and append an empty, new list for the next name
dupes = [k for k in rv if k] # remove the empty list at the end and overwrite dupes
print(dupes)
Output:
['Joe Blow', 'Don Williams', 'Clark Gordon', 'Albert Riddle']
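The seen-set idea can also be wrapped in a small function that handles one string at a time, which avoids the placeholder list. A sketch under that assumption (unique_words is a hypothetical name):

```python
def unique_words(text):
    seen = set()   # words already emitted
    kept = []      # unique words in first-seen order
    for word in text.split():
        if word not in seen:
            seen.add(word)
            kept.append(word)
    return ' '.join(kept)

dupes = ["Joe Joe Joe Blow", "Don Don Williams", "Clark Clark Gordon", "Albert Riddle"]
cleaned = [unique_words(d) for d in dupes]
print(cleaned)
```

Because it tracks every word seen so far, this also removes repeats that are not adjacent, for example "Joe Blow Joe" becomes "Joe Blow".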
You can use the re.sub function to replace consecutive repetitions of a word with just the word:
import re
def remove_duplicates(string):
    # the \b after \1 keeps a longer word such as "Joey" from counting as a repeat of "Joe"
    return re.sub(r'\b(\w+)\b(?:\s+\1\b)+', r'\1', string)
so that:
[remove_duplicates(dupe) for dupe in dupes]
returns:
['Joe Blow', 'Don Williams', 'Clark Gordon', 'Albert Riddle']
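Worth noting is that a backreference pattern like this only collapses repeats that sit next to each other. A small sketch of that behaviour, using a variant of the pattern with a trailing \b so that a repeat must be a whole word (the names ADJACENT_REPEAT and collapse_adjacent are made up for illustration):

```python
import re

# One word, followed by one or more whitespace-separated copies of the same whole word.
ADJACENT_REPEAT = re.compile(r'\b(\w+)\b(?:\s+\1\b)+')

def collapse_adjacent(text):
    return ADJACENT_REPEAT.sub(r'\1', text)

print(collapse_adjacent("Joe Joe Joe Blow"))  # adjacent repeats are collapsed
print(collapse_adjacent("Joe Blow Joe"))      # non-adjacent repeat is left alone
print(collapse_adjacent("Joe Joey"))          # "Joey" is a different word, left alone
```

If non-adjacent repeats should go too, a split-based approach is probably the better fit.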
You can use itertools.groupby, which collapses runs of consecutive equal words:
from itertools import groupby
def remove_duplicates(string):
    return ' '.join(k for k, _ in groupby(string.split()))
so that:
[remove_duplicates(dupe) for dupe in dupes]
returns:
['Joe Blow', 'Don Williams', 'Clark Gordon', 'Albert Riddle']
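To see what groupby is doing here, it can help to look at the runs it produces. A rough sketch (runs is a hypothetical helper) that lists each word with the length of its consecutive run:

```python
from itertools import groupby

def runs(text):
    # groupby yields (key, group) pairs, one per run of consecutive equal items
    return [(word, len(list(group))) for word, group in groupby(text.split())]

print(runs("Joe Joe Joe Blow"))
print(runs("Joe Blow Joe"))
```

The first call gives [('Joe', 3), ('Blow', 1)]. The second gives three separate runs, since the two "Joe" entries are not next to each other, so joining the keys would keep both.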
When order is important, collections.OrderedDict comes in handy:
from collections import OrderedDict
dupes = ["Joe Joe Joe Blow", "Don Don Williams", "Clark Clark Gordon", "Albert Riddle"]
result = [' '.join(OrderedDict.fromkeys(w.split())) for w in dupes]
print(result)
Output:
['Joe Blow', 'Don Williams', 'Clark Gordon', 'Albert Riddle']
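On Python 3.7 and later, a plain dict also preserves insertion order, so the same idea should work without the import. A minimal sketch, assuming that Python version:

```python
dupes = ["Joe Joe Joe Blow", "Don Don Williams", "Clark Clark Gordon", "Albert Riddle"]

# dict.fromkeys keeps the first occurrence of each key, in insertion order (Python 3.7+)
result = [' '.join(dict.fromkeys(name.split())) for name in dupes]
print(result)
```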
Lots of good answers already; you can also try Counter. It is a dict subclass, so on Python 3.7+ its keys keep first-seen order:
from collections import Counter
counters = [Counter(d.split()) for d in dupes]
final = [' '.join(c.keys()) for c in counters]
# ['Joe Blow', 'Don Williams', 'Clark Gordon', 'Albert Riddle']
You can also use a set, which drops duplicates but does not preserve word order:
list(set(l.split()))
# where l is your str