Remove duplicates from a list of strings

Question

OK, so I have a list like this. What I need is to remove the duplicate values so that I end up with just Joe Blow, Don Williams, Clark Gordon... I'm trying this code, which does not seem to work. I also tried to convert the list into a set, but no go.

Any ideas? Thanks.

dupes = ["Joe Joe Joe Blow","Don Don Williams", "Clark Clark Gordon", "Albert Riddle"]
def remove_duplicates(dupes):
    ulist = []
    [ulist.append(x) for x in dupes if x not in ulist]    
    return ulist
a=' '.join(remove_duplicates(dupes))

print(a)

Answer 1

Split each string into a list of words, convert that list to a set, then join it back with ' '. Because a set is unordered, restore the original order by sorting on each word's index in the original string.

for s in dupes:
    print(' '.join(sorted(set(s.split()), key=s.index)))

Output:

Joe Blow
Don Williams
Clark Gordon
Albert Riddle

Edit: If you want to alter the list in place:

def remove_duplicates(dupes):
    for i in range(len(dupes)):
        dupes[i] = ' '.join(sorted(set(dupes[i].split()), key=dupes[i].index))
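One caveat, offered as a sketch rather than a correction of the answer: `s.index` here is `str.index`, which searches for substrings, so a short word that also appears inside an earlier, longer word can end up sorted to the wrong spot. Sorting by position in the split word list avoids that. The helper name `dedupe_words` is my own, not from the answer.

```python
def dedupe_words(s):
    words = s.split()
    # Order the unique words by where each first appears in the word list,
    # so lookups compare whole words rather than substrings.
    return ' '.join(sorted(set(words), key=words.index))

dupes = ["Joe Joe Joe Blow", "Don Don Williams", "Clark Clark Gordon", "Albert Riddle"]
print([dedupe_words(s) for s in dupes])
# ['Joe Blow', 'Don Williams', 'Clark Gordon', 'Albert Riddle']

print(dedupe_words("ab b a"))
# ab b a
```

Note that `words.index` rescans the list for every word, which is fine for short names but quadratic for long strings.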

Answer 2

The long but stable way:

dupes = ["Joe Joe Joe Blow","Don Don Williams", "Clark Clark Gordon", "Albert Riddle"]

rv = [[]]
for d in dupes:
    seen = set()
    for e in d.split():         # split each string into its name, add the name to the 
        if e not in seen:       # last list in rv and to the set 'seen' that remembers
            rv[-1].append(e)    # the seen ones.
            seen.add(e)
    rv[-1] = ' '.join(rv[-1])   # done with one name, replace the list with joined values
    rv.append([])               # and append an empty, new list for the next name

dupes = [k for k in rv if k]    # remove the empty list at the end and overwrite dupes

print(dupes)

Output:

['Joe Blow', 'Don Williams', 'Clark Gordon', 'Albert Riddle']
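If the nested-list bookkeeping feels heavy, the same "seen set" idea can be wrapped in a small helper. This is only a sketch of the same technique; the function name `unique_words` is an assumption of mine, not part of the answer.

```python
def unique_words(text):
    seen = set()   # words already emitted
    kept = []      # words in first-seen order
    for word in text.split():
        if word not in seen:
            seen.add(word)
            kept.append(word)
    return ' '.join(kept)

dupes = ["Joe Joe Joe Blow", "Don Don Williams", "Clark Clark Gordon", "Albert Riddle"]
print([unique_words(d) for d in dupes])
# ['Joe Blow', 'Don Williams', 'Clark Gordon', 'Albert Riddle']
```

Unlike approaches that only collapse adjacent repeats, this also drops a word that reappears later in the string.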

Answer 3

You can use re.sub to replace consecutive repetitions of a word with a single occurrence of that word:

import re
def remove_duplicates(string):
    return re.sub(r'\b(\w+)(?:\s+\1\b)+', r'\1', string)

so that:

[remove_duplicates(dupe) for dupe in dupes]

returns:

['Joe Blow', 'Don Williams', 'Clark Gordon', 'Albert Riddle']
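Worth knowing about the backreference approach: it collapses only adjacent repetitions, so a word that comes back later in the string is kept. A small sketch of that behaviour, using a pattern with the word boundary placed after the backreference so that a longer word starting with the same letters is not treated as a repeat. The name `collapse_repeats` and the extra test strings are my own.

```python
import re

# \b after \1 makes sure the repeated text is a whole word, not a prefix.
PATTERN = re.compile(r'\b(\w+)(?:\s+\1\b)+')

def collapse_repeats(text):
    return PATTERN.sub(r'\1', text)

print(collapse_repeats("Joe Joe Joe Blow"))  # Joe Blow
print(collapse_repeats("Joe Blow Joe"))      # Joe Blow Joe (not adjacent, so kept)
print(collapse_repeats("Joe Joey"))          # Joe Joey (different words)
```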

Answer 4

You can use itertools.groupby:

from itertools import groupby
def remove_duplicates(string):
    return ' '.join(k for k, _ in groupby(string.split()))

so that:

[remove_duplicates(dupe) for dupe in dupes]

returns:

['Joe Blow', 'Don Williams', 'Clark Gordon', 'Albert Riddle']
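To see why this works, it may help to look at what groupby actually yields: one (key, group) pair per run of equal consecutive items. The sketch below prints the runs and also shows that a non-adjacent repeat survives; `collapse_runs` is a name I made up for illustration.

```python
from itertools import groupby

words = "Joe Joe Joe Blow".split()
# Each run of equal neighbours becomes one (key, group) pair.
runs = [(key, len(list(group))) for key, group in groupby(words)]
print(runs)  # [('Joe', 3), ('Blow', 1)]

def collapse_runs(text):
    return ' '.join(key for key, _ in groupby(text.split()))

print(collapse_runs("Joe Blow Joe"))  # Joe Blow Joe
```

So this fits data where duplicates always sit next to each other, as in the question's list.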

Answer 5

When order is important, collections.OrderedDict comes in handy:

from collections import OrderedDict

dupes = ["Joe Joe Joe Blow", "Don Don Williams", "Clark Clark Gordon", "Albert Riddle"]
result = [' '.join(OrderedDict.fromkeys(w.split())) for w in dupes]
print(result)

Output:

['Joe Blow', 'Don Williams', 'Clark Gordon', 'Albert Riddle']
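On Python 3.7 and later, plain dicts keep insertion order as a language guarantee, so the same trick should work without the import. A minimal sketch, assuming a 3.7+ interpreter:

```python
dupes = ["Joe Joe Joe Blow", "Don Don Williams", "Clark Clark Gordon", "Albert Riddle"]

# dict.fromkeys keeps the first occurrence of each word, in order (Python 3.7+).
result = [' '.join(dict.fromkeys(s.split())) for s in dupes]
print(result)
# ['Joe Blow', 'Don Williams', 'Clark Gordon', 'Albert Riddle']
```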

Answer 6

Lots of good answers already; you can also try Counter:

from collections import Counter

counters = [Counter(d.split()) for d in dupes]
final = [' '.join(c.keys()) for c in counters]

# ['Joe Blow', 'Don Williams', 'Clark Gordon', 'Albert Riddle']
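A side benefit of Counter is that it also tells you how often each word was repeated. Since Counter is a dict subclass, the key order shown here relies on Python 3.7+ insertion ordering; treat this as a sketch under that assumption.

```python
from collections import Counter

c = Counter("Joe Joe Joe Blow".split())
print(c["Joe"], c["Blow"])  # 3 1
print(' '.join(c))          # Joe Blow
```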

Answer 7

Please use set:

  list(set(l.split()))
  # where l is your str; note that a set does not preserve word order

7 answers

Answer 1
5 2019-01-03 20:07:31

Answer 2
1 2019-01-03 20:10:10

Answer 3
1 2019-01-03 20:10:10

Answer 4
1 2019-01-03 20:17:37

Answer 5
0 2019-01-03 20:20:41

Answer 6
0 2019-01-03 20:56:08

Answer 7
-2 2019-01-03 20:19:05
