Python remove partial duplicates from a list

I have a list of items that was created incorrectly. Instead of copying each whole item once, it made multiple partial copies of the same item. The partial duplicates are mixed in with other duplicates and some unique items. For example, list a:

a = ['one two','one two three four','one two three','five six','five six seven','eight nine']

I want to remove the partial duplicates and keep the longest version of each item. For example, I would like to produce list b:

b = ['one two three four', 'five six seven','eight nine']

The integrity of each item must remain intact; the result cannot become:

c = ['two one three four', 'vife six seven', 'eight nine']

Try this:

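def group_partials(strings):
    # Sorting places every string that starts with a given prefix right after that prefix.
    it = iter(sorted(strings))
    prev = next(it, None)
    if prev is None:  # empty input: nothing to yield
        return
    for s in it:
        if not s.startswith(prev):
            yield prev  # s does not extend prev, so prev is the longest form of its group
        prev = s
    yield prev  # longest form of the last group

a = ['one two', 'one two three', 'one two three four', 'five six', 'five six seven', 'eight nine']
b = list(group_partials(a))
print(b)  # ['eight nine', 'five six seven', 'one two three four']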
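The generator above returns the surviving items in sorted order rather than the original order. It also uses str.startswith, which compares characters, not words, so for ['one two', 'one twofold'] it drops 'one two' even though that is not a partial copy of 'one twofold'. Below is a minimal word-aware sketch. It assumes, as in the question's examples, that a partial copy is always the leading words of the full item. It sorts by the list of words instead of the raw string. The name group_partials_by_words is hypothetical and not part of the original answer.

```python
def group_partials_by_words(strings):
    """Keep only the items whose words are not the leading words of another item.

    Sorting by the list of words (instead of the raw string) keeps all
    word-level extensions of an item in a contiguous run right after it,
    so each item only has to be compared with its predecessor.
    """
    items = sorted(strings, key=str.split)
    if not items:
        return []
    result = []
    prev = items[0]
    for s in items[1:]:
        prev_words = prev.split()
        # s extends prev only if s starts with exactly prev's words.
        if s.split()[:len(prev_words)] != prev_words:
            result.append(prev)  # nothing later extends prev, so keep it
        prev = s
    result.append(prev)  # the last item cannot be extended by anything after it
    return result


a = ['one two', 'one two three', 'one two three four',
     'five six', 'five six seven', 'eight nine']
print(group_partials_by_words(a))
# ['eight nine', 'five six seven', 'one two three four']
print(group_partials_by_words(['one two', 'one twofold']))
# ['one two', 'one twofold']
```

Exact duplicates collapse to one copy, because an item's words are always a prefix of an identical item's words.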

You can use sets for this.

Try this code:

a = ['one two','one two three', 'one two three four', 'five six', 'five six seven','eight nine']

# check for subsets (word order is ignored: 'two one' counts as a subset of 'one two three')
for i in range(len(a)):
    for j in range(len(a)):
        if i == j:
            continue  # same index
        if set(a[i].split()) <= set(a[j].split()):  # a[i]'s words are a subset of a[j]'s
            # clear string; a cleared entry has no words, so of several
            # exact duplicates only the last one survives
            a[i] = ""

# a = [x for x in a if len(x)]  # remove empty strings

b = []
for x in a:  # each string in a
    if len(x) > 0:  # if not empty
        b.append(x)  # add to final list

a = b

print(a)

Output:

['one two three four', 'five six seven', 'eight nine']
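The set-based answer keeps the original order. However, sets ignore word order, so it would also discard an unrelated item such as 'six five' as a "subset" of 'five six seven'. Below is a sketch that also keeps the original order but compares leading words instead. Like the sketch above, it assumes that a partial copy is always the first few words of the full item. The name drop_partial_copies is a hypothetical helper. Like the nested loops above, it does O(n²) comparisons.

```python
def drop_partial_copies(items):
    """Return items in their original order, minus any item whose words are
    the leading words of a longer item. Exact duplicates keep their first copy."""
    words = [s.split() for s in items]
    kept = []
    seen = set()
    for i, w in enumerate(words):
        # Item i is a partial copy if some longer item starts with the same words.
        is_partial = any(
            len(other) > len(w) and other[:len(w)] == w
            for other in words
        )
        key = tuple(w)  # word tuple identifies exact duplicates
        if not is_partial and key not in seen:
            seen.add(key)
            kept.append(items[i])
    return kept


a = ['one two', 'one two three four', 'one two three', 'five six',
     'five six seven', 'eight nine']
print(drop_partial_copies(a))
# ['one two three four', 'five six seven', 'eight nine']
print(drop_partial_copies(['six five', 'five six seven']))
# ['six five', 'five six seven']
```

Applied to the question's original list a, this produces exactly the desired list b, in the same order.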

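Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.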

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM