
Python remove partial duplicates from a list

I have a list of items that was created incorrectly. Instead of copying each whole item once, it made multiple partial copies of the same item. The partial duplicates are mixed with other duplicates and some unique items. For example, list a:

a = ['one two','one two three four','one two three','five six','five six seven','eight nine']

I want to remove the partial duplicates and keep only the longest version of each item. For example, from list a I would like to produce list b:

b = ['one two three four', 'five six seven','eight nine']

The integrity of each item must remain intact; for example, the result must not become:

c = ['two one three four', 'vife six seven', 'eight nine']
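One way to pin down "partial copy" precisely is a word-level prefix test. This is a minimal sketch that assumes every partial copy keeps the first words of the full item, unchanged and in the same order. The helper name is_partial_copy is hypothetical and not part of either answer below. It shows why the items in c would not count: reordering words or altering a word breaks the prefix.

```python
def is_partial_copy(short: str, full: str) -> bool:
    """Return True if `short` is a word-level prefix of `full` (hypothetical helper).

    Assumes a partial copy keeps the first words of the full item, in order.
    Identical strings also return True.
    """
    short_words = short.split()
    full_words = full.split()
    return len(short_words) <= len(full_words) and full_words[:len(short_words)] == short_words


print(is_partial_copy('one two', 'one two three four'))             # True
print(is_partial_copy('two one three four', 'one two three four'))  # False: word order changed
print(is_partial_copy('vife six seven', 'five six seven'))          # False: a word was altered
```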

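Try this. After sorting, each partial copy sits directly before the longer strings that extend it, so you only need to keep the last string of each run: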

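def group_partials(strings):
    """Yield the longest string of each run of word-prefix partial copies."""
    it = iter(sorted(strings))  # sorting puts each partial right before its extensions
    prev = next(it, None)
    if prev is None:  # empty input: nothing to yield
        return
    for s in it:
        # prev is a partial copy of s only if s extends prev by whole words (or equals it)
        if s != prev and not s.startswith(prev + ' '):
            yield prev
        prev = s
    yield prev  # the longest string of the last run

a = ['one two','one two three', 'one two three four', 'five six', 'five six seven', 'eight nine']
b = list(group_partials(a))
print(b)  # ['eight nine', 'five six seven', 'one two three four']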
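The generator returns the survivors in sorted order ('eight nine' comes first). If the original order matters, as in list b in the question, one option is to find the survivors the same way and then filter the original list. This is a sketch under the same assumption as the answer (a partial copy is a prefix of the full item), with the prefix checked on whole words and words assumed to be separated by single spaces; the name dedupe_keep_order is hypothetical.

```python
def dedupe_keep_order(strings):
    """Drop strings that are a word-level prefix of another string; keep the original order."""
    ordered = sorted(set(strings))  # sorting puts each partial right before its extensions
    longest = set()
    for current, following in zip(ordered, ordered[1:] + [None]):
        # `current` survives unless the next sorted string extends it by whole words
        if following is None or not following.startswith(current + ' '):
            longest.add(current)
    result = []
    seen = set()
    for s in strings:
        if s in longest and s not in seen:  # keep the first occurrence of each survivor
            result.append(s)
            seen.add(s)
    return result


a = ['one two', 'one two three four', 'one two three', 'five six', 'five six seven', 'eight nine']
print(dedupe_keep_order(a))  # ['one two three four', 'five six seven', 'eight nine']
```

Checking for current + ' ' rather than current alone keeps 'one two' and 'one twofold' as separate items.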

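You can use sets for this: split each string into a set of words and clear every string whose words all appear in some other string. Note that this ignores word order, so 'two one' would also be treated as a partial copy of 'one two three'.

Try this code: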

a = ['one two', 'one two three', 'one two three four', 'five six', 'five six seven', 'eight nine']

# Clear every string whose words are all contained in some other string
for i in range(len(a)):
    for j in range(len(a)):
        if i == j:
            continue  # don't compare a string with itself
        if set(a[i].split()) <= set(a[j].split()):  # a[i]'s words are a subset of a[j]'s
            a[i] = ""  # mark a[i] as a partial duplicate
            break

# Keep only the strings that were not cleared
a = [x for x in a if x]

print(a)

Output

['one two three four', 'five six seven', 'eight nine']
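The nested loop above overwrites entries of a while it is still comparing them. A sketch of the same subset idea that leaves the input untouched uses frozenset and a proper-subset test (<), so a string is never compared against itself. Exact duplicates are kept once. One design difference: two strings with the same words in a different order (such as 'one two' and 'two one') are both kept, because neither word set is a proper subset of the other. It is still quadratic in the number of strings. The name drop_subset_partials is hypothetical.

```python
def drop_subset_partials(strings):
    """Keep each string whose words are not a proper subset of another string's words."""
    unique = list(dict.fromkeys(strings))  # drop exact duplicates, keep first-seen order
    word_sets = [frozenset(s.split()) for s in unique]
    return [
        s
        for s, words in zip(unique, word_sets)
        # `<` is a proper-subset test, so a string never removes itself
        if not any(words < other for other in word_sets)
    ]


a = ['one two', 'one two three', 'one two three four', 'five six', 'five six seven', 'eight nine']
print(drop_subset_partials(a))  # ['one two three four', 'five six seven', 'eight nine']
```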
