
Removing duplicates from list of lists each element compared, preserving order

I have a list of lists in Python and want to remove duplicates, comparing each element across the lists. And yes, what I care about most is preserving the order.

I have already looked at most of the solutions on Stack Overflow. For example, in the one here, the top-voted solution doesn't compare the individual elements of the list of lists; it would work fine for that use case, provided the input is sorted. However, what I'm looking for is:

What's available

s1 = ['Lucifer', 'Ella']
s2 = ['Lucifer', 'Eve']
s3 = ['Chloe', 'Lucifer']
s4 = ['Lucifer', 'Linda']

What's required

As I said, I care about the order. I want to remove the elements that are duplicated across the lists (that's why 'Lucifer' is missing from the result) and keep only the unique ones:

['Ella', 'Eve', 'Chloe', 'Linda']

Caution: The common element will not always be the first element of each list; for example, in s4 it could be 'Amenadiel'. For now, let's assume that the lists being compared always have length 2.

What I have tried

list((set(s1) ^ set(s2) ^ set(s3) ^ set(s4)) ^ (set(s1) & set(s2) & set(s3) & set(s4)))

The output of the above is ['Chloe', 'Eve', 'Linda', 'Ella', 'Lucifer'], which is not what I expected (see the required result above).

This worked fine with two or three lists, but it doesn't work with four lists.

Please help!

TIA.
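For reference, here is a small sketch of why the set expression in the question seems to behave differently with three and four lists. The symmetric difference (`^`) of several sets keeps an element only if it occurs in an odd number of them, so whether 'Lucifer' survives depends on how many lists contain it. The variable names `xor3`, `common3`, `xor4` and `common4` are made up for this illustration. Sets are also unordered, so this approach cannot preserve order in any case.

```python
s1 = ['Lucifer', 'Ella']
s2 = ['Lucifer', 'Eve']
s3 = ['Chloe', 'Lucifer']
s4 = ['Lucifer', 'Linda']

# Three lists: 'Lucifer' occurs 3 times (odd), so it survives the XOR chain
# and is then cancelled by XOR-ing with the intersection.
xor3 = set(s1) ^ set(s2) ^ set(s3)
common3 = set(s1) & set(s2) & set(s3)
three = xor3 ^ common3

# Four lists: 'Lucifer' occurs 4 times (even), so the XOR chain already
# drops it, and XOR-ing with the intersection puts it back.
xor4 = set(s1) ^ set(s2) ^ set(s3) ^ set(s4)
common4 = set(s1) & set(s2) & set(s3) & set(s4)
four = xor4 ^ common4

print('Lucifer' in three)  # False
print('Lucifer' in xor4)   # False
print('Lucifer' in four)   # True
```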

You can first find the duplicates with a Counter, then filter them out in a second pass:

from collections import Counter

s1 = ['Lucifer', 'Ella']
s2 = ['Lucifer', 'Eve']
s3 = ['Chloe', 'Lucifer']
s4 = ['Lucifer', 'Linda']
lists = [s1, s2, s3, s4]
flattened = [item for sublist in lists for item in sublist]

counts = Counter(flattened)
deduped = [x for x in flattened if counts[x] <= 1]

print(deduped) # ['Ella', 'Eve', 'Chloe', 'Linda']
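The same two-pass idea can be wrapped in a small helper that accepts any number of lists. This is only a sketch: the function name `drop_repeated` is made up here and is not part of the answer above. The example uses the 'Amenadiel' variant of s4 mentioned in the question's caution.

```python
from collections import Counter


def drop_repeated(*lists):
    """Keep, in original order, only the items that occur exactly once overall."""
    flattened = [item for sublist in lists for item in sublist]
    counts = Counter(flattened)  # first pass: count every item
    return [item for item in flattened if counts[item] == 1]  # second pass: filter


result = drop_repeated(
    ['Lucifer', 'Ella'],
    ['Lucifer', 'Eve'],
    ['Chloe', 'Lucifer'],
    ['Amenadiel', 'Linda'],
)
print(result)  # ['Ella', 'Eve', 'Chloe', 'Amenadiel', 'Linda']
```

Note that an item repeated inside a single list is dropped as well, because the count is taken over the flattened sequence.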

If your goal is to remove only the elements that appear in every list (any other repeated element is kept once, at its first occurrence):

s1 = ['Lucifer', 'Ella']
s2 = ['Lucifer', 'Eve']
s3 = ['Chloe', 'Lucifer']
s4 = ['Lucifer', 'Linda']
lists = [s1, s2, s3, s4]
flattened = [item for sublist in lists for item in sublist]

seen = set.intersection(*map(set, lists))
deduped = []
for item in flattened:
    if item in seen:
        continue
    seen.add(item)
    deduped.append(item)
print(deduped) # ['Ella', 'Eve', 'Chloe', 'Linda']

An alternative solution, also using Counter and chain:

from collections import Counter
from itertools import chain

s1 = ['Lucifer', 'Ella']
s2 = ['Lucifer', 'Eve']
s3 = ['Chloe', 'Lucifer']
s4 = ['Lucifer', 'Linda']

# count the occurrences
counts = Counter(chain(s1, s2, s3, s4))

# remove duplicates
result = [name for name in chain(s1, s2, s3, s4) if counts[name] < 2]

print(result)

Output

['Ella', 'Eve', 'Chloe', 'Linda']

UPDATE

If you want to keep the names that do not appear in all of the lists, do:

from collections import Counter, OrderedDict
from itertools import chain

s1 = ['Lucifer', 'Ella']
s2 = ['Lucifer', 'Eve']
s3 = ['Chloe', 'Lucifer']
s4 = ['Amenadiel', 'Linda']

all_ls = [s1, s2, s3, s4]

# count the occurrences
counts = Counter(chain.from_iterable(all_ls))

# keep each name once, in order, dropping names that appear in all lists
names = list(OrderedDict.fromkeys(chain.from_iterable(all_ls)))
result = [name for name in names if counts[name] < len(all_ls)]

print(result)

Output

['Lucifer', 'Ella', 'Eve', 'Chloe', 'Amenadiel', 'Linda']

One way to do this is to concatenate all the lists first, convert the combined list into a dictionary, and then convert that temporary dictionary back into a list.

s1 = ['Lucifer', 'Ella']
s2 = ['Lucifer', 'Eve']
s3 = ['Chloe', 'Lucifer']
s4 = ['Lucifer', 'Linda']

s_all = s1 + s2 + s3 + s4           # Concatenate all lists

temp_dict = dict.fromkeys(s_all)    # Create a dictionary with keys from list `s_all`
s_no_dupes = list(temp_dict)        # Convert the dictionary back to a list

print(s_no_dupes)

Output


['Lucifer', 'Ella', 'Eve', 'Chloe', 'Linda']

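Note: This removes the repeated occurrences but keeps the first copy of each element, so 'Lucifer' stays in the result. To rely on the order being preserved, you need Python 3.7 or above, where dictionaries are guaranteed to keep insertion order; CPython 3.6 already behaved this way, but only as an implementation detail.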
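If relying on dictionary ordering is a concern, for example on an older interpreter, the same order-preserving dedupe can be sketched with a plain set and a loop. The helper name `unique_in_order` is made up for this example.

```python
def unique_in_order(items):
    """Return the items with later repeats removed, keeping first-seen order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:  # first time this item is encountered
            seen.add(item)
            result.append(item)
    return result


s1 = ['Lucifer', 'Ella']
s2 = ['Lucifer', 'Eve']
s3 = ['Chloe', 'Lucifer']
s4 = ['Lucifer', 'Linda']

s_no_dupes = unique_in_order(s1 + s2 + s3 + s4)
print(s_no_dupes)  # ['Lucifer', 'Ella', 'Eve', 'Chloe', 'Linda']
```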
