
Remove non-duplicate values from a list

The scenario is something like this:

After joining several lists using:

list1 = ["A","B"]
list2 = ["A","B","C"]
list3 = ["C","D","E"]

mainlist = list1 + list2 + list3
mainlist.sort()

mainlist now looks like this:

mainlist = ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'E']

I would like to remove anything that is not a duplicate value. If a value appears more than once in the list, it must not be touched; if it appears only once in mainlist, I would like to delete it.

I tried this approach, but something doesn't seem to work:

for i in mainlist:
    if mainlist.count(i) <= 1:
        mainlist.remove(i)
    else:
        continue

but what I get back is a list that looks like this:

mainlist = ['A', 'A', 'B', 'B', 'C', 'C', 'E'] # "D" was removed, but "E" is still present. Why?

What I would like to get is this list:

mainlist = ['A', 'A', 'B', 'B', 'C', 'C'] # all non-duplicate values have been deleted

I can remove the duplicates with the code below:

for i in mainlist:
    if mainlist.count(i) > 1:
        mainlist.remove(i)
    else:
        continue

which gives this final result:

mainlist = ['A', 'B', 'C', 'D', 'E']
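That loop also removes items from the list it is iterating over, so it is not reliable in general (for example, ['A', 'A', 'A', 'A'] ends up as ['A', 'A']). A minimal sketch of deduplicating without mutating the list mid-loop, assuming Python 3.7+ where dict keys keep insertion order:

```python
list1 = ["A", "B"]
list2 = ["A", "B", "C"]
list3 = ["C", "D", "E"]
mainlist = sorted(list1 + list2 + list3)

# dict keys are unique and keep insertion order (Python 3.7+),
# so this keeps the first occurrence of each value and drops the repeats.
deduped = list(dict.fromkeys(mainlist))
print(deduped)  # ['A', 'B', 'C', 'D', 'E']
```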

But the real question is: how can I delete the non-duplicate values from a list?

You can find the duplicates like this:

duplicates = [item for item in mainlist if mainlist.count(item) > 1]
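Because the comprehension only reads mainlist and builds a new list, nothing shifts underneath the loop and no element is skipped. A self-contained check on the question's data:

```python
list1 = ["A", "B"]
list2 = ["A", "B", "C"]
list3 = ["C", "D", "E"]
mainlist = sorted(list1 + list2 + list3)

# mainlist is never modified here, so every element is visited exactly once.
duplicates = [item for item in mainlist if mainlist.count(item) > 1]
print(duplicates)  # ['A', 'A', 'B', 'B', 'C', 'C']
```

Note that list.count rescans the whole list for every element, so this is quadratic; it is fine for short lists like this one.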

You can use collections.Counter() to keep track of the frequency of each item:

from collections import Counter

counts = Counter(mainlist)
[item for item in mainlist if counts[item] > 1]

This outputs:

['A', 'A', 'B', 'B', 'C', 'C']

Use collections.Counter to count the list elements. Use list comprehension to keep only the elements that occur more than once. Note that the list does not have to be sorted.

from collections import Counter
list1 = ["A","B"]
list2 = ["A","B","C"]
list3 = ["C","D","E"]
mainlist = list1 + list2 + list3

cnt = Counter(mainlist)
print(cnt)
# Counter({'A': 2, 'B': 2, 'C': 2, 'D': 1, 'E': 1})

dups = [x for x in mainlist if cnt[x] > 1]
print(dups)
# ['A', 'B', 'A', 'B', 'C', 'C']
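If you want each repeated value only once rather than every occurrence, the same Counter can be filtered by its counts. A small sketch, assuming Python 3.7+ so that Counter (a dict subclass) keeps keys in first-seen order; repeated_values is just an illustrative name:

```python
from collections import Counter

list1 = ["A", "B"]
list2 = ["A", "B", "C"]
list3 = ["C", "D", "E"]
mainlist = list1 + list2 + list3

cnt = Counter(mainlist)
# Keys are in order of first appearance, so this lists each
# repeated value once, in the order it was first seen.
repeated_values = [value for value, n in cnt.items() if n > 1]
print(repeated_values)  # ['A', 'B', 'C']
```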

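Another solution, using NumPy:

import numpy as np

# u: the sorted unique values, c: how many times each one occurs
u, c = np.unique(mainlist, return_counts=True)
# repeat each duplicated value as many times as it occurred
out = np.repeat(u[c > 1], c[c > 1])
print(out)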

Prints:

['A' 'A' 'B' 'B' 'C' 'C']
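np.unique returns its values sorted, so the approach above regroups equal items. If the original order matters, one option is to build a boolean mask over the original array instead. This is a sketch that assumes NumPy 1.13 or later, where np.isin is available:

```python
import numpy as np

list1 = ["A", "B"]
list2 = ["A", "B", "C"]
list3 = ["C", "D", "E"]
arr = np.array(list1 + list2 + list3)  # left unsorted on purpose

u, c = np.unique(arr, return_counts=True)
# True wherever the element's value occurs more than once in arr.
mask = np.isin(arr, u[c > 1])
print(arr[mask])  # ['A' 'B' 'A' 'B' 'C' 'C']
```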

Your problem is that you modify the list while iterating over it. The for loop walks the list by index: when "D" (at index 6) is removed, "E" shifts down into index 6. The next step looks for index 7, which no longer exists, so the loop ends without ever checking "E".
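A small trace of the question's loop makes the skip visible. Here visited is a hypothetical helper list added only to record which elements the loop actually reaches:

```python
mainlist = ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'E']

visited = []
for i in mainlist:
    visited.append(i)  # record every element the loop sees
    if mainlist.count(i) <= 1:
        mainlist.remove(i)  # shifts every later element one index to the left

print(visited)   # ['A', 'A', 'B', 'B', 'C', 'C', 'D']  ("E" is never visited)
print(mainlist)  # ['A', 'A', 'B', 'B', 'C', 'C', 'E']
```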

Iterate over the original list and remove items from a copy instead:

new_list = list(mainlist)
for i in mainlist:
    if mainlist.count(i) <= 1:
        new_list.remove(i)
    else:
        continue
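If other code holds a reference to mainlist and should see the change, an alternative sketch is to compute the filtered list first and then replace the contents in place with slice assignment. The name alias below is just an illustrative second reference:

```python
list1 = ["A", "B"]
list2 = ["A", "B", "C"]
list3 = ["C", "D", "E"]
mainlist = sorted(list1 + list2 + list3)
alias = mainlist  # a second name for the same list object

# The right-hand side is fully evaluated before the assignment,
# so count() always sees the unmodified list.
mainlist[:] = [x for x in mainlist if mainlist.count(x) > 1]

print(mainlist)           # ['A', 'A', 'B', 'B', 'C', 'C']
print(alias is mainlist)  # True: the same object was updated in place
```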

If you only want a list of the values that appear more than once across your lists, you can use sets and a comprehension:

list1 = ["A","B"]
list2 = ["A","B","C"]
list3 = ["C","D","E"]

fulllist = list1 + list2 + list3
fullset = set(list1) | set(list2) | set(list3)

dups = [x for x in fullset if fulllist.count(x) > 1]

print(dups)  # e.g. ['A', 'C', 'B']; set order is arbitrary and can differ between runs
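Calling count inside the comprehension rescans the whole list for every value. A one-pass alternative, sketched here with two illustrative sets, records each value when it is first seen and again when it repeats:

```python
list1 = ["A", "B"]
list2 = ["A", "B", "C"]
list3 = ["C", "D", "E"]
fulllist = list1 + list2 + list3

seen = set()  # values encountered at least once
dups = set()  # values encountered at least twice
for x in fulllist:
    if x in seen:
        dups.add(x)
    else:
        seen.add(x)

# Sets are unordered, so sort for a stable, repeatable result.
print(sorted(dups))  # ['A', 'B', 'C']
```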

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM