drop first duplicates from set python and keep only last unique

Question

i'm trying to do this with this function

def f7(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]

so i want to check

f7([5, 5, 9, 6, 8, 7, 7, 8, 6, 9])

i'm trying to do this and output is

[5, 9, 6, 8, 7]

this keeps only first value. but i need keep only last elements.

so output should be

[5, 7, 8, 6, 9]

Answer 1

This could work:

In [1847]: def f7(seq): 
      ...:     seen = set() 
      ...:     seen_add = seen.add 
      ...:     return [x for x in seq[::-1] if not (x in seen or seen_add(x))][::-1] 
      ...:                                                                                                                                                                                                  

In [1848]: f7([5, 5, 9, 6, 8, 7, 7, 8, 6, 9])                                                                                                                                                          
Out[1848]: [5, 7, 8, 6, 9]

Answer 2

You can collect the unique items by creating a set , then order by the index of the reversed sequence to find the last instance of that element, then reverse back to the original order

def f7(seq):
    return sorted(set(seq), key=lambda i: seq[::-1].index(i), reverse=True)

>>> f7([5, 5, 9, 6, 8, 7, 7, 8, 6, 9])
[5, 7, 8, 6, 9]

Answer 3

Do it in reverse. Here's a rough implementation:

l = [5, 5, 9, 6, 8, 7, 7, 8, 6, 9]
def f7(seq):
    seen = set()
    seen_add = seen.add
    return list(reversed([x for x in reversed(seq) if not (x in seen or seen_add(x))]))

print(f7(l))

Output:

[5, 7, 8, 6, 9]

If you want to make this more efficient, you can use descending for loops and/or a collections.deque

Answer 4

You can reverse the list initially and then reverse the answer.

>>> def f7(seq):
...     seen = set()
...     seen_add = seen.add
...     return [x for x in seq[::-1] if not (x in seen or seen_add(x))][::-1]
...
>>> print(f7([5, 5, 9, 6, 8, 7, 7, 8, 6, 9]) )
[5, 7, 8, 6, 9]

Answer 5

you can use dict.fromkeys :

def f7(seq):
    return list(dict.fromkeys(seq[::-1]))[::-1]

print(f7([5, 5, 9, 6, 8, 7, 7, 8, 6, 9]))
# [5, 7, 8, 6, 9]

this will work if your python version is >=3.6 because it is base on the insertion order in dictionaries

here is a simple benchmark with the proposed solutions:

from simple_benchmark import BenchmarkBuilder
import random
from collections import deque


b = BenchmarkBuilder()

@b.add_function()
def MayankPorwal(seq):
    seen = set() 
    seen_add = seen.add 
    return [x for x in seq[::-1] if not (x in seen or seen_add(x))][::-1] 

@b.add_function()
def CoryKramer(seq):
    return sorted(set(seq), key=lambda i: seq[::-1].index(i), reverse=True)

@b.add_function()
def kederrac(seq):
    return list(dict.fromkeys(seq[::-1]))[::-1]

@b.add_function()
def LeKhan9(seq):
    q = deque()
    seen = set()
    seen_add = seen.add
    for x in reversed(seq):
        if not (x in seen or seen_add(x)):
            q.appendleft(x)

    return list(q)

@b.add_arguments('List lenght')
def argument_provider():
    for exp in range(2, 14):
        size = 2**exp
        yield size, [random.randint(0, size) for _ in range(size)]

r = b.run()
r.plot()

Answer 6

In order to avoid reversing twice, you can use a queue data structure. The appends here should run in constant time.

from collections import deque

def f7(seq):
    q = deque()
    seen = set()
    seen_add = seen.add
    for x in reversed(seq):
        if not (x in seen or seen_add(x)):
            q.appendleft(x)

    return list(q)


print f7([5, 5, 9, 6, 8, 7, 7, 8, 6, 9])

output: [5, 7, 8, 6, 9]

drop first duplicates from set python and keep only last unique

Question

6 answers

solution1
2 2020-04-20 19:55:43

solution2
0 2020-04-20 19:52:45

solution3
0 2020-04-20 19:56:15

solution4
0 2020-04-20 19:57:33

solution5
0 2020-04-20 20:00:40

solution6
0 2020-04-20 20:40:13

drop first duplicates from set python and keep only last unique

Question

6 answers

solution1 2 2020-04-20 19:55:43

solution2 0 2020-04-20 19:52:45

solution3 0 2020-04-20 19:56:15

solution4 0 2020-04-20 19:57:33

solution5 0 2020-04-20 20:00:40

solution6 0 2020-04-20 20:40:13

solution1
2 2020-04-20 19:55:43

solution2
0 2020-04-20 19:52:45

solution3
0 2020-04-20 19:56:15

solution4
0 2020-04-20 19:57:33

solution5
0 2020-04-20 20:00:40

solution6
0 2020-04-20 20:40:13