How to remove duplicates, by sublist item subset, in a list of lists in Python?

Question

I have a list of lists in Python, defined like this: [[2, 3, 5], [3, 3, 1], [2, 3, 8]]. I want to delete the duplicate entries, where two sublists count as duplicates if their first two elements match. For example, the first and third lists both have 2 and 3 as their first and second elements, so I count the third as a duplicate. After removing it, I want the final list to be [[2, 3, 5], [3, 3, 1]]. Currently, I have something like this:

arr = [[2, 3, 5], [3, 3, 1], [2, 3, 8]]

first = [item[0] for item in arr]
second = [item[1] for item in arr]
zipped = zip(first, second)

This produces tuples containing the first two entries of each list. I could now find the indices of the duplicate entries and remove those indices from the original list. But is there a shorter way to do what I want? If not, what is the best way to get the duplicate indices here?

Answer 1

Solution

You can use sets to accomplish this:

arr = [[2, 3, 5], [3, 3, 1], [2, 3, 8]]

used = set()
[used.add(tuple(x[:2])) or x for x in arr if tuple(x[:2]) not in used]

returns

[[2, 3, 5], [3, 3, 1]]

Notes

The first expression, used.add(tuple(x[:2])) or x, is only evaluated for sublists whose first two elements are not already in used. Check out the docs on list comprehensions for more info.
set.add always returns None, so used.add(tuple(x[:2])) or x always evaluates to x.
The first two elements of a sublist must be converted to a hashable type (e.g. a tuple), since a list is not hashable.
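
The last two notes can be checked directly. The following is a small sketch (the names result and kept are introduced here for illustration) showing that set.add returns None, that None or x evaluates to x, and that a list cannot be added to a set while a tuple can:

```python
used = set()
x = [2, 3, 5]

# set.add mutates the set in place and returns None.
result = used.add(tuple(x[:2]))
print(result)             # None

# None is falsy, so `None or x` evaluates to x itself.
kept = used.add(tuple(x[:2])) or x
print(kept is x)          # True

# Lists are unhashable, so they cannot be set members; tuples can.
try:
    used.add(x[:2])
except TypeError as exc:
    print("TypeError:", exc)

print((2, 3) in used)     # True
```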

Finally, as @wim brings up, if you're not familiar with this pattern it can be difficult to understand, and in Python "Readability counts." So if you're writing code that will be shared, consider changing this to an explicit for loop or using another approach.
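
As one way to follow that advice, the comprehension above could be unrolled into an explicit loop. This is a sketch of an equivalent, not the answer author's code; the names key and result are introduced here for illustration:

```python
arr = [[2, 3, 5], [3, 3, 1], [2, 3, 8]]

used = set()
result = []
for x in arr:
    key = tuple(x[:2])      # first two elements as a hashable key
    if key not in used:     # first time this key is seen
        used.add(key)
        result.append(x)

print(result)  # [[2, 3, 5], [3, 3, 1]]
```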

Answer 2

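You can use collections.OrderedDict to de-dupe while keeping the first sublist for each key. The output follows the reversed iteration (keys are ordered by where they last appear in L, starting from the end), which matches the input order here but is not guaranteed to for every input: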

>>> from collections import OrderedDict
>>> L = [[2, 3, 5], [3, 3, 1], [2, 3, 8]]
>>> d = OrderedDict(((x[0], x[1]), x) for x in reversed(L))
>>> print(*d.values())
[2, 3, 5] [3, 3, 1]

To keep the last instead of the first, just remove the reversed call:

>>> OrderedDict(((x[0], x[1]), x) for x in L).values()
odict_values([[2, 3, 8], [3, 3, 1]])
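
On Python 3.7 and later, plain dict also preserves insertion order, so a similar result may be possible without OrderedDict. The sketch below assumes that Python version and uses dict.setdefault, which only stores a value the first time a key appears, so the first sublist per key is kept and keys stay in the order they were first seen (first_seen is a name chosen here for illustration):

```python
L = [[2, 3, 5], [3, 3, 1], [2, 3, 8]]

first_seen = {}
for x in L:
    # setdefault stores x only if the key is not already present.
    first_seen.setdefault((x[0], x[1]), x)

print(list(first_seen.values()))  # [[2, 3, 5], [3, 3, 1]]
```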

Or use a plain old for-loop:

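def dedupe(iterable):
    """Yield each item whose first two elements have not been seen before."""
    seen = set()
    for x in iterable:
        # Unpack the key fields; each item needs at least two elements.
        first, second, *rest = x
        if (first, second) not in seen:
            seen.add((first, second))
            yield x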
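
The same loop can be generalized to any subset of positions. The variant below is a hypothetical sketch, not part of the answer: dedupe_by and its key parameter are names introduced here, and operator.itemgetter from the standard library builds the key from the chosen indices:

```python
from operator import itemgetter

def dedupe_by(iterable, key):
    """Yield items whose key(item) has not been seen before (hypothetical helper)."""
    seen = set()
    for x in iterable:
        k = key(x)
        if k not in seen:
            seen.add(k)
            yield x

arr = [[2, 3, 5], [3, 3, 1], [2, 3, 8]]

# itemgetter(0, 1) returns the tuple (x[0], x[1]) for each sublist x.
print(list(dedupe_by(arr, key=itemgetter(0, 1))))  # [[2, 3, 5], [3, 3, 1]]

# Keying on the second element alone treats all three sublists as duplicates.
print(list(dedupe_by(arr, key=itemgetter(1))))     # [[2, 3, 5]]
```

Because dedupe_by is a generator, wrap the call in list() when a concrete list is needed.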

solution1 2 ACCEPTED 2018-03-04 17:45:16
solution2 2 2018-03-04 17:45:41