Remove duplicates from a Python list but remember the indices

Question

How can I remove duplicates from a list, keep the original order of the items, and remember the index of the first occurrence of each item?

For example, removing the duplicates from [1, 1, 2, 3] yields [1, 2, 3], but I need to remember the indices [0, 2, 3].

I am using Python 2.7.

Answer 1

I'd tackle this a little differently and use an OrderedDict, together with the fact that a list's index method returns the lowest index of an item.

>>> from collections import OrderedDict
>>> lst = [1, 1, 2, 3]
>>> d = OrderedDict((x, lst.index(x)) for x in lst)
>>> d
OrderedDict([(1, 0), (2, 2), (3, 3)])

If you need the deduplicated list and the indices separately, you can simply use:

>>> d.keys()
[1, 2, 3]
>>> d.values()
[0, 2, 3]
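The session above is Python 2.7, where keys() and values() return lists. As a hedged sketch for Python 3.7+ (an assumption beyond the question's 2.7), a plain dict already preserves insertion order, and dict.setdefault stores only the first index seen for each key. This also avoids calling list.index once per element, which rescans the list each time. In Python 3, keys() and values() return views, so wrap them in list():

```python
lst = [1, 1, 2, 3]

# setdefault inserts a key only the first time it is seen, so each stored
# value is the index of that item's first occurrence.
# Assumes Python 3.7+, where dicts keep insertion order.
first_index = {}
for i, x in enumerate(lst):
    first_index.setdefault(x, i)

unique = list(first_index.keys())     # [1, 2, 3]
indices = list(first_index.values())  # [0, 2, 3]
print(unique, indices)
```

This makes a single pass over the list instead of one list.index scan per element.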

Answer 2

Use enumerate to keep track of the index and a set to keep track of the elements already seen:

l = [1, 1, 2, 3]
inds = []
seen = set()
for i, ele in enumerate(l):
    if ele not in seen:   # first occurrence of ele
        seen.add(ele)
        inds.append(i)
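A set only accepts hashable elements, so the loop above raises TypeError if the list contains, say, nested lists. A hedged fallback (an assumption, not part of the original answer) is to keep seen as a list, at the cost of a linear scan per membership test:

```python
l = [[1], [1], [2], [3]]

inds = []
seen = []  # a list, because items like [1] are unhashable and cannot go in a set
for i, ele in enumerate(l):
    if ele not in seen:  # linear scan, so O(n^2) overall
        seen.append(ele)
        inds.append(i)
print(inds)  # [0, 2, 3]
```

Prefer the set version whenever the elements are hashable.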

If you want both the index and the element:

inds = []
seen = set()
for i, ele in enumerate(l):
    if ele not in seen:
        seen.add(ele)
        inds.append((i, ele))  # (index, element) pairs

Or, if you want them in two separate lists:

l = [1, 1, 2, 3]
inds, unq = [], []
seen = set()
for i, ele in enumerate(l):
    if ele not in seen:
        seen.add(ele)
        inds.append(i)
        unq.append(ele)
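If you already have the (index, element) pairs from the second loop, a hedged alternative to maintaining two lists is to transpose the pairs with zip(*...). This sketch returns tuples rather than lists and falls back to two empty tuples for an empty input, where plain unpacking would fail:

```python
l = [1, 1, 2, 3]

# Collect (index, element) pairs for first occurrences.
pairs = []
seen = set()
for i, ele in enumerate(l):
    if ele not in seen:
        seen.add(ele)
        pairs.append((i, ele))

# zip(*pairs) transposes [(0, 1), (2, 2), (3, 3)] into two tuples;
# guard the empty case, since unpacking zip() of nothing raises ValueError.
inds, unq = zip(*pairs) if pairs else ((), ())
print(inds, unq)  # (0, 2, 3) (1, 2, 3)
```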

Using a set is by far the fastest approach, since a set membership test is O(1) on average, while each list.index call scans the list from the start:

In [13]: l = [randint(1,10000) for _ in range(10000)]     

In [14]: %%timeit                                         
inds = []
seen = set()
for i, ele in enumerate(l):
    if ele not in seen:
        inds.append((i,ele))
    seen.add(ele)
   ....: 
100 loops, best of 3: 3.08 ms per loop

In [15]: timeit  OrderedDict((x, l.index(x)) for x in l)
1 loops, best of 3: 442 ms per loop

In [16]: l = [randint(1,10000) for _ in range(100000)]      
In [17]: timeit  OrderedDict((x, l.index(x)) for x in l)
1 loops, best of 3: 10.3 s per loop

In [18]: %%timeit                                       
inds = []
seen = set()
for i, ele in enumerate(l):
    if ele not in seen:
        inds.append((i,ele))
    seen.add(ele)
   ....: 
10 loops, best of 3: 22.6 ms per loop

So for 100k elements it is 10.3 s vs 22.6 ms. If you try anything larger with fewer duplicates, such as [randint(1,100000) for _ in range(100000)], you will have time to read a book. Creating two lists is marginally slower, but still orders of magnitude faster than using list.index.
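To reproduce the comparison outside IPython, here is a hedged sketch using the standard timeit module. The list size, value range, and repeat count are arbitrary assumptions kept small so the slow list.index version finishes quickly; absolute timings depend on the machine and Python version, so none are quoted here:

```python
import timeit
from collections import OrderedDict
from random import randint

# Small, arbitrary test data (assumption); increase n to see the gap widen.
n = 2000
l = [randint(1, n) for _ in range(n)]

def with_index(l):
    # One list.index scan per element: O(n^2) overall.
    return list(OrderedDict((x, l.index(x)) for x in l).items())

def with_set(l):
    # One pass with O(1) average set lookups: O(n) overall.
    out = []
    seen = set()
    for i, ele in enumerate(l):
        if ele not in seen:
            seen.add(ele)
            out.append((ele, i))
    return out

# Both approaches must agree before their timings are compared.
assert with_index(l) == with_set(l)

for f in (with_index, with_set):
    t = timeit.timeit(lambda: f(l), number=5)
    print("{}: {:.2f} ms per call".format(f.__name__, t / 5 * 1000))
```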

If you want to get one value at a time, you can use a generator function:

def yield_un(l):
    """Yield (index, element) for the first occurrence of each element."""
    seen = set()
    for i, ele in enumerate(l):
        if ele not in seen:
            seen.add(ele)
            yield (i, ele)
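Because yield_un is lazy, it also works on iterators that are too large, or even infinite, to materialize. A hedged usage sketch, repeating the generator so the block is self-contained and using itertools.cycle and itertools.islice as an illustrative input (an assumption, not part of the original answer):

```python
from itertools import cycle, islice

def yield_un(l):
    # Yield (index, element) for the first occurrence of each element.
    seen = set()
    for i, ele in enumerate(l):
        if ele not in seen:
            seen.add(ele)
            yield (i, ele)

print(list(yield_un([1, 1, 2, 3])))  # [(0, 1), (2, 2), (3, 3)]

# cycle() never ends, but islice stops after the first three unique items,
# so the generator consumes only as much input as it needs.
stream = cycle([5, 5, 7, 5, 9])
print(list(islice(yield_un(stream), 3)))  # [(0, 5), (2, 7), (4, 9)]
```

Asking islice for more unique items than the stream contains would loop forever, so the count must not exceed the number of distinct values.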

2 answers

Answer 1: 5 votes, posted 2016-01-02 19:49:35

Answer 2: 3 votes, accepted, posted 2016-01-02 19:49:02