Remove duplicates in a Python list but remember the index
How can I remove duplicates from a list, keep the original order of the items, and remember the first index of each item in the list?
For example, removing duplicates from [1, 1, 2, 3] yields [1, 2, 3], but I need to remember the indices [0, 2, 3].
I'm using Python 2.7.
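I would approach this slightly differently and use an OrderedDict, together with the fact that the list index method returns the lowest index of an item.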
>>> from collections import OrderedDict
>>> lst = [1, 1, 2, 3]
>>> d = OrderedDict((x, lst.index(x)) for x in lst)
>>> d
OrderedDict([(1, 0), (2, 2), (3, 3)])
If you need both the list (with duplicates removed) and the indices, you can simply issue:
>>> d.keys()
[1, 2, 3]
>>> d.values()
[0, 2, 3]
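Calling lst.index(x) for every element rescans the list each time, which is what makes this approach slow on long lists. A minimal sketch of the same OrderedDict idea that records each first index in a single pass using dict.setdefault; the function name first_indices is hypothetical, not part of the original answer:

```python
from collections import OrderedDict


def first_indices(lst):
    """Return an OrderedDict mapping each distinct item to its first index."""
    d = OrderedDict()
    for i, x in enumerate(lst):
        # setdefault only inserts when x is not yet a key,
        # so later duplicates never overwrite the first index
        d.setdefault(x, i)
    return d


d = first_indices([1, 1, 2, 3])
print(list(d.keys()))    # [1, 2, 3]
print(list(d.values()))  # [0, 2, 3]
```

Wrapping keys() and values() in list() gives the same output on Python 2.7 and on Python 3, where those methods return views rather than lists.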
Use enumerate to keep track of the indices and a set to keep track of the elements already seen:
l = [1, 1, 2, 3]
inds = []
seen = set()
for i, ele in enumerate(l):
    if ele not in seen:
        inds.append(i)
        seen.add(ele)
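The same loop wrapped in a function, as a sketch (the name unique_indices is hypothetical). It also handles duplicates that are not adjacent, because the set remembers every element seen so far:

```python
def unique_indices(l):
    """Indices of the first occurrence of each distinct element of l."""
    inds = []
    seen = set()
    for i, ele in enumerate(l):
        if ele not in seen:  # first time we meet this element
            inds.append(i)
            seen.add(ele)
    return inds


print(unique_indices([1, 1, 2, 3]))               # [0, 2, 3]
print(unique_indices(['b', 'a', 'b', 'c', 'a']))  # [0, 1, 3]
```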
If you want both the index and the element:
inds = []
seen = set()
for i, ele in enumerate(l):
    if ele not in seen:
        inds.append((i,ele))
        seen.add(ele)
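Once you have the (index, element) pairs, turning them into a dictionary gives a direct lookup of any element's first index. A small sketch; the name first_index is illustrative only:

```python
l = [1, 1, 2, 3]
inds = []
seen = set()
for i, ele in enumerate(l):
    if ele not in seen:
        inds.append((i, ele))
        seen.add(ele)

# Invert each (index, element) pair into element -> first index
first_index = {ele: i for i, ele in inds}
print(inds)            # [(0, 1), (2, 2), (3, 3)]
print(first_index[2])  # 2
```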
Or, if you want the two in separate lists:
l = [1, 1, 2, 3]
inds, unq = [],[]
seen = set()
for i, ele in enumerate(l):
    if ele not in seen:
        inds.append(i)
        unq.append(ele)
        seen.add(ele)
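If you already build a list of (index, element) pairs, you can also split it into two separate lists afterwards with zip(*...). A sketch under that assumption; note that zip produces tuples, and that the two-name unpacking raises ValueError on an empty input because there is nothing to unzip:

```python
l = [1, 1, 2, 3]
pairs = []
seen = set()
for i, ele in enumerate(l):
    if ele not in seen:
        pairs.append((i, ele))
        seen.add(ele)

# zip(*pairs) transposes [(0, 1), (2, 2), (3, 3)] into (0, 2, 3) and (1, 2, 3)
inds, unq = zip(*pairs)
inds, unq = list(inds), list(unq)
print(inds)  # [0, 2, 3]
print(unq)   # [1, 2, 3]
```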
Using a set is by far the best approach:
In [13]: l = [randint(1,10000) for _ in range(10000)]
In [14]: %%timeit
inds = []
seen = set()
for i, ele in enumerate(l):
    if ele not in seen:
        inds.append((i,ele))
        seen.add(ele)
   ....:
100 loops, best of 3: 3.08 ms per loop
In [15]: timeit OrderedDict((x, l.index(x)) for x in l)
1 loops, best of 3: 442 ms per loop
In [16]: l = [randint(1,10000) for _ in range(100000)]
In [17]: timeit OrderedDict((x, l.index(x)) for x in l)
1 loops, best of 3: 10.3 s per loop
In [18]: %%timeit
inds = []
seen = set()
for i, ele in enumerate(l):
    if ele not in seen:
        inds.append((i,ele))
        seen.add(ele)
   ....:
10 loops, best of 3: 22.6 ms per loop
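So for 100k elements it is 10.3 seconds versus 22.6 ms, and if you try something with fewer dupes, like [randint(1,100000) for _ in range(100000)], you will have time to read a book. The gap comes from list.index: each call scans the list from the start until it finds a match, so the OrderedDict version can do work quadratic in the length of the list, while a set membership test takes constant time on average. Creating two lists is marginally slower, but still orders of magnitude faster than using list.index.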
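The timings above come from IPython's %timeit magic. To compare the two approaches in a plain script, the standard-library timeit module can be used instead. A sketch; the function names, list size and repeat count are arbitrary choices of ours, and the printed times will depend on your machine:

```python
import timeit
from collections import OrderedDict
from random import randint


def with_index(l):
    # One l.index call per element; each call scans from the start of the list
    d = OrderedDict((x, l.index(x)) for x in l)
    return [(i, x) for x, i in d.items()]


def with_set(l):
    # Single pass; set membership tests are O(1) on average
    inds, seen = [], set()
    for i, ele in enumerate(l):
        if ele not in seen:
            inds.append((i, ele))
            seen.add(ele)
    return inds


l = [randint(1, 1000) for _ in range(2000)]
# Both approaches must agree before their speed is worth comparing
assert with_index(l) == with_set(l)

for f in (with_index, with_set):
    seconds = timeit.timeit(lambda: f(l), number=5)
    print("%s: %.4f s for 5 runs" % (f.__name__, seconds))
```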
If you want to get the values one at a time, you can use a generator function:
def yield_un(l):
    seen = set()
    for i, ele in enumerate(l):
        if ele not in seen:
            yield (i,ele)
            seen.add(ele)
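Because yield_un iterates over its argument only once, it also works on iterators, including infinite ones, as long as you stop consuming it in time. A usage sketch with itertools.islice; the cycling input stream is just an illustration:

```python
from itertools import count, islice


def yield_un(l):
    seen = set()
    for i, ele in enumerate(l):
        if ele not in seen:
            yield (i, ele)
            seen.add(ele)


print(list(yield_un([1, 1, 2, 3])))  # [(0, 1), (2, 2), (3, 3)]

# An endless stream 0, 1, 2, 0, 1, 2, ... has only three distinct values,
# so ask islice for at most three pairs; a fourth would never arrive
stream = (n % 3 for n in count())
print(list(islice(yield_un(stream), 3)))  # [(0, 0), (1, 1), (2, 2)]
```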