![](/img/trans.png)
[英]Fastest way of finding common elements between two list of lists in python
[英]The fastest way to find common elements at the beginning of 2 python lists?
在兩個python列表的開頭找到公共元素的最快方法是什么? 我使用for循環編寫它,但我認為用列表推導寫它會更快...不幸的是我不知道如何在列表理解中放棄。 這是我寫的代碼:
import datetime
list1=[1,2,3,4,5,6]
list2=[1,2,4,3,5,6]
#This is the "for loop" version, and takes about 60 ms on my machine
start=datetime.datetime.now()
out=[]
for (e1, e2) in zip(list1, list2):
if e1 == e2:
out.append(e1)
else:
break
end=datetime.datetime.now()
print out
print "Execution time: %s ms" % (float((end - start).microseconds) / 1000)
#This is the list-comprehension version, it takes about 15 ms to run,
#but unfortunately returns the wrong result because I can't break the loop.
start=datetime.datetime.now()
out = [ e1 for (e1, e2) in zip(list1, list2) if e1 == e2 ]
end=datetime.datetime.now()
print out
print "Execution time: %s ms" % (float((end - start).microseconds) / 1000)
沒有列表理解也有好的解決方案嗎?
>>> from operator import ne
>>> from itertools import count, imap, compress
>>> list1[:next(compress(count(), imap(ne, list1, list2)), 0)]
[1, 2]
時序:
from itertools import *
from operator import ne
def f1(list1, list2, enumerate=enumerate, izip=izip):
out = []
out_append = out.append
for e1, e2 in izip(list1, list2):
if e1 == e2:
out_append(e1)
else:
break
return out
def f2(list1, list2, list=list, takewhile=takewhile, izip=izip):
return [i for i, j in takewhile(lambda (i,j):i==j, izip(list1, list2))]
def f3(list1, list2, next=next, compress=compress, count=count, imap=imap,
ne=ne):
return list1[:next(compress(count(), imap(ne, list1, list2)), 0)]
def f4(list1, list2):
out = []
out_append = out.append
i = 0
end = min(len(list1), len(list2))
while i < end and list1[i]==list2[i]:
out_append(list1[i])
i+=1
return out
def f5(list1, list2, len=len, enumerate=enumerate):
if len(list1) > len(list2):
list1, list2 = list2, list1
for i, e in enumerate(list1):
if list2[i] != e:
return list1[:i]
return list1[:]
def f6(list1, list2, enumerate=enumerate):
result = []
append = result.append
for i,e in enumerate(list1):
if list2[i] == e:
append(e)
continue
break
return result
from timeit import timeit
list1 =[1,2,3,4,5,6];list2=[1,2,4,3,5,6]
sol = f3(list1, list2)
for func in 'f1', 'f2', 'f3', 'f4', 'f5', 'f6':
assert eval(func + '(list1, list2)') == sol, func + " produces incorrect results"
print func
print timeit(stmt=func + "(list1, list2)", setup='from __main__ import *')
f1
1.52226996422
f2
2.44811987877
f3
2.04677891731
f4
1.57675600052
f5
1.6997590065
f6
1.71103715897
對於list1=[1]*100000+[1,2,3,4,5,6]; list2=[1]*100000+[1,2,4,3,5,6]
list1=[1]*100000+[1,2,3,4,5,6]; list2=[1]*100000+[1,2,4,3,5,6]
, timeit
自定義為100
時間, timeit(stmt=func + "(list1, list2)", setup='from __main__ import list1, list2, f1,f2,f3,f4', number=1000)
f1
14.5194740295
f2
29.8510630131
f3
12.6024291515
f4
24.465034008
f5
12.1111371517
f6
16.6644029617
因此@ThijsvanDien的這個解決方案是最快的,這是一個接近的第二,但我仍然喜歡它的功能風格;)
但numpy
總是勝利(你應該總是使用numpy
的事情)
>>> import numpy as np
>>> a, b = np.array([1,2,3,4,5,6]), np.array([1,2,4,3,5,6])
>>> def f8(a, b, nonzero=np.nonzero):
return a[:nonzero(a!=b)[0][0]]
>>> f8(a, b)
array([1, 2])
>>> timeit(stmt="f8(a, b)", setup='from __main__ import *')
6.50727105140686
>>> a, b = np.array([1]*100000+[1,2,3,4,5,6]), np.array([1]*100000+[1,2,4,3,5,6])
>>> timeit(stmt="f8(a, b)", setup='from __main__ import *', number=1000)
0.7565150260925293
可能有一個更快的numpy
解決方案,但這表明它有多快。
>>> from itertools import izip, takewhile
>>> list1=[1,2,3,4,5,6]
>>> list2=[1,2,4,3,5,6]
>>> list(takewhile(lambda (i,j):i==j, izip(list1, list2)))
[(1, 1), (2, 2)]
要么
>>> list(takewhile(lambda i,j=iter(list2):i==next(j), list1))
[1, 2]
我不明白為什么人們會痴迷於這一行。 這是我的解決方案: 編輯: @roots建議在本地存儲result
的append
方法。
result = []
append = result.append
for i,e in enumerate(List1):
if List2[i] == e:
append(e)
continue
break
輸入:
List1 = [1,2,3,4,5,9,8,1,2,3]
List2 = [1,2,3,5,5,9,8,1,2,3]
產生
>>>
[1, 2, 3]
根據@jamylak的測試:( a.py
)
print(timeit.timeit("""
result = []
append = result.append
for i,e in enumerate(List1):
if List2[i] == e:
append(e)
continue
break""",
setup="List1 =[1]*10000+[1,2,3,4,5,6];List2=[1]*10000+[1,2,4,3,5,6]",number=1000))
我明白了
Microsoft Windows [Version 6.2.9200]
(c) 2012 Microsoft Corporation. All rights reserved.
C:\Users\Henry\Desktop>a.py
0.770009684834
這使它非常接近@dugres解決方案,其時鍾頻率為0.752079322295
這個解決方案受到@HennyH的啟發,與@ jamylak最快的長列表一樣快,而對於短列表更快,可以說更具可讀性:
def f5(list1, list2):
if len(list1) > len(list2):
list1, list2 = list2, list1
for i, e in enumerate(list1):
if list2[i] != e:
return list1[:i]
return list1[:]
計時(簡短名單):
f1
1.17119693756
f2
1.82656407356
f3
1.51235413551
f4
1.45300602913
f5
1.13586807251
計時(長名單):
f1
1.52571296692
f2
2.99596500397
f3
1.02547097206
f4
2.44235897064
f5
1.02724885941
注意使用PyPy 2.0.1時非常有趣的結果:
f1
0.221760034561
f2
0.210422992706
f3
5.4270939827
f4
0.20907497406
f5
0.0702250003815
沒有“壓縮”和“追加”會更快:
i = 0
while list1[i]==list2[i]:
i+=1
out = list1[:i]
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.