Which is faster? Checking if something is in a Python list or not? I.e. membership vs non-membership
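This may be a newbie question, or an obvious one, to people who know more computer science than I do. Maybe that is why I could not find anything on Google or SO after searching. Maybe I am not using the right vocabulary.
The title says it all. If I know that x is in my_list most of the time, which of the following is faster?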
if x in my_list:
    func1(x)
else:
    func2(x)
or
if x not in my_list:
    func2(x)
else:
    func1(x)
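Does the size of the list matter? For example, 10 elements vs. 10,000 elements? In my particular case, my_list consists of strings and integers, but does anyone know whether other considerations apply to more complex types, such as dicts?
Thanks.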
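Checking whether an element is in a list and checking whether it is not in a list invoke the same operation, x in my_list (not in simply negates the result), so there should not be any difference.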
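To make that concrete, here is a minimal sketch using a hypothetical CountingContainer class (not part of the question) whose __contains__ method counts its calls. It only illustrates that in and not in go through the same membership hook; a built-in list does the equivalent work in C.

```python
class CountingContainer:
    """Hypothetical container that counts how often __contains__ is called."""

    def __init__(self, items):
        self._items = list(items)
        self.calls = 0

    def __contains__(self, item):
        self.calls += 1
        return item in self._items


box = CountingContainer([1, "two", 3])

found = 1 in box        # one call to __contains__
missing = 1 not in box  # one more call, with the result negated

print(found, missing, box.calls)  # True False 2
```

Both forms do the same search; the only extra work for not in is flipping a boolean.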
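Does the size of the list matter?
Checking whether an element is in a list is an O(N) operation, which means the size does matter, roughly proportionally.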
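A rough way to see the linear cost is to count equality comparisons rather than measure time. The CountingProbe class and comparisons_needed helper below are hypothetical names introduced only for this sketch.

```python
class CountingProbe:
    """Hypothetical stand-in for a value that counts its == comparisons."""

    def __init__(self, target):
        self.target = target
        self.comparisons = 0

    def __eq__(self, other):
        self.comparisons += 1
        return other == self.target


def comparisons_needed(target, items):
    """Return (found, number of comparisons) for a list membership test."""
    probe = CountingProbe(target)
    found = probe in items
    return found, probe.comparisons


items = list(range(1000))
print(comparisons_needed(0, items))    # (True, 1): match at the front
print(comparisons_needed(999, items))  # (True, 1000): match at the end
print(comparisons_needed(-1, items))   # (False, 1000): no match, full scan
```

A hit near the front is cheap, while a miss always scans the whole list, which is why the cost grows with the list length.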
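If you need to do a lot of checks, you may want to look at set: checking whether an element is in a set is O(1) on average, which means the check time does not change much as the set grows.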
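As a sketch of that difference, the snippet below times a failing lookup against a list and against a set built from the same values. The sizes and repeat counts are arbitrary choices, and the absolute numbers will vary by machine. Note that set elements must be hashable, which strings and integers are.

```python
import timeit

items = list(range(10_000))
as_list = items
as_set = set(items)  # requires hashable elements (ints and strings are)
missing = -1         # worst case for the list: every element is compared

# Time 200 failing membership checks against each container.
list_time = timeit.timeit(lambda: missing in as_list, number=200)
set_time = timeit.timeit(lambda: missing in as_set, number=200)

print(f"list: {list_time:.6f} s, set: {set_time:.6f} s for 200 checks")
```

Building the set costs O(N) once, so it pays off when the same collection is checked many times.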
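There should be no noticeable performance difference. You are better off writing whichever one makes your code more readable. Either one is O(n) complexity, and the cost depends mostly on where the element is in the list. Also, you should avoid premature optimization: it does not matter for most use cases, and when it does, you are usually better off using a different data structure.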
If you want faster lookups, use dicts, which likely have O(1) complexity. For details, see https://wiki.python.org/moin/TimeComplexity .
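One way a dict might be used here, sketched with made-up sample data: build an index from each value to its first position, so a single lookup answers both "is it there?" and "where?". The position_of name is hypothetical.

```python
my_list = ["apple", 42, "pear", 7, "apple"]

# Map each value to the first index where it appears (like list.index).
position_of = {}
for index, value in enumerate(my_list):
    position_of.setdefault(value, index)

print("pear" in position_of)    # True: `in` on a dict tests its keys
print(position_of.get(42))      # 1
print(position_of.get("kiwi"))  # None: missing keys return the default
```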
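Python includes a module and function, timeit, that can tell you how long a snippet of code takes to execute. The snippet is simplest to time as a single statement, so rather than timing a compound statement like an if directly, we can wrap the statement in a function and time the function call.
Even easier than calling timeit.timeit() is to use a jupyter notebook and put the %timeit magic at the start of a line.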
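This demonstrates that, long list or short, success or failure, the two ways you asked about, checking in alist or not in alist, give the same timings to within the variability of the measurement.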
import random

# set a seed so results will be repeatable
random.seed(456789)

# a 10K long list of junk with no value greater than 100
my_list = [random.randint(-100, 100) for i in range(10000)]

def func1(x):
    # included just so we get a function call
    return True

def func2(x):
    # included just so we get a function call
    return False

def way1(x):
    if x in my_list:
        result = func1(x)
    else:
        result = func2(x)
    return result

def way2(x):
    if x not in my_list:
        result = func2(x)
    else:
        result = func1(x)
    return result
%timeit way1(101) # failure with large list
The slowest run took 8.29 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 207 µs per loop
%timeit way1(0) # success with large list
The slowest run took 7.34 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.04 µs per loop
my_list.index(0)
186
%timeit way2(101) # failure with large list
The slowest run took 12.44 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 208 µs per loop
%timeit way2(0) # success with large list
The slowest run took 7.39 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.01 µs per loop
my_list = my_list[:10] # now make it a short list
print(my_list[-1]) # what is the last value
-37
# Run the same stuff again against the smaller list, showing that it is
# much faster but still way1 and way2 have no significant differences
%timeit way1(101) # failure with small list
%timeit way1(-37) # success with small list
%timeit way2(101) # failure with small list
%timeit way2(-37) # success with small list
The slowest run took 18.75 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 417 ns per loop
The slowest run took 13.00 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 403 ns per loop
The slowest run took 5.08 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 427 ns per loop
The slowest run took 4.86 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 386 ns per loop
# run the same again to get an idea of variability between runs so we can
# be sure that way1 and way2 have no significant differences
%timeit way1(101) # failure with small list
%timeit way1(-37) # success with small list
%timeit way2(101) # failure with small list
%timeit way2(-37) # success with small list
The slowest run took 8.57 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 406 ns per loop
The slowest run took 4.79 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 412 ns per loop
The slowest run took 4.90 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 412 ns per loop
The slowest run took 4.56 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 398 ns per loop
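Outside a notebook, roughly the same measurement can be sketched with timeit.repeat() from the standard library. This is a simplified variant of the code above (func1 and func2 are folded into the return values), and the repeat and number settings are arbitrary choices; timings will differ between machines and Python versions.

```python
import random
import timeit

random.seed(456789)
my_list = [random.randint(-100, 100) for _ in range(10000)]


def way1(x):
    if x in my_list:
        return True
    else:
        return False


def way2(x):
    if x not in my_list:
        return False
    else:
        return True


def best_time(func, value, number=200):
    """Best total time in seconds over 3 runs of `number` calls each."""
    return min(timeit.repeat(lambda: func(value), repeat=3, number=number))


results = {}
for label, value in (("failure", 101), ("success", 0)):
    for func in (way1, way2):
        results[(func.__name__, label)] = best_time(func, value)
        print(f"{func.__name__} {label}: {results[(func.__name__, label)]:.6f} s")
```

Taking the minimum of several repeats reduces the influence of other activity on the machine, which is the same idea behind the "best of 3" lines in the %timeit output.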
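A desirable property in a software implementation is low coupling. Your implementation should not be defined by the way the Python interpreter tests list membership, since that is a high level of coupling. The interpreter's implementation could change, and then that way would no longer be the faster one.
What we should care about in this case is that testing membership in a list is linear in the size of the list. If a faster membership test is needed, you can use a set.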
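One possible way to keep the coupling low, sketched with a hypothetical dispatch function: write the calling code against the in operator only, so the container can be swapped from a list to a set later without touching the logic.

```python
from collections.abc import Container


def dispatch(x, known: Container) -> str:
    """Pick a branch using only the `in` protocol, not a concrete type."""
    return "func1" if x in known else "func2"


values = ["a", 1, "b", 2]

# The same call works with a list today and a set tomorrow.
print(dispatch("a", values))       # func1
print(dispatch("a", set(values)))  # func1
print(dispatch("z", set(values)))  # func2
```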