
Which is faster? Checking if something is in a Python list or not? I.e. membership vs non-membership

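This may be a newbie question, or it may seem obvious to those who know more computer science than I do. Perhaps that is why I could not find anything on Google or SO after some searching. Maybe I am not using the right vocabulary.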

The title says it all. If I know that x is in my_list most of the time, which of the following is faster?

if x in my_list:
    func1(x)
else:
    func2(x)

or

if x not in my_list:
    func2(x)
else:
    func1(x)

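Does the size of the list matter? E.g. 10 elements vs 10,000 elements? In my particular case, my_list consists of strings and integers, but does anyone know whether other considerations apply to more complex types such as dicts?

Thanks.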

Checking whether an element is in a list and checking whether it is not in the list both perform the same operation, x in my_list, so there should be no difference.
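One way to see that both spellings go through the same membership test is to count calls on a small container. This is only a sketch: CountingBag is a hypothetical class written for this illustration, not anything from the question, and it relies on the documented rule that both `in` and `not in` use `__contains__` when a class defines it.

```python
class CountingBag:
    """Hypothetical container that counts how often membership is tested."""

    def __init__(self, items):
        self._items = list(items)
        self.contains_calls = 0

    def __contains__(self, value):
        self.contains_calls += 1
        return value in self._items


bag = CountingBag(["a", 1, "b", 2])

found = "b" in bag          # calls bag.__contains__("b")
missing = "b" not in bag    # calls the same method, then negates the result

print(found, missing, bag.contains_calls)  # True False 2
```

Each form triggers exactly one membership test; `not in` only inverts the boolean afterwards.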

Does the size of the list matter?

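Checking whether an element is in a list is an O(N) operation, which means the size does matter: the time grows roughly in proportion to the length of the list.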
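The linear scan can be made visible by counting equality comparisons. The sketch below assumes a hypothetical Probe class whose only job is to count how many list elements it gets compared against; it is an illustration, not a benchmark.

```python
class Probe:
    """Hypothetical value that counts the equality checks made against it."""

    def __init__(self, target):
        self.target = target
        self.eq_calls = 0

    def __eq__(self, other):
        self.eq_calls += 1
        return other == self.target


def comparisons_needed(values, target):
    """Return (found, number of elements compared) for a list membership test."""
    probe = Probe(target)
    found = probe in values
    return found, probe.eq_calls


my_list = list(range(10_000))

print(comparisons_needed(my_list, 0))      # (True, 1): hit on the first element
print(comparisons_needed(my_list, 9_999))  # (True, 10000): hit on the last element
print(comparisons_needed(my_list, -1))     # (False, 10000): miss scans everything
```

A miss always walks the whole list, while a hit stops at the first match, so the cost depends on where the element sits.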

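If you need to do a lot of checks, you probably want to look at set. Checking whether an element is in a set is O(1) on average, which means the check time does not change much as the set grows.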
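A rough comparison of the two containers might look like the following. It assumes the elements are hashable (strings and integers are), uses made-up data, and the absolute numbers will vary by machine, so none are shown here.

```python
import timeit

my_list = list(range(10_000))
my_set = set(my_list)      # one-time O(N) conversion cost

missing = -1               # worst case for the list: every element is scanned

# Time 200 lookups of a value that is absent from both containers.
list_time = timeit.timeit(lambda: missing in my_list, number=200)
set_time = timeit.timeit(lambda: missing in my_set, number=200)

print(f"list: {list_time:.6f} s, set: {set_time:.6f} s for 200 lookups")
```

The conversion only pays off when the same collection is checked many times; for a single check, building the set costs about as much as scanning the list once.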

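There should be no noticeable performance difference. You are better off writing whichever form makes your code more readable. Either one is O(n), and the actual time depends mostly on where the element sits in the list. You should also avoid premature optimization: it does not matter for most use cases, and when it does, you are usually better off using a different data structure.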

If you want faster lookups, use dicts, which have O(1) average-case lookup complexity. For details, see https://wiki.python.org/moin/TimeComplexity
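As a small illustration of the dict suggestion, membership on a dict tests its keys. The handlers mapping and the dispatch function below are hypothetical names invented for this sketch.

```python
# Hypothetical mapping from each known value to the name of its handler.
handlers = {"apple": "func1", 42: "func1", "pear": "func1"}


def dispatch(x):
    # `x in handlers` tests the keys only, never the values.
    if x in handlers:
        return handlers[x]
    return "func2"


print(dispatch(42))      # func1
print(dispatch("kiwi"))  # func2
```

A dict is mainly worth it when each member carries associated data; for plain membership a set says the same thing with less ceremony.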

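Python includes a module and function, timeit, that can tell you how long a snippet of code takes to execute. It is simplest to time a single expression, so rather than timing a compound statement such as an if directly, we can wrap the statements in a function and time the function call.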
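Outside a notebook, the same idea can be sketched with the timeit module directly. The list contents and the simplified way1 below are stand-ins chosen for this example, and the printed timing will differ from machine to machine.

```python
import timeit

my_list = list(range(1_000))


def way1(x):
    # Wrap the compound if statement so a single call can be timed.
    if x in my_list:
        return True
    return False


# Passing a callable avoids building the statement as a string.
total = timeit.timeit(lambda: way1(-1), number=1_000)
print(f"{total / 1_000 * 1e6:.2f} µs per call")

# repeat() runs the whole measurement several times; the minimum is a
# common summary because it is least affected by other system activity.
best = min(timeit.repeat(lambda: way1(-1), number=1_000, repeat=3))
print(f"best of 3: {best / 1_000 * 1e6:.2f} µs per call")
```

The lambda adds a small constant overhead to every call, which is fine when comparing two variants measured the same way.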

Even easier than calling timeit.timeit() is to use a Jupyter notebook and put the %timeit magic at the start of a line.

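The timings below show that, whether the list is long or short and whether the lookup succeeds or fails, the two forms you asked about, in alist and not in alist, give the same time to within measurement variability.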

import random

# set a seed so results will be repeatable
random.seed(456789)

# a 10K long list of junk with no value greater than 100
my_list = [random.randint(-100, 100) for i in range(10000)] 

def func1(x):
    # included just so we get a function call
    return True

def func2(x):
    # included just so we get a function call
    return False

def way1(x):
    if x in my_list:
        result = func1(x)
    else:
        result = func2(x)
    return result

def way2(x):
    if x not in my_list:
        result = func2(x)
    else:
        result = func1(x)
    return result

%timeit way1(101) # failure with large list

The slowest run took 8.29 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 207 µs per loop

%timeit way1(0) # success with large list

The slowest run took 7.34 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.04 µs per loop

my_list.index(0)

186

%timeit way2(101) # failure with large list

The slowest run took 12.44 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 208 µs per loop

%timeit way2(0) # success with large list

The slowest run took 7.39 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.01 µs per loop

my_list = my_list[:10] # now make it a short list
print(my_list[-1]) # what is the last value

-37

# Run the same stuff again against the smaller list, showing that it is
# much faster but still way1 and way2 have no significant differences
%timeit way1(101) # failure with small list
%timeit way1(-37) # success with small list
%timeit way2(101) # failure with small list
%timeit way2(-37) # success with small list

The slowest run took 18.75 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 417 ns per loop
The slowest run took 13.00 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 403 ns per loop
The slowest run took 5.08 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 427 ns per loop
The slowest run took 4.86 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 386 ns per loop

# run the same again to get an idea of variability between runs so we can
# be sure that way1 and way2 have no significant differences
%timeit way1(101) # failure with small list
%timeit way1(-37) # success with small list
%timeit way2(101) # failure with small list
%timeit way2(-37) # success with small list

The slowest run took 8.57 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 406 ns per loop
The slowest run took 4.79 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 412 ns per loop
The slowest run took 4.90 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 412 ns per loop
The slowest run took 4.56 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 398 ns per loop

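A desirable property of a software implementation is low coupling. Your implementation should not be shaped by the way the Python interpreter happens to test list membership, because that is a high degree of coupling. The interpreter's implementation may change, and the form you picked may no longer be the faster one.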

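What we should care about in this case is that testing for membership in a list is linear in the size of the list. If you need faster membership tests, you can use a set.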

