Which is faster? Checking if something is in a Python list or not? I.e. membership vs non-membership

This might be a noob question, or blindingly obvious to those who understand more computer science than I do. Perhaps that is why I could not find anything on Google or SO after some searching. Maybe I'm not using the right vocabulary.

The title says it all. If I know that x is in my_list most of the time, which of the following is faster?

if x in my_list:
    func1(x)
else:
    func2(x)

Or

if x not in my_list:
    func2(x)
else:
    func1(x)

Does the size of the list matter? E.g. ten elements vs 10,000 elements? For my particular case my_list consists of strings and integers, but does anyone have any idea whether other considerations apply to more complicated types such as dicts?

Thank you.

Checking whether an element is in a list and checking whether it is not in a list perform the same operation, x in my_list; not in simply negates the result, so there should not be any difference.
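As a rough illustration of that point, the sketch below uses a hypothetical CountingBag class (not part of the question) whose __contains__ method counts how often it is called. Both operators go through the same method; not in only flips the answer.

```python
class CountingBag:
    """Hypothetical list wrapper that counts membership checks."""

    def __init__(self, items):
        self.items = list(items)
        self.contains_calls = 0

    def __contains__(self, x):
        # Both `x in bag` and `x not in bag` end up here.
        self.contains_calls += 1
        return x in self.items


bag = CountingBag(["a", 1, "b", 2])

found = "b" in bag          # one call to __contains__
missing = "b" not in bag    # one more call, with the result negated

print(found, missing, bag.contains_calls)  # True False 2
```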

Does the size of the list matter?

Checking whether an element is in a list is an O(N) operation, which means the size does matter: the time grows roughly in proportion to the length of the list.
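One way to see the linear behaviour without relying on noisy timings is to count equality comparisons. This is only a sketch: CountingProbe and comparisons_needed are hypothetical helpers invented for the illustration.

```python
class CountingProbe:
    """Hypothetical search value that counts equality comparisons."""

    def __init__(self, value):
        self.value = value
        self.comparisons = 0

    def __eq__(self, other):
        self.comparisons += 1
        return self.value == other


def comparisons_needed(value, items):
    """Return (found, number of == comparisons) for `value in items`."""
    probe = CountingProbe(value)
    found = probe in items
    return found, probe.comparisons


small = list(range(10))
large = list(range(10_000))

print(comparisons_needed(0, large))      # (True, 1): hit at the front
print(comparisons_needed(9_999, large))  # (True, 10000): hit at the very end
print(comparisons_needed(-1, large))     # (False, 10000): a miss scans everything
print(comparisons_needed(-1, small))     # (False, 10): a miss in a short list
```

A miss, or a hit near the end, costs one comparison per element, which is why the position of the element matters as much as the length of the list.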

If you need to do a lot of checking, you probably want to look into set. Checking whether an element is in a set is O(1) on average, which means the checking time does not change much as the size of the set increases.
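A minimal sketch of that suggestion, assuming a my_list of strings and integers as in the question (the sample values and the classify helper are made up for illustration):

```python
my_list = ["apple", 3, "pear", 42, "fig", 7]

# Building the set costs O(n) once, so it pays off only when
# many membership checks follow.
my_set = set(my_list)


def classify(x):
    # Same branching as in the question, but against the set.
    if x in my_set:
        return "member"
    return "non-member"


print(classify("pear"))  # member
print(classify(100))     # non-member
```

Set elements must be hashable. Strings and integers are; dicts are not, so a list of dicts cannot be turned into a set this way.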

There should be no noticeable performance difference. You are better off writing whichever one makes your code more readable. Either one is O(n), and the actual time mostly depends on where the element is located in the list. Also, you should avoid optimizing prematurely: it doesn't matter for most use cases, and when it does, you are usually better off using another data structure.

If you want faster lookups, use dicts; key lookups are O(1) on average. For details refer to https://wiki.python.org/moin/TimeComplexity .
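For instance, a dict can answer both "is it there?" and "where is it?" without scanning the list. The sketch below assumes made-up sample data, and first_index is a hypothetical name:

```python
my_list = ["apple", 3, "pear", 42, "pear", 7]

# Map each value to the index of its first occurrence.
first_index = {}
for i, value in enumerate(my_list):
    first_index.setdefault(value, i)

print("pear" in first_index)    # True: `in` on a dict tests the keys
print(first_index["pear"])      # 2, the same as my_list.index("pear")
print(first_index.get("kiwi"))  # None: missing key, no exception
```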

Python includes a module, timeit, with a function of the same name that can tell you how long a snippet of code takes to execute. Timing a compound statement like an if directly is awkward, so we can wrap your statements in a function and time the function call.
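Outside a notebook, the same idea can be sketched with timeit.timeit() and a callable. The list contents, the repeat count of 200, and the simplified way1 and way2 bodies are assumptions chosen for illustration; the numbers printed will vary by machine.

```python
import random
import timeit

random.seed(456789)
my_list = [random.randint(-100, 100) for _ in range(10_000)]


def way1(x):
    if x in my_list:
        return True
    return False


def way2(x):
    if x not in my_list:
        return False
    return True


runs = 200
cases = [
    ("way1, miss", way1, 101),               # 101 is never in the list
    ("way2, miss", way2, 101),
    ("way1, hit at front", way1, my_list[0]),
    ("way2, hit at front", way2, my_list[0]),
]

for label, func, arg in cases:
    # Passing a callable avoids building statement and setup strings.
    seconds = timeit.timeit(lambda: func(arg), number=runs)
    print(f"{label}: {seconds / runs * 1e6:.2f} us per call")
```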

Even easier than calling timeit.timeit() is using a Jupyter notebook and putting the %timeit magic at the beginning of a line.

The timings below show that, long list or short, succeeding or failing, the two ways you ask about, checking in my_list or not in my_list, give timings that are the same within the variability of measurement.

import random

# set a seed so results will be repeatable
random.seed(456789)

# a 10K long list of junk with no value greater than 100
my_list = [random.randint(-100, 100) for i in range(10000)] 

def func1(x):
    # included just so we get a function call
    return True

def func2(x):
    # included just so we get a function call
    return False

def way1(x):
    if x in my_list:
        result = func1(x)
    else:
        result = func2(x)
    return result

def way2(x):
    if x not in my_list:
        result = func2(x)
    else:
        result = func1(x)
    return result

%timeit way1(101) # failure with large list

The slowest run took 8.29 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 207 µs per loop

%timeit way1(0) # success with large list

The slowest run took 7.34 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.04 µs per loop

my_list.index(0)

186

%timeit way2(101) # failure with large list

The slowest run took 12.44 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 208 µs per loop

%timeit way2(0) # success with large list

The slowest run took 7.39 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.01 µs per loop

my_list = my_list[:10] # now make it a short list
print(my_list[-1]) # what is the last value

-37

# Run the same stuff again against the smaller list, showing that it is
# much faster but still way1 and way2 have no significant differences
%timeit way1(101) # failure with small list
%timeit way1(-37) # success with small list
%timeit way2(101) # failure with small list
%timeit way2(-37) # success with small list

The slowest run took 18.75 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 417 ns per loop
The slowest run took 13.00 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 403 ns per loop
The slowest run took 5.08 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 427 ns per loop
The slowest run took 4.86 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 386 ns per loop

# run the same again to get an idea of variability between runs so we can
# be sure that way1 and way2 have no significant differences
%timeit way1(101) # failure with small list
%timeit way1(-37) # success with small list
%timeit way2(101) # failure with small list
%timeit way2(-37) # success with small list

The slowest run took 8.57 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 406 ns per loop
The slowest run took 4.79 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 412 ns per loop
The slowest run took 4.90 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 412 ns per loop
The slowest run took 4.56 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 398 ns per loop

One desired characteristic of software is low coupling. Your implementation should not be shaped by the way your Python interpreter tests for list membership, as that is a high level of coupling. The interpreter's implementation could change, and your choice would then no longer be the faster one.

All that we should care about in this case is that testing for membership in a list is linear in the size of the list. If faster membership testing is desired, you could use a set.
