
Fastest way to search a list in Python

When you do something like "test" in a, where a is a list, does Python do a sequential search on the list, or does it create a hash table representation to optimize the lookup? In the application I need this for, I'll be doing a lot of lookups on the list, so would it be best to do something like b = set(a) and then "test" in b? Also note that the list of values won't contain duplicates and I don't care about the order it's in; I just need to be able to check for the existence of a value.


Don't use a list, use a set() instead. It has exactly the properties you want, including a blazing fast in test.

I've seen speedups of 20x and higher in places (mostly heavy number crunching) where a list was replaced with a set.

"test" in a with a list a will do a linear search. Setting up a hash table on the fly would be much more expensive than a linear search. "test" in b, on the other hand, will do an amortised O(1) hash look-up.

In the case you describe, there doesn't seem to be a reason to use a list over a set.
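To get a feel for the difference between the linear search and the hash look-up, here is a minimal timing sketch. The function name compare_membership, the list size, and the repeat count are illustrative choices, not from the original posts, and the absolute numbers will vary by machine.

```python
import timeit


def compare_membership(n=100_000, repeats=200):
    """Time `x in list` against `x in set` for a value that is absent."""
    a = list(range(n))
    b = set(a)          # built once, outside the timed section
    missing = -1        # worst case for the list: every element is compared

    list_time = timeit.timeit(lambda: missing in a, number=repeats)
    set_time = timeit.timeit(lambda: missing in b, number=repeats)
    return list_time, set_time


list_time, set_time = compare_membership()
print(f"list: {list_time:.6f}s  set: {set_time:.6f}s")
```

The set is built once up front. That conversion is itself O(n), so it pays off when many lookups follow, as in the question.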

I think it would be better to go with the set implementation. Membership tests on a set take O(1) time on average, while on a list they take O(n) time, so you lose nothing by switching to sets.

Further, sets don't allow duplicate values. This can make your program slightly more memory efficient when the input contains many duplicates.
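As a small illustration of both points (duplicates collapse, and membership is a plain in test), here is a sketch. The helper name build_lookup and the sample values are made up for the example; frozenset is used only because the collection is never modified after it is built.

```python
def build_lookup(values):
    """Return an immutable set for membership tests; duplicates collapse."""
    return frozenset(values)


raw = ["test", "prod", "test", "dev", "prod"]
lookup = build_lookup(raw)

print(len(raw), len(lookup))   # 5 3
print("test" in lookup)        # True
print("staging" in lookup)     # False
```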

Lists and tuples seem to take the same time, and using "in" is slow for large data:

>>> import time
>>> t = list(range(0, 1000000))
>>> a = time.time(); x = [b in t for b in range(100234, 101234)]; print(time.time() - a)
1.66235494614
>>> t = tuple(range(0, 1000000))
>>> a = time.time(); x = [b in t for b in range(100234, 101234)]; print(time.time() - a)
1.6594209671

Here is a much better solution, from "Most efficient way for a lookup/search in a huge list (python)": binary search with bisect, which works here because the list is sorted.

It's super fast:

>>> import time
>>> from bisect import bisect_left
>>> t = list(range(0, 1000000))  # bisect requires a sorted list
>>> a = time.time(); x = [t[bisect_left(t, b)] == b for b in range(100234, 101234)]; print(time.time() - a)
0.0054759979248
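One caveat with the one-liner above: bisect_left returns len(t) when the value is larger than every element, so indexing t at that position raises IndexError. A minimal sketch of a guarded version follows; the helper name sorted_contains is hypothetical, and it assumes the input is already sorted in ascending order.

```python
from bisect import bisect_left


def sorted_contains(sorted_values, target):
    """Binary-search membership test; sorted_values must be sorted ascending."""
    i = bisect_left(sorted_values, target)
    # i == len(sorted_values) means target is greater than every element.
    return i < len(sorted_values) and sorted_values[i] == target


t = list(range(0, 1000))
print(sorted_contains(t, 234))    # True
print(sorted_contains(t, 1000))   # False (past the end, no IndexError)
print(sorted_contains(t, -5))     # False
```

This keeps the O(log n) lookup without building a second data structure, but unlike a set it depends on the list staying sorted.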
