What is the most efficient way to search nested lists in Python?
I have a list that contains nested lists, and I need to know the most efficient way to search within those nested lists.
For example, if I have
[['a','b','c'],
['d','e','f']]
and I have to search the entire list above, what is the most efficient way to find 'd'?
>>> lis=[['a','b','c'],['d','e','f']]
>>> any('d' in x for x in lis)
True
Generator expression using any:
$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "any('d' in x for x in lis)"
1000000 loops, best of 3: 1.32 usec per loop
Generator expression:
$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "'d' in (y for x in lis for y in x)"
100000 loops, best of 3: 1.56 usec per loop
List comprehension:
$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "'d' in [y for x in lis for y in x]"
100000 loops, best of 3: 3.23 usec per loop
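The same comparison can also be run from a script instead of the command line. The following is a minimal sketch using the standard `timeit` module; the `candidates` dictionary and the `time_candidates` helper are names made up for this illustration, and the absolute numbers will differ from the ones quoted above depending on the machine and Python version.

```python
import timeit

lis = [['a', 'b', 'c'], ['d', 'e', 'f'], [1, 2, 3], [4, 5, 6],
       [7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18]]

# The three expressions timed above, keyed by a descriptive label.
candidates = {
    "any": "any('d' in x for x in lis)",
    "generator": "'d' in (y for x in lis for y in x)",
    "list comprehension": "'d' in [y for x in lis for y in x]",
}


def time_candidates(number=100_000, repeat=3):
    """Return {label: best seconds per loop} for each candidate expression."""
    results = {}
    for label, stmt in candidates.items():
        timer = timeit.Timer(stmt, globals={"lis": lis})
        # Take the best of `repeat` runs, as the timeit CLI does.
        best = min(timer.repeat(repeat=repeat, number=number))
        results[label] = best / number
    return results


if __name__ == "__main__":
    for label, seconds in time_candidates().items():
        print(f"{label:>20}: {seconds * 1e6:.3f} usec per loop")
```

Only the relative ordering of the three timings is meaningful here.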
How about if the item is near the end, or not present at all?
any is faster than the list comprehension:
$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "'NOT THERE' in [y for x in lis for y in x]"
100000 loops, best of 3: 4.4 usec per loop
$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "any('NOT THERE' in x for x in lis)"
100000 loops, best of 3: 3.06 usec per loop
Perhaps if the list is 1000 times longer?
any is still faster:
$ python -m timeit -s "lis=1000*[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "'NOT THERE' in [y for x in lis for y in x]"
100 loops, best of 3: 3.74 msec per loop
$ python -m timeit -s "lis=1000*[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "any('NOT THERE' in x for x in lis)"
100 loops, best of 3: 2.48 msec per loop
We know that generators take a while to set up, so the best chance for the list comprehension (LC) to win is a very short list:
$ python -m timeit -s "lis=[['a','b','c']]" "any('c' in x for x in lis)"
1000000 loops, best of 3: 1.12 usec per loop
$ python -m timeit -s "lis=[['a','b','c']]" "'c' in [y for x in lis for y in x]"
1000000 loops, best of 3: 0.611 usec per loop
And any uses less memory too, because it never builds the flattened list.
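A rough way to see the memory difference is to compare the size of the flattened list with the size of the equivalent generator object. This is only a sketch: `sys.getsizeof` reports the size of the container itself (not the elements it references), and the exact byte counts depend on the interpreter, so no specific numbers are claimed here.

```python
import sys

lis = 1000 * [['a', 'b', 'c'], ['d', 'e', 'f']]

# The list comprehension materializes every element before `in` runs.
flat_list = [y for x in lis for y in x]

# The generator expression produces elements one at a time on demand.
flat_gen = (y for x in lis for y in x)

list_size = sys.getsizeof(flat_list)
gen_size = sys.getsizeof(flat_gen)

print("flattened list:", list_size, "bytes for", len(flat_list), "items")
print("generator:     ", gen_size, "bytes")
```

The list's size grows with the number of elements, while the generator's size stays constant regardless of how long `lis` is.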
Using a list comprehension, given:
mylist = [['a','b','c'],['d','e','f']]
'd' in [j for i in mylist for j in i]
yields:
True
This could also be done with a generator (as shown by @AshwiniChaudhary).
Update based on the comment below:
Here is the same list comprehension, but using more descriptive variable names:
'd' in [elem for sublist in mylist for elem in sublist]
The looping constructs in the list comprehension are equivalent to
for sublist in mylist:
    for elem in sublist:
        ...  # elem is appended to the resulting list
and generate a list against which 'd' can be tested with the in operator.
Use a generator expression. The whole list will not be traversed, because the generator produces results one by one and the search stops at the first match:
>>> lis = [['a','b','c'],['d','e','f']]
>>> 'd' in (y for x in lis for y in x)
True
>>> gen = (y for x in lis for y in x)
>>> 'd' in gen
True
>>> list(gen)
['e', 'f']
~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "'d' in (y for x in lis for y in x)"
100000 loops, best of 3: 2.96 usec per loop
~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "'d' in [y for x in lis for y in x]"
100000 loops, best of 3: 7.4 usec per loop
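A closely related alternative, not mentioned in the answers above, is `itertools.chain.from_iterable`, which flattens one level of nesting lazily. The sketch below assumes exactly one level of nesting, as in the question; the helper name `contains` is made up for this example.

```python
from itertools import chain


def contains(nested, target):
    """Return True if target appears in any sublist of nested (one level deep)."""
    # chain.from_iterable yields elements lazily, so the search stops
    # at the first match, just like the generator expression.
    return target in chain.from_iterable(nested)


lis = [['a', 'b', 'c'], ['d', 'e', 'f']]
print(contains(lis, 'd'))
print(contains(lis, 'z'))

# The early exit is visible on the iterator itself: after finding 'd',
# only the elements that follow it are left to consume.
flat = chain.from_iterable(lis)
found = 'd' in flat
remaining = list(flat)
print(found, remaining)
```

As with the generator expression, no intermediate list is built.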
If your arrays are always sorted as you show, so that a[i][j] <= a[i][j+1] and a[i][-1] <= a[i+1][0] (the last element of one array is always less than or equal to the first element of the next array), then you can eliminate a lot of comparisons by doing something like:
def search_sorted(a, theValue):
    # a is your big array of sorted, non-empty subarrays
    previous = None
    for subarray in a:
        # Since the subarrays are sorted, a first element greater than the
        # value means it is not in the current subarray (or any later one),
        # so it can only be in the previous one
        if subarray[0] > theValue:
            break
        # Otherwise, we keep track of the last array we looked at
        else:
            previous = subarray
    return (theValue in previous) if previous else False
This kind of optimization is only worthwhile if you have a lot of arrays and they all have a lot of elements, though.
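Under the same sortedness assumptions, the linear scans could be replaced by binary searches using the standard `bisect` module. The following is a sketch, not part of the original answer: the function name `search_sorted_nested` and the optional `firsts` parameter are hypothetical, and it assumes every subarray is non-empty and all elements are mutually comparable (so it would not work on the question's mix of strings and integers in Python 3).

```python
from bisect import bisect_left, bisect_right


def search_sorted_nested(a, value, firsts=None):
    """Binary-search a list of sorted, non-empty subarrays for value.

    Assumes a[i][j] <= a[i][j+1] and a[i][-1] <= a[i+1][0].
    `firsts` may be passed in precomputed to avoid rebuilding it per query.
    """
    if firsts is None:
        firsts = [sub[0] for sub in a]
    # Index of the last subarray whose first element is <= value.
    i = bisect_right(firsts, value) - 1
    if i < 0:
        return False  # value is smaller than every element
    sub = a[i]
    # Leftmost position in that subarray where value could sit.
    j = bisect_left(sub, value)
    return j < len(sub) and sub[j] == value


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
print(search_sorted_nested(a, 8))
print(search_sorted_nested(a, 0))
print(search_sorted_nested(a, 13))
```

Building `firsts` costs one pass over the outer list, so the binary search mainly pays off when the same data is searched many times and `firsts` is reused.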
If you just want to know whether your element is in the list, you can convert the list to a string and check that. This extends to more deeply nested lists, like [[1],'a','b','d',['a','b',['c',1]]]. The method is helpful if you don't know the nesting depth and only want to know whether the item is present.
search='d'
lis = [['a',['b'],'c'],[['d'],'e','f']]
print(search in str(lis))
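One caveat with the string approach is that it performs a substring test on the list's printed representation, so it can report matches that are not actual elements. A recursive search avoids this while still handling unknown nesting depth. The sketch below is an alternative technique, not the answer's method; the name `deep_contains` is made up, and it assumes that only `list` objects count as nested containers.

```python
def deep_contains(nested, target):
    """Return True if target is an element at any depth of nested lists."""
    for item in nested:
        if item == target:
            return True
        # Recurse only into lists; strings and other values are leaves.
        if isinstance(item, list) and deep_contains(item, target):
            return True
    return False


lis = [['a', ['b'], 'c'], [['d'], 'e', 'f']]
print(deep_contains(lis, 'd'))

# The string approach matches substrings of longer elements,
# whereas the recursive search compares whole elements only.
words = [['abc'], ['def']]
print('a' in str(words))
print(deep_contains(words, 'a'))
```

For very deeply nested data the recursion depth limit could become an issue, in which case an explicit stack would be one way around it.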