
Python: iterate over a sublist

Generally, when you want to iterate over a portion of a list in Python, the easiest thing to do is just slice the list.

# Iterate over everything except the first item in a list
#
items = [1,2,3,4]
iterrange = (x for x in items[1:])

But the slice operator creates a new list, which is not even necessary to do in many cases. Ideally, I'd like some kind of slicing function that creates generators, as opposed to new list objects. Something similar to this could be accomplished by creating a generator expression that uses a range to return only certain portions of the list:

# Create a generator expression that returns everything except 
# the first item in the list
#
iterrange = (x for x, idx in zip(items, range(0, len(items))) if idx != 0)

But this is sort of cumbersome. I'm wondering if there is a better, more elegant way to do this. So, what's the easiest way to slice a list so that a generator expression is created instead of a new list object?

Use itertools.islice:

import itertools

l = range(20)

for i in itertools.islice(l, 10, 15):
    print(i)

10
11
12
13
14

From the doc:

Make an iterator that returns selected elements from the iterable.
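As the doc quote suggests, islice works on any iterable, not just sequences. A minimal sketch (the generator is chosen just for illustration): a generator can't be sliced with [:], but islice consumes it lazily:

```python
from itertools import islice

# Generators don't support [] slicing, but islice handles them lazily.
squares = (n * n for n in range(100))
middle = list(islice(squares, 3, 8))  # elements at indices 3..7
print(middle)  # [9, 16, 25, 36, 49]
```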

Before I start, to be clear, the correct order of selecting between slicing approaches is usually:

  1. Use regular slicing (the cost of copying all but the longest of inputs is usually not meaningful, and the code is much simpler). If the input might not be a sliceable sequence type, convert it to one, then slice, e.g. allbutone = list(someiterable)[1:]. This is simpler, and for most cases, typically faster, than any other approach.
  2. If regular slicing isn't viable (the input isn't guaranteed to be a sequence and converting to a sequence before slicing might cause memory issues, or it's huge and the slice covers most of it, e.g. skipping the first 1000 and last 1000 elements of a 10M element list, so memory might be a concern), itertools.islice is usually the correct solution, as it's simple enough and the performance cost is usually unimportant.
  3. If, and only if, islice's performance is unacceptably slow (it adds some overhead to producing every item, though admittedly it's quite a small amount) and the amount of data to be skipped is small, while the data to be included is huge (e.g. the OP's scenario of skipping a single element and keeping the rest), keep reading.

If you find yourself in case #3, you're in a scenario where islice's ability to bypass initial elements (relatively) quickly isn't enough to make up for the incremental cost to produce the rest of the elements. In that case, you can improve performance by reversing your problem from selecting all elements after n to discarding all elements before n.

For this approach, you manually convert your input to an iterator, then explicitly pull out and discard n values, then iterate what's left in the iterator (but without the per-element overhead of islice). For example, for an input of myinput = list(range(1, 10000)), your options for selecting elements 1 through the end are:

# Approach 1, OP's approach, simple slice:
for x in myinput[1:]:

# Approach 2, Sebastian's approach, using itertools.islice:
for x in islice(myinput, 1, None):

# Approach 3 (my approach)
myiter = iter(myinput)  # Explicitly create iterator from input (looping does this already)
next(myiter, None) # Throw away one element, providing None default to avoid StopIteration error
for x in myiter:  # Iterate unwrapped iterator
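As a quick sanity check, all three approaches yield the same elements; a self-contained sketch (with myinput shrunk for brevity):

```python
from itertools import islice

myinput = list(range(1, 10))

# Approach 1: plain slice (copies the tail)
a = list(myinput[1:])
# Approach 2: islice (lazy, small per-item overhead)
b = list(islice(myinput, 1, None))
# Approach 3: advance a raw iterator once, then use it directly
myiter = iter(myinput)
next(myiter, None)
c = list(myiter)

print(a == b == c)  # True
```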

If the number of elements to discard is larger, it's probably best to borrow the consume recipe from the itertools docs:

import collections
from itertools import islice

def consume(iterator, n=None):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)
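A brief usage sketch of the recipe, inlining consume the same way its else branch does:

```python
from collections import deque
from itertools import islice

it = iter(range(10))
next(islice(it, 3, 3), None)   # consume(it, 3): skip 0, 1, 2
print(next(it))                # 3
deque(it, maxlen=0)            # consume(it): drain what's left
print(next(it, 'exhausted'))   # exhausted
```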

which generalizes the approaches for skipping n elements to:

# Approach 1, OP's approach, simple slice:
for x in myinput[n:]:

# Approach 2, Sebastian's approach, using itertools.islice:
for x in islice(myinput, n, None):

# Approach 3 (my approach)
myiter = iter(myinput)  # Explicitly create iterator from input (looping does this already)
consume(myiter, n)      # Throw away n elements
# Or inlined consume as next(islice(myiter, n, n), None)
for x in myiter:        # Iterate unwrapped iterator

Performance-wise, this wins by a meaningful amount for most large inputs (exception: range itself on Python 3 is already optimized for plain slicing; plain slicing can't be beat on actual range objects). ipython3 microbenchmarks (on CPython 3.6, 64 bit Linux build) illustrate this (the slurp defined in the setup is just the lowest-overhead way to exhaust an iterable, so we minimize the impact of the parts we're not interested in):

>>> from itertools import islice
>>> from collections import deque
>>> %%timeit -r5 slurp = deque(maxlen=0).extend; r = list(range(10000))
... slurp(r[1:])
...
65.8 μs ± 109 ns per loop (mean ± std. dev. of 5 runs, 10000 loops each)

>>> %%timeit -r5 slurp = deque(maxlen=0).extend; r = list(range(10000))
... slurp(islice(r, 1, None))
...
70.7 μs ± 104 ns per loop (mean ± std. dev. of 5 runs, 10000 loops each)

>>> %%timeit -r5 slurp = deque(maxlen=0).extend; r = list(range(10000))
... ir = iter(r)
... next(islice(ir, 1, 1), None)  # Inlined consume for simplicity, but with islice wrapping to show generalized usage
... slurp(ir)
...
30.3 μs ± 64.1 ns per loop (mean ± std. dev. of 5 runs, 10000 loops each)

Obviously, the extra complexity of my solution isn't usually going to be worth it, but for moderate sized inputs (10K elements in this case), the performance benefit is clear; islice was the worst performer (by a small amount), plain slicing was slightly better (which reinforces my point about plain slicing almost always being the best solution when you have an actual sequence), and the "convert to iterator, discard initial, use rest" approach won by a huge amount, relatively speaking (well under half the time of either of the other solutions).

That benefit won't show up for tiny inputs, because the fixed overhead of loading/calling iter/next, and especially islice, will outweigh the savings:

>>> %%timeit -r5 slurp = deque(maxlen=0).extend; r = list(range(10))
... slurp(r[1:])
...
207 ns ± 1.86 ns per loop (mean ± std. dev. of 5 runs, 1000000 loops each)

>>> %%timeit -r5 slurp = deque(maxlen=0).extend; r = list(range(10))
... slurp(islice(r, 1, None))
...
307 ns ± 1.71 ns per loop (mean ± std. dev. of 5 runs, 1000000 loops each)

>>> %%timeit -r5 slurp = deque(maxlen=0).extend; r = list(range(10))
... ir = iter(r)
... next(islice(ir, 1, 1), None)  # Inlined consume for simplicity, but with islice wrapping to show generalized usage
... slurp(ir)
...
518 ns ± 4.5 ns per loop (mean ± std. dev. of 5 runs, 1000000 loops each)

>>> %%timeit -r5 slurp = deque(maxlen=0).extend; r = list(range(10))
... ir = iter(r)
... next(ir, None)  # To show fixed overhead of islice, use next without it
... slurp(ir)
...
341 ns ± 0.947 ns per loop (mean ± std. dev. of 5 runs, 1000000 loops each)

but as you can see, even for 10 elements the islice-free approach isn't much worse; by 100 elements, the islice-free approach is faster than all competitors, and by 200 elements, the generalized next + islice beats all competitors (obviously it doesn't beat islice-free given the 180 ns overhead of islice, but that's made up for by generalizing to skipping n elements as a single step, rather than needing to call next repeatedly to skip more than one element). Plain islice rarely wins in the "skip a few, keep a lot" case due to the per-element overhead the wrapper exacts (it didn't clearly beat eager slicing in the microbenchmarks until around 100K elements; it's memory efficient, but CPU inefficient), and it will do even worse (relative to eager slicing) in the "skip a lot, keep a few" case.

Try itertools.islice:

http://docs.python.org/library/itertools.html#itertools.islice

iterrange = itertools.islice(items, 1, None)
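Applied to the OP's items list, this produces a lazy iterator over everything but the first element:

```python
import itertools

items = [1, 2, 3, 4]
iterrange = itertools.islice(items, 1, None)
print(list(iterrange))  # [2, 3, 4]
```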
