
How to get the n next values of a generator in a list (python)

I have made a generator to read a file word by word and it works nicely.

def word_reader(file):
    for line in open(file):
        for p in line.split():
            yield p

reader = word_reader('txtfile')
next(reader)

What is the easiest way of getting the n next values in a list?

Use itertools.islice:

list(itertools.islice(it, n))
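
For example, applied to the reader from the question, a minimal sketch (the variable names and the choice of n=5 are illustrative, not part of the original answer):

import itertools

reader = word_reader('txtfile')
next_five = list(itertools.islice(reader, 5))   # next 5 words, or fewer if the file runs out
print(next_five)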

EDIT: Use itertools.islice. The pattern below that I originally proposed is a bad idea: it crashes when it yields fewer than n values, and that behaviour depends on subtle issues, so people reading such code are unlikely to understand its precise semantics.

There is also

 [next(it) for _ in range(n)] 

which might(?) be clearer to people not familiar with itertools; but if you deal with iterators a lot, itertools is a worthy addition to your toolset.

What happens if it is exhausted and next(it) raises StopIteration?

(i.e. when it has fewer than n values left to yield)

When I wrote the above line a couple of years ago, I probably thought a StopIteration would have the clever side effect of cleanly terminating the list comprehension. But no, the whole comprehension will crash, passing the StopIteration upwards. (It would exit cleanly only if the exception originated from the range(n) iterator.)

Which is probably not the behavior you want.
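
A minimal illustration of that crash, using a hypothetical two-item iterator rather than the reader from the question:

it = iter([10, 20])
try:
    [next(it) for _ in range(5)]   # the source runs dry after two items
except StopIteration:
    print("the whole comprehension crashed with StopIteration")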

But it gets worse. The following is supposed to be equivalent to the list comprehension (especially on Python 3):

list(next(it) for _ in range(n))

It isn't. The inner part is a generator expression, shorthand for a generator function; list() knows it's done when it raises StopIteration anywhere.
=> This version copes safely when there aren't n values and returns a shorter list (like itertools.islice()).

[Executions on: 2.7, 3.4]

But that too is going to change! The fact that a generator silently exits when any code inside it raises StopIteration is a known wart, addressed by PEP 479. From Python 3.7 (or 3.5 with from __future__ import generator_stop) that will cause a RuntimeError instead of cleanly finishing the generator, i.e. it will become similar to the list comprehension's behaviour. (Tested on a recent HEAD build.)
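
A quick sketch of the newer behaviour, again with a hypothetical two-item iterator: on Python 3.7+ the leaked StopIteration becomes a RuntimeError, while 3.6 and earlier quietly return the shorter list [10, 20].

it = iter([10, 20])
try:
    print(list(next(it) for _ in range(5)))   # 3.6 and earlier: prints [10, 20]
except RuntimeError as err:                   # 3.7+: StopIteration leaked out of the genexp
    print("PEP 479 in effect:", err)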

# xrange is Python 2; use range(n) on Python 3
for word, i in zip(word_reader(file), xrange(n)):
    ...
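
A minimal Python 3 sketch of the same idea that collects the words into a list; putting range(n) first means zip stops on the range and does not pull an extra word from the generator (the names here are illustrative):

n = 10
words = [word for _, word in zip(range(n), word_reader('txtfile'))]
print(words)   # up to n words; shorter if the file has fewer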

To get the first n values of a generator, you can use more_itertools.take.
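
A small sketch using the reader from the question (more_itertools is a third-party package; the variable names here are illustrative):

import more_itertools

reader = word_reader('txtfile')
first_words = more_itertools.take(100, reader)   # list of the next 100 words, or fewer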

If you plan to iterate over the words in chunks (e.g. 100 at a time), you can use more_itertools.chunked (https://more-itertools.readthedocs.io/en/latest/api.html):

import more_itertools
for words in more_itertools.chunked(reader, n=100):
    ...  # process 100 words at a time

Use cytoolz.take.

>>> from cytoolz import take
>>> list(take(2, [10, 20, 30, 40, 50]))
[10, 20]
