How to get the n next values of a generator in a list (python)
I have made a generator to read a file word by word and it works nicely.
def word_reader(file):
    # Yield the words of the file one at a time.
    with open(file) as f:
        for line in f:
            for p in line.split():
                yield p

reader = word_reader('txtfile')
next(reader)
What is the easiest way of getting the n next values in a list?
Use itertools.islice:
list(itertools.islice(it, n))
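As a minimal sketch of how this behaves, the example below takes words in groups of three. To stay self-contained it reads from an in-memory list of lines instead of a file, so this word_reader is a stand-in for the one in the question, not the original.

```python
import itertools

def word_reader(lines):
    # Stand-in for the question's reader: takes an iterable of lines
    # (an assumption for this example) instead of a file name.
    for line in lines:
        for word in line.split():
            yield word

reader = word_reader(["the quick brown", "fox jumps"])

first_three = list(itertools.islice(reader, 3))  # ['the', 'quick', 'brown']
rest = list(itertools.islice(reader, 3))         # only two words left: ['fox', 'jumps']
empty = list(itertools.islice(reader, 3))        # generator exhausted: []
```

Note that islice never raises when the generator runs short; it simply returns fewer items.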
EDIT: Use itertools.islice. The pattern below that I originally proposed is a bad idea: it crashes when the iterator it yields fewer than n values, and this behaviour depends on subtle issues, so people reading such code are unlikely to understand its precise semantics.
There is also
[next(it) for _ in range(n)]
which might(?) be clearer to people not familiar with itertools; but if you deal with iterators a lot, itertools is a worthy addition to your toolset.
What happens if next(it) raises StopIteration because it was exhausted (i.e. when it had fewer than n values to yield)?
When I wrote the above line a couple of years ago, I probably thought a StopIteration would have the clever side effect of cleanly terminating the list comprehension. But no, the whole comprehension will crash, passing the StopIteration upwards. (It would exit cleanly only if the exception originated from the range(n) iterator.)
Which is probably not the behavior you want.
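A small sketch of that failure on Python 3, using a three-item iterator. The helper name take_with_comprehension is made up for this illustration.

```python
def take_with_comprehension(it, n):
    # The list-comprehension pattern: fine while `it` still has n items.
    return [next(it) for _ in range(n)]

it = iter([1, 2, 3])
ok = take_with_comprehension(it, 2)   # [1, 2]

try:
    take_with_comprehension(it, 2)    # only one item is left
    outcome = "returned"
except StopIteration:
    # The exception escapes the comprehension instead of ending it.
    outcome = "StopIteration"

# The one remaining item was consumed before the failure and is lost.
leftover = next(it, None)             # None
```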
But it gets worse. The following is supposed to be equivalent to the list comprehension (especially on Python 3):
list(next(it) for _ in range(n))
It isn't. The inner part is shorthand for a generator function; list() considers the generator finished when StopIteration is raised anywhere inside it.
=> This version copes safely when there aren't n values and returns a shorter list. (Like itertools.islice().)
[Executions on: 2.7, 3.4]
But that, too, is going to change! The fact that a generator silently exits when any code inside it raises StopIteration is a known wart, addressed by PEP 479. From Python 3.7 (or 3.5 with a future import) that's going to cause a RuntimeError instead of cleanly finishing the generator. I.e. it'll become similar to the list comprehension's behaviour. (Tested on a recent HEAD build)
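For readers on a current interpreter, here is a short sketch of the post-PEP 479 behaviour. It assumes Python 3.7 or later; on older versions the first call would return a short list instead.

```python
import itertools

it = iter([1, 2, 3])

try:
    # Generator expression asking for more items than `it` can supply.
    result = list(next(it) for _ in range(5))
    outcome = "short list"
except RuntimeError:
    # PEP 479: StopIteration raised inside a generator becomes RuntimeError.
    outcome = "RuntimeError"

# islice has no such problem and simply stops early.
safe = list(itertools.islice(iter([1, 2, 3]), 5))   # [1, 2, 3]
```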
for word, i in zip(word_reader(file), xrange(n)):
    ...
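One caveat with this zip pattern, sketched below on Python 3 (where range replaces xrange): zip pulls from its arguments left to right, so with the generator first it fetches one extra item before noticing that range is exhausted, and that item is dropped. Putting range first avoids this. The sample data is invented for the illustration.

```python
# Generator first: zip reads 'c' and then discards it when range(2) ends.
words = iter(["a", "b", "c", "d", "e"])
taken_gen_first = [w for w, _ in zip(words, range(2))]   # ['a', 'b']
next_after_gen_first = next(words)                        # 'd' ('c' was lost)

# range first: zip stops before touching the generator again.
words = iter(["a", "b", "c", "d", "e"])
taken_range_first = [w for _, w in zip(range(2), words)]  # ['a', 'b']
next_after_range_first = next(words)                       # 'c'
```

This only matters if you keep reading from the same generator afterwards.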
To get the first n values of a generator, you can use more_itertools.take.
If you plan to iterate over the words in chunks (e.g. 100 at a time), you can use more_itertools.chunked ( https://more-itertools.readthedocs.io/en/latest/api.html ):
import more_itertools

for words in more_itertools.chunked(reader, n=100):
    # process up to 100 words (the last chunk may be shorter)
    ...
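If installing a third-party package is not an option, a similar effect can be sketched with the standard library alone. The chunks helper below is a hypothetical name for this example, not part of more_itertools, and is only a rough equivalent.

```python
import itertools

def chunks(iterable, size):
    # Hypothetical helper: yield lists of up to `size` items each.
    it = iter(iterable)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            return            # source exhausted, end the generator cleanly
        yield chunk

result = list(chunks(iter("abcdefg"), 3))
# [['a', 'b', 'c'], ['d', 'e', 'f'], ['g']]
```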
Use cytoolz.take.
>>> from cytoolz import take
>>> list(take(2, [10, 20, 30, 40, 50]))
[10, 20]