Python: itertools.islice not working in a loop

Question

I have code like this:

#opened file f
goto_line = num_lines #Total number of lines
while not found:
   line_str = next(itertools.islice(f, goto_line - 1, goto_line))
   goto_line = goto_line/2
   #checks for data, sets found to True if needed

line_str is correct on the first pass, but every pass after that reads a different line than it should.

For example, goto_line starts off as 1000. It reads line 1000 just fine. Then on the next loop, goto_line is 500, but it doesn't read line 500. It reads some line closer to 1000.

I'm trying to read specific lines in a large file without reading more than necessary. Sometimes it jumps backwards to a line and sometimes forward.

I did try linecache, but I typically don't run this code more than once on the same file.

Answer 1

Python iterators can be consumed only once. This is easiest to see by example. The following code

from itertools import islice
a = range(10)
i = iter(a)  # one shared iterator; every islice below consumes from it
print(list(islice(i, 1, 3)))
print(list(islice(i, 1, 3)))
print(list(islice(i, 1, 3)))
print(list(islice(i, 1, 3)))

prints

[1, 2]
[4, 5]
[7, 8]
[]

The slicing always starts where we stopped last time.

The easiest way to make your code work is to use f.readlines() to get a list of the lines in the file and then use normal Python list slicing [i:j]. If you really want to use islice(), you could start reading the file from the beginning each time by calling f.seek(0), but this will be very inefficient.
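As a rough illustration of the two options above, here is a minimal sketch. It assumes an in-memory io.StringIO object as a stand-in for the opened file f, and read_line_rewinding is a hypothetical helper name, not something from the question.

```python
import io
from itertools import islice

def read_line_rewinding(f, line_no):
    """Return the 1-based line `line_no`, rewinding first so islice counts from the top."""
    f.seek(0)  # without this, islice would continue from the current position
    return next(islice(f, line_no - 1, line_no))

# Stand-in for a real file: 1000 lines, "line 1" .. "line 1000" (an assumption for the demo)
f = io.StringIO("".join("line %d\n" % n for n in range(1, 1001)))

# Option 1: read everything once, then index the list (lines are 1-based, lists 0-based)
lines = f.readlines()
line_1000 = lines[1000 - 1]
line_500 = lines[500 - 1]

# Option 2: keep islice, but rewind before every lookup (re-reads from the start each time)
rewound_500 = read_line_rewinding(f, 500)
```

The first option costs memory proportional to the file size; the second costs a fresh scan from the beginning on every lookup.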

Answer 2

You cannot go back in the file this way (though there may be some way, depending on how the file is opened). The standard file iterator moves only forward; in fact, most iterators do, since Python's iterator protocol only supports forward iteration. So after reading k lines, reading another k/2 lines actually gives you line k + k/2.
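The k + k/2 effect can be reproduced with a small sketch. The 2000-line io.StringIO object below is an assumed stand-in for the real file, chosen so that the second read does not run off the end.

```python
import io
from itertools import islice

# Stand-in for a real file: 2000 lines, "line 1" .. "line 2000" (an assumption for the demo)
f = io.StringIO("".join("line %d\n" % n for n in range(1, 2001)))

# First lookup: skips 999 lines and returns line 1000, as intended
first = next(islice(f, 999, 1000))

# Second lookup: meant to fetch line 500, but counting resumes after line 1000
second = next(islice(f, 499, 500))

print(first.strip())   # line 1000
print(second.strip())  # line 1500
```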

You could try reading the whole file into memory, but you have a lot of data, so memory consumption probably becomes an issue. You could use file.seek to move around in the file, but that's still a lot of work. Perhaps you could use a memory-mapped file? That's only possible if lines are fixed-size, though. If necessary, you could pre-calculate the line numbers you'd like to check and save all those lines in one iteration (it shouldn't be too many, roughly int(log_2(line_count)) + 1 if I'm not mistaken), so you don't have to scroll back after reading the whole file.
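The pre-calculation idea might look something like the sketch below. It assumes the halving sequence from the question (num_lines, num_lines/2, ...) using integer division; halving_targets and collect_lines are hypothetical helper names, and io.StringIO again stands in for the real file.

```python
import io

def halving_targets(num_lines):
    """Line numbers the question's loop would visit: n, n//2, n//4, ..., 1."""
    targets = []
    n = num_lines
    while n >= 1:
        targets.append(n)
        n //= 2
    return targets

def collect_lines(f, wanted):
    """Scan f once from its current position and return {line_no: line} for wanted lines."""
    wanted = set(wanted)
    if not wanted:
        return {}
    last = max(wanted)
    found = {}
    for line_no, line in enumerate(f, start=1):
        if line_no in wanted:
            found[line_no] = line
        if line_no >= last:
            break  # nothing of interest beyond the largest wanted line
    return found

# Stand-in for a real file: 1000 lines, "line 1" .. "line 1000" (an assumption for the demo)
f = io.StringIO("".join("line %d\n" % n for n in range(1, 1001)))

targets = halving_targets(1000)   # 10 line numbers, matching int(log_2(1000)) + 1
saved = collect_lines(f, targets)
```

Only the handful of saved lines stays in memory, so the search can then jump between them in any order without touching the file again. This only works when the set of candidate lines is known before the scan starts.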

Answer 1: score 5, accepted, 2011-02-16 18:32:54
Answer 2: score 0
