
Python: itertools.islice not working in a loop

I have code like this:

import itertools

# opened file f
goto_line = num_lines  # total number of lines
while not found:
    line_str = next(itertools.islice(f, goto_line - 1, goto_line))
    goto_line //= 2  # integer division, so islice gets an int
    # checks for data, sets found to True if needed

line_str is correct on the first pass, but every pass after that reads a different line than it should.

So for example, goto_line starts off as 1000, and it reads line 1000 just fine. On the next pass goto_line is 500, but it doesn't read line 500; it reads some line closer to 1000.

I'm trying to read specific lines in a large file without reading more than necessary. Sometimes it jumps backwards to a line and sometimes forward.

I did try linecache, but I typically don't run this code more than once on the same file.

Python iterators can be consumed only once. This is easiest to see by example. The following code

from itertools import islice

a = range(10)
i = iter(a)
print(list(islice(i, 1, 3)))
print(list(islice(i, 1, 3)))
print(list(islice(i, 1, 3)))
print(list(islice(i, 1, 3)))

prints

[1, 2]
[4, 5]
[7, 8]
[]

The slicing always starts where we stopped last time.

The easiest way to make your code work is to use f.readlines() to get a list of the file's lines and then use normal Python list slicing [i:j]. If you really want to use islice(), you could start reading the file from the beginning each time with f.seek(0), but this will be very inefficient.
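For instance, a rewind-and-reread helper (a minimal sketch; `read_line` is a hypothetical name, and it assumes `f` is seekable, such as a regular file opened in text mode):

```python
from itertools import islice

def read_line(f, n):
    """Return line `n` (1-based) of seekable file `f`, or None if out of range.

    Correct for arbitrary jumps, forward or backward, but O(n) per call
    because it rewinds and re-reads from the start every time.
    """
    f.seek(0)  # rewind so islice counts lines from the beginning again
    return next(islice(f, n - 1, n), None)
```

With this helper the loop in the question reads the intended line on every pass, at the cost of rescanning the file each time.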

You cannot go back in the file this way (although it may be possible depending on how the file was opened). The standard file iterator, like most iterators (Python's iterator protocol only supports forward iteration), moves only forward. So after reading k lines, reading another k/2 lines actually gives you line k + k/2, not line k/2.
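If every jump were forward, you could keep track of the position already consumed and pass islice() offsets relative to it. A sketch of that idea (`skip_to` is a hypothetical helper name, not from the question):

```python
from itertools import islice

def skip_to(it, current, target):
    """Advance a forward-only iterator `it` from 1-based line `current`
    to line `target` (requires target > current) and return that line,
    or None if the iterator runs out first."""
    # islice counts from the iterator's *current* position, so skip
    # target - current - 1 items and take the next one.
    return next(islice(it, target - current - 1, target - current), None)
```

This avoids any rewinding, but it only works for a strictly increasing sequence of line numbers, which the halving search in the question is not.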

You could try reading the whole file into memory, but you have a lot of data, so memory consumption would probably become an issue. You could use file.seek to scroll through the file, but that is still a lot of work; perhaps a memory-mapped file would help, though that is only possible if the lines are fixed-size. Alternatively, you could pre-calculate the line numbers you'd like to check and save all of those lines (shouldn't be too many, roughly int(log_2(line_count)) + 1 if I'm not mistaken) in one iteration, so you never have to scroll back.
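The one-pass idea can be sketched like this, assuming the halving search from the question (the name `collect_halving_lines` is made up for illustration):

```python
def collect_halving_lines(f, total):
    """Read `f` once and return {line_no: line} for every 1-based line
    number a halving search starting at `total` would visit."""
    # Pre-compute total, total//2, total//4, ..., 1 -- about
    # int(log2(total)) + 1 numbers in all.
    wanted = set()
    n = total
    while n >= 1:
        wanted.add(n)
        n //= 2
    # Single forward pass over the file; keep only the wanted lines.
    return {i: line for i, line in enumerate(f, start=1) if i in wanted}
```

The search loop can then look lines up in the returned dict instead of touching the file again, so the file is read exactly once regardless of the order the search visits the lines in.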
