Read a specific number of lines in Python

I have a big text data file, for example:

#01textline1
1 2 3 4 5 6
2 3 5 6 7 3
3 5 6 7 6 4
4 6 7 8 9 9

1 2 3 6 4 7
3 5 7 7 8 4
4 6 6 7 8 5

3 4 5 6 7 8
4 6 7 8 8 9
..
..

You do not need a loop to accomplish your purpose. Just use the list's index method to find the positions of the two marker lines, then take all the lines between them.

Note that I wrapped your file.readlines() call in a list comprehension that strips trailing whitespace, including the newline, from each line.

(Using file.read().splitlines() can fail if read() stops in the middle of a line of data, for example when it is given a size argument.)

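file1 = open("data.txt", "r")
file2 = open("newdata.txt", "w")
# Strip trailing whitespace so each line compares equal to its marker text.
lines = [line.rstrip() for line in file1.readlines()]

# index() raises ValueError if a marker line is missing.
firstIndex = lines.index("#02textline2")
secondIndex = lines.index("#03textline3")

print(firstIndex, secondIndex)
# Write only the lines strictly between the two markers.
file2.write("\n".join(lines[firstIndex + 1 : secondIndex]))

file1.close()
file2.close()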
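
As a minimal sketch, the same index-based slice can be wrapped in a small helper and tried on an in-memory list. The function name slice_between and the sample rows are made up for illustration; only the marker strings come from the answer above. The second index() call starts searching after the first marker, so an earlier stray copy of the end marker is ignored.

```python
def slice_between(lines, first_key, second_key):
    """Return the lines strictly between first_key and second_key."""
    stripped = [line.rstrip() for line in lines]
    # list.index raises ValueError if a marker is not present.
    first_index = stripped.index(first_key)
    # Look for the end marker only after the start marker.
    second_index = stripped.index(second_key, first_index + 1)
    return stripped[first_index + 1:second_index]


# Hypothetical sample data standing in for the contents of data.txt.
sample = [
    "#01textline1\n",
    "1 2 3\n",
    "#02textline2\n",
    "4 5 6\n",
    "7 8 9\n",
    "#03textline3\n",
    "1 1 1\n",
]

result = slice_between(sample, "#02textline2", "#03textline3")
print(result)
```

This still holds every line in memory at once, which is the main cost of the index approach on a very large file.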

There is a newline character at the end of every line, so this:

if line == "#03textline3":

will never be true, because the line is actually "#03textline3\n". Why didn't you use the same syntax as the one you used for "#02textline2"? It would have worked:

if "#03textline3" in line:  # or: line == "#03textline3\n"
    break
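
To see the effect of the trailing newline, here is a small sketch that compares a marker line in several ways. It uses io.StringIO as a stand-in for the real file, and the variable names are hypothetical.

```python
import io

# Hypothetical in-memory file standing in for data.txt.
fake_file = io.StringIO("#02textline2\n1 2 3\n#03textline3\n")
raw_lines = list(fake_file)

last = raw_lines[-1]

# Plain equality fails because the line still ends with "\n".
exact_match = last == "#03textline3"
# Including the newline in the literal makes equality succeed.
with_newline = last == "#03textline3\n"
# Stripping the newline first also works.
stripped_match = last.rstrip("\n") == "#03textline3"
# A substring test ignores the newline entirely.
contains = "#03textline3" in last

print(repr(last), exact_match, with_newline, stripped_match, contains)
```

Comparing the stripped line is the most forgiving of the three working options: it also matches when the marker is the final line of a file that has no trailing newline, and unlike the substring test it does not match longer lines that merely contain the marker.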

Besides, you have to correct your indentation for the always_print = True line.

Here's what I would suggest doing:

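firstKey = "#02textline2"
secondKey = "#03textline3"

with open("data.txt","r") as fread:
    # Skip everything up to and including the first marker line.
    for line in fread:
        if line.rstrip() == firstKey:
            break

    with open("newdata.txt","w") as fwrite:
        # Resume where the first loop stopped and copy until the second marker.
        for line in fread:
            if line.rstrip() == secondKey:
                break
            else:
                fwrite.write(line)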

This approach takes advantage of the fact that Python file objects are iterators. The first for loop iterates through the file iterator fread until the first key is found. The loop breaks, but the iterator stays at its current position. When it gets picked back up, the second loop starts where the first left off. We then write the lines you want directly to a new file and discard the rest.
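
The way the iterator keeps its position can be checked in isolation. The sketch below uses io.StringIO in place of a real file, with made-up single-letter lines.

```python
import io

# Hypothetical five-line "file".
source = io.StringIO("a\nb\nc\nd\ne\n")

# First loop: consume lines up to and including "b", then stop.
for line in source:
    if line.rstrip() == "b":
        break

# Second loop over the same object: it continues after "b"
# instead of starting again from "a".
remaining = [line.rstrip() for line in source]
print(remaining)
```

This only works because the same iterator object is reused. Looping twice over a list, or over a freshly reopened file, would start from the beginning each time.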

Advantages:

  • This does not load the entire file into memory: lines are handled one at a time, only the lines between firstKey and secondKey are written out, and the script never reads past secondKey

  • No entries are looked over or processed more than once

  • The with context manager is a safer way to work with files, because it closes them even if an error occurs
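
If the same extraction is needed in more than one place, the two loops can be folded into a generator. This is only a sketch under a few assumptions: the name iter_between is hypothetical, the handle passed in must be an iterator such as an open file (not a list), and io.StringIO objects stand in for data.txt and newdata.txt.

```python
import io


def iter_between(handle, first_key, second_key):
    """Yield the lines strictly between two marker lines.

    handle must be an iterator (for example an open text file), so that
    the second loop resumes where the first one stopped.
    """
    # Skip everything up to and including the first marker.
    for line in handle:
        if line.rstrip() == first_key:
            break
    # Yield lines until the second marker, then stop reading.
    for line in handle:
        if line.rstrip() == second_key:
            return
        yield line


# Hypothetical sample contents; only the marker strings come from the answer.
source = io.StringIO(
    "#01textline1\n1 2 3\n#02textline2\n4 5 6\n7 8 9\n#03textline3\n1 1 1\n"
)
target = io.StringIO()

# writelines accepts any iterable of strings, so nothing is buffered in a list.
target.writelines(iter_between(source, "#02textline2", "#03textline3"))

copied = target.getvalue()
leftover = source.read()  # whatever follows the second marker is untouched
print(repr(copied), repr(leftover))
```

With real files, source and target would be the objects returned by open("data.txt") and open("newdata.txt", "w") inside with blocks, as in the answer above.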

