
Reading a specific number of lines from a file without storing in memory?

I have data that I need to read and extract specific blocks from using Python. The files are potentially tens of millions of lines long and too large to hold in memory, so I only want to pull the data that I actually need to analyse.

The files are formatted as follows:

4 # Number of lines per block
0 # Start of block 0
A line of data
A line of data
A line of data
A line of data
1 # Start of block 1
A line of data
A line of data
...

The issue I'm having is that once I find and read the specific block I need into a list, my code continues reading and adding data until the end of the file instead of the end of that specific block.

Here's what I have so far:

required_block = 5
filepath = 'file.txt'
data = []

with open(filepath, 'r') as f:
    block_length = int(f.readline())
    for line in f:
        block = int(line)
        if block != required_block:
            for _ in range(block_length):
                next(f)
        else:
            break
    for line in f:
        data.append(line)

If I try to add a range to the last 'for' loop it will just read the current line over and over.

Where am I going wrong?

EDIT: To clarify, I only want the last 'for' loop to run block_length times.

If you look at your code, your last for loop is the culprit: it appends every remaining line until the end of the file, with nothing to stop it after one block. Your first for loop doesn't append anything at all; it only advances through the file until it reaches the required block. The second loop then picks up where the first one stopped and appends everything after that point, because the append sits outside any check on the block length.

I think what you want is something like this:

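for line in f:
    block = int(line)
    if block != required_block:
        # Skip every line of a block we don't need
        for _ in range(block_length):
            next(f)
    else:
        # Read exactly block_length lines, then stop
        for _ in range(block_length):
            data.append(next(f))
        break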
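To check the idea without a huge file, here is a minimal sketch that wraps the same loop in a function and runs it on an in-memory file. The name `read_block` and the sample data are made up for illustration. It assumes the format from the question: the first line is the block length, every block starts with its index, and there are no blank lines. The `split()[0]` is only there in case the `# ...` annotations are really in the file.

```python
import io


def read_block(f, required_block):
    """Return the data lines of one block from an open text file (hypothetical helper)."""
    # First line holds the block length; split() drops a trailing "# comment" if present.
    block_length = int(f.readline().split()[0])
    for header in f:
        if int(header.split()[0]) == required_block:
            # next(f) advances the same iterator the for loop uses,
            # so each call returns a new line of the wanted block.
            return [next(f) for _ in range(block_length)]
        # Not the wanted block: consume its lines without storing them.
        for _ in range(block_length):
            next(f, None)
    return []  # block not found


# Three blocks of two lines each, numbered 0 to 2.
sample = io.StringIO("2\n0\na0\nb0\n1\na1\nb1\n2\na2\nb2\n")
print(read_block(sample, 1))  # ['a1\n', 'b1\n']
```

Only the lines of the requested block are ever kept in memory; everything before it is read and thrown away, and nothing after it is read at all.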

Try changing your last loop to this:

for _ in range(block_length):
    data.append(f.readline())
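A variation on the same fix, in case it helps: `itertools.islice` takes at most N items from an iterator, so it can do both the skipping and the reading. This is only a sketch under the assumptions in the question (block length on the first line, one header line per block, no blank lines); the function name `read_block_islice` is made up.

```python
from itertools import islice


def read_block_islice(filepath, required_block):
    """Return the lines of one block, reading the file lazily (hypothetical helper)."""
    with open(filepath) as f:
        # split()[0] ignores a trailing "# comment" on the line, if there is one.
        block_length = int(f.readline().split()[0])
        for header in f:
            if int(header.split()[0]) == required_block:
                # Take at most block_length lines and stop reading.
                return list(islice(f, block_length))
            # Consume the unwanted block without storing it.
            for _ in islice(f, block_length):
                pass
    return []  # block not found
```

You would call it as `read_block_islice('file.txt', 5)`. One thing to be aware of: `islice` stops quietly if the file ends early, so a truncated final block comes back shorter than `block_length` instead of raising an error.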

Reading file line by line:

filepath = 'Iliad.txt'
with open(filepath) as fp:
    line = fp.readline()
    cnt = 1
    while line:
        print("Line {}: {}".format(cnt, line.strip()))
        line = fp.readline()
        cnt += 1
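The same line-by-line read can also be written with `enumerate`, which does the counting for you. A small sketch, using an in-memory file so it runs anywhere; `numbered_lines` is a made-up name and the sample text is a placeholder:

```python
import io


def numbered_lines(f):
    """Yield (line_number, stripped_line) pairs, one line at a time."""
    # Iterating over the file object reads lazily, so only one line
    # is held in memory at any moment.
    for cnt, line in enumerate(f, start=1):
        yield cnt, line.strip()


sample = io.StringIO("first\nsecond\n")
for cnt, text in numbered_lines(sample):
    print("Line {}: {}".format(cnt, text))
```

With a real file you would pass the object returned by `open(filepath)` instead of the `StringIO`.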
