I have a file which consists of curve data repetitively structured as follows:
numbersofsamples
Title
data
data
data
...
For example:
999numberofsamples
title crvTitle
0.0 0.866423
0.0001001073 0.6336382
0.0002002157 0.1561626
0.0003000172 -0.1542121
... ...
1001numberofsamples
title nextCrv
0.000000e+00 0.000000e+00
1.001073e-04 1.330026e+03
2.002157e-04 3.737352e+03
3.000172e-04 7.578963e+03
... ...
The file consists of many curves and can be up to 2 GB.
My task is to find and export a specific curve by skipping the chunks (curves) that are not interesting to me. I know the length of each curve (its number of samples), so there should be a way to jump to the next delimiter (e.g. numberofsamples) until I find the title that I need.
I tried to use an iterator to do that, unfortunately without any success. Is that the right way to accomplish the task?
If possible, I don't want to load the whole data into memory.
This is a general way to skip lines you don't care about:

for line in file:
    if 'somepattern' not in line:
        continue
    # if we got here, 'somepattern' is in the line, so process it
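As a minimal, self-contained sketch of that pattern (the helper name find_matching and the sample text are made up for illustration; an in-memory io.StringIO stands in for a real file object):

```python
import io

def find_matching(lines, pattern):
    """Yield only the lines containing pattern; skip everything else."""
    for line in lines:
        if pattern not in line:
            continue
        yield line

# Usage: iterate lazily, nothing except matching lines is kept around
sample = io.StringIO("alpha\nsomepattern here\nbeta\nmore somepattern\n")
matches = list(find_matching(sample, 'somepattern'))
```

Because find_matching is a generator, it reads one line at a time, so even a 2 GB file never ends up in memory.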
You don't need to keep all lines in memory. Skip ahead to the wanted title, and only save the lines after it:

with open('somefile.txt') as lines:
    # skip to the title we want (strip the trailing newline before comparing)
    for line in lines:
        if line.strip() == 'title youwant':
            break
    numbers = []
    for line in lines:
        if 'numberofsamples' in line:
            break  # reached the start of the next curve
        numbers.append(line)
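Since you already know each curve's sample count from its header, you can also skip each uninteresting chunk in bulk with itertools.islice instead of testing every line. A sketch, assuming headers look exactly like the ones shown ('999numberofsamples' followed by a 'title ...' line); the function name extract_curve is hypothetical:

```python
import io
from itertools import islice

def extract_curve(lines, wanted_title):
    """Scan chunk headers, skipping each curve's data lines in one go,
    until the wanted title is found; then yield only that curve's lines."""
    for line in lines:
        # Each chunk starts with a line like '999numberofsamples'
        if line.strip().endswith('numberofsamples'):
            n = int(line.strip().replace('numberofsamples', ''))
            title = next(lines).strip()  # e.g. 'title crvTitle'
            if title == 'title ' + wanted_title:
                # Yield exactly the n data lines of this curve, then stop
                yield from islice(lines, n)
                return
            # Not the curve we want: consume its n data lines unprocessed
            for _ in islice(lines, n):
                pass

# Usage with a small in-memory stand-in for the 2 GB file
data = io.StringIO(
    "2numberofsamples\n"
    "title crvTitle\n"
    "0.0 0.866423\n"
    "0.0001 0.6336\n"
    "2numberofsamples\n"
    "title nextCrv\n"
    "0.0 0.0\n"
    "0.0001 1330.0\n"
)
curve = [line.split() for line in extract_curve(data, 'nextCrv')]
```

islice advances the same underlying file iterator the for loop uses, so the curve data you don't want is read past but never stored.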