
Skip chunks of lines while reading a file in Python

I have a file that consists of curve data, repetitively structured as follows:

numbersofsamples
Title
     data
     data
     data
      ...

For example:

999numberofsamples
title crvTitle
             0.0            0.866423
    0.0001001073           0.6336382
    0.0002002157           0.1561626
    0.0003000172          -0.1542121
             ...                 ...
1001numberofsamples
title nextCrv
    0.000000e+00        0.000000e+00
    1.001073e-04        1.330026e+03
    2.002157e-04        3.737352e+03
    3.000172e-04        7.578963e+03
             ...                 ...

The file consists of many curves and can be up to 2 GB.

My task is to find and export a specific curve by skipping the chunks (curves) that are not of interest to me. I know the length of each curve (the number of samples), so there should be a way to jump to the next delimiter (e.g. numberofsamples) until I find the title that I need?

I tried to use an iterator to do that, unfortunately without success. Is that the right way to accomplish the task?

If possible, I don't want to load the data into memory.

This is a general way to skip lines you don't care about:

for line in file:
    if 'somepattern' not in line:
        continue
    # if we got here, 'somepattern' is in the line, so process it
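
Because a file object yields one line at a time, this pattern never holds more than the current line in memory. A minimal self-contained sketch, where io.StringIO stands in for a real file and 'title' is only an illustrative pattern (both are assumptions, not part of the original answer):

```python
import io

# io.StringIO stands in for an open file; both yield one line per iteration.
file = io.StringIO("3numberofsamples\ntitle crvTitle\n0.0 0.866423\n")

matches = []
for line in file:
    if 'title' not in line:
        continue  # skip lines that do not contain the pattern
    # only lines containing the pattern reach this point
    matches.append(line.strip())
```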

You don't need to keep all lines in memory. Skip to the wanted title, and only save the lines after it that you want:

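with open('somefile.txt') as lines:
    # skip to the wanted title
    for line in lines:
        # strip the trailing newline before comparing
        if line.strip() == 'title youwant':
            break
    # collect only the data lines of that curve
    numbers = []
    for line in lines:
        if 'numberofsamples' in line:
            break  # reached the next curve
        numbers.append(line)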
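
The question notes that each curve's length is known from its header. One possible way to use that count is to skip whole curves with itertools.islice instead of testing every data line. This is a sketch under assumptions: the function name read_curve is hypothetical, and the header is assumed to be an integer immediately followed by numberofsamples, as in the example data, with exactly that many data lines after the title.

```python
import io
from itertools import islice


def read_curve(lines, wanted_title):
    """Return the data lines of the curve titled `wanted_title`, or None.

    `lines` must be an iterator over lines, such as an open file.
    Assumed layout per curve: '<N>numberofsamples', a title line, N data lines.
    """
    for header in lines:
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between curves
        # Assumed header format, e.g. '999numberofsamples' -> 999
        count = int(header.replace("numberofsamples", ""))
        title = next(lines, "").strip()
        if title == wanted_title:
            # Only the wanted curve is read into memory.
            return [line.rstrip("\n") for line in islice(lines, count)]
        # Advance past this curve without storing any of its lines.
        for _ in islice(lines, count):
            pass
    return None


# io.StringIO stands in for open('somefile.txt') in this demonstration.
sample = io.StringIO(
    "2numberofsamples\n"
    "title crvTitle\n"
    "0.0 0.866423\n"
    "0.0001001073 0.6336382\n"
    "3numberofsamples\n"
    "title nextCrv\n"
    "0.000000e+00 0.000000e+00\n"
    "1.001073e-04 1.330026e+03\n"
    "2.002157e-04 3.737352e+03\n"
)
curve = read_curve(sample, "title nextCrv")
```

The skipped lines are still read from disk, since a text file has no index of line positions, but they are discarded immediately, so memory use stays limited to the one curve that is returned.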

