
Parse a file in Python to find a string first, then parse the following lines until another string is found

I am trying to scroll through a result file that one of our processes prints.

The objective is to look through the various blocks and find a specific parameter. I tried to tackle this but can't find an efficient way that avoids parsing the file multiple times.

This is an example of the output file that I read:

ID:13123
Compound:xyz
... various parameters
RhPhase:abc

ID:543
Compound:lbm
... various parameters

ID:232355
Compound:dfs
... various parameters
RhPhase:cvb

I am looking for a specific ID that has an RhPhase in it, but since the file contains many more entries, I only want that specific ID. It may or may not have an RhPhase in it; if it has one, I want its value.

The only way I have figured out is to go through the whole file (which may contain hundreds of blocks, to give an idea of the size), collect every ID that has an RhPhase into a dictionary, and then, in a second step, scroll through the dictionary to retrieve the value for the specific ID.
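For comparison, here is a minimal sketch of the dictionary approach just described. It assumes the "ID:" / "RhPhase:" line format from the sample above. Building the dictionary is itself a single pass over the file, and the "second pass" is only a dictionary lookup, which is cheap. The function name collect_rhphases is hypothetical, and io.StringIO stands in for the real file (an open file object is iterated the same way).

```python
import io


def collect_rhphases(lines):
    """Single pass: map every ID that has an RhPhase line to that value."""
    rhphases = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if line.startswith('ID:'):
            # a new block starts; remember which ID we are inside
            current_id = line[len('ID:'):]
        elif line.startswith('RhPhase:') and current_id is not None:
            rhphases[current_id] = line[len('RhPhase:'):]
    return rhphases


sample = (
    "ID:13123\nCompound:xyz\nRhPhase:abc\n\n"
    "ID:543\nCompound:lbm\n\n"
    "ID:232355\nCompound:dfs\nRhPhase:cvb\n"
)
rhphases = collect_rhphases(io.StringIO(sample))
print(rhphases.get('232355'))  # cvb
print(rhphases.get('543'))     # None: block 543 has no RhPhase
```

This is a reasonable choice when many IDs will be looked up. For a single ID, stopping as soon as the target block ends avoids reading the rest of the file.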

This feels very inefficient. I tried something different, but got stuck on how to keep track of where I am while scrolling through the lines, so that I can tell Python: read each line -> when you find the ID I want, continue reading -> if you find RhPhase, get the value; otherwise, stop at the next ID.

I am stuck here:

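with open("datafile.txt", "r") as datafile:
    # iterate over lines (looping over datafile.readline() would yield single characters)
    for line in datafile:
        if "ID:543" in line:
            # [read more lines]
            # [if "RhPhase" in line:]
            # [    rhphase = line    ]
            # [elif "ID:" in line:  ]
            # [    rhphase = None    ]
            # [    break             ]
            pass  # placeholder: this is the part I don't know how to write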

Once I find the ID, I don't know how to continue to either look for the RhPhase string or find the next "ID:" string and stop everything (because that means the ID has no associated RhPhase).
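Because a Python file object is an iterator over its lines, the "continue reading" step can be written as a second loop over the same iterator. The inner loop picks up right after the matched ID line, so the file is still read only once and reading stops at the end of the target block. This is a minimal sketch that assumes the line format from the sample. find_rhphase is a hypothetical name, and io.StringIO stands in for the open datafile.txt.

```python
import io


def find_rhphase(lines, target_id):
    """Return the RhPhase value of block `target_id`, or None if it has none."""
    it = iter(lines)
    for line in it:
        if line.strip() == 'ID:' + target_id:
            # Keep reading from the *same* iterator: the outer loop has
            # already consumed the ID line, so this continues right after it.
            for inner in it:
                inner = inner.strip()
                if inner.startswith('RhPhase:'):
                    return inner[len('RhPhase:'):]
                if inner.startswith('ID:'):
                    # the next block started: the target block has no RhPhase
                    return None
            return None  # input ended inside the target block
    return None  # target ID not present at all


sample = (
    "ID:13123\nCompound:xyz\nRhPhase:abc\n\n"
    "ID:543\nCompound:lbm\n\n"
    "ID:232355\nCompound:dfs\nRhPhase:cvb\n"
)
print(find_rhphase(io.StringIO(sample), '13123'))  # abc
print(find_rhphase(io.StringIO(sample), '543'))    # None
```

With the real file, pass the open file object instead: with open("datafile.txt") as datafile: rhphase = find_rhphase(datafile, "543").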

This would pass through the file only once and just check for the specific ID, instead of parsing the whole thing once and then doing a second pass. Is it possible to do so, or am I stuck with the double parsing?

Usually, you solve this kind of problem with a simple state machine: you read the lines until you find your ID; then you put your reader into a special state in which it checks for the parameter you want to extract. In your case, you only have two states, "ID not found" and "ID found", so a simple boolean is enough:

foundId = False
with open('datafile.txt', 'r') as datafile:
    for line in datafile:
        line = line.strip()  # drop the trailing newline and surrounding spaces
        if foundId:
            if line.startswith('RhPhase'):
                print('Found RhPhase for ID 543:')
                print(line)

                # end reading the file
                break
            elif line.startswith('ID:'):
                # the next block started, so ID 543 has no RhPhase
                print('Error: Found another ID without finding RhPhase first')
                break

        # if we haven't found the ID yet, keep looking for it; compare the
        # whole line so that e.g. 'ID:5431' does not match
        elif line == 'ID:543':
            foundId = True
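The same two-state machine can be wrapped in a reusable function. This version returns the value instead of printing it, and it also covers the case where the target block is the last one in the file. It is a sketch under the same format assumptions. find_parameter and the State enum are hypothetical names, and the key parameter lets it look up any parameter of the block, not just RhPhase.

```python
import io
from enum import Enum, auto


class State(Enum):
    SEARCHING_ID = auto()     # still looking for the target "ID:" line
    IN_TARGET_BLOCK = auto()  # inside the target block, looking for the key


def find_parameter(lines, target_id, key='RhPhase'):
    """Return the value of `key` in block `target_id`, or None if absent."""
    state = State.SEARCHING_ID
    for raw in lines:
        line = raw.strip()
        if state is State.SEARCHING_ID:
            # exact comparison, so 'ID:5431' does not match target '543'
            if line == 'ID:' + target_id:
                state = State.IN_TARGET_BLOCK
        elif state is State.IN_TARGET_BLOCK:
            if line.startswith(key + ':'):
                return line[len(key) + 1:]
            if line.startswith('ID:'):
                # the next block began before `key` appeared
                return None
    # end of input: the ID was never found, or its block had no `key` line
    return None


sample = (
    "ID:13123\nCompound:xyz\nRhPhase:abc\n\n"
    "ID:543\nCompound:lbm\n\n"
    "ID:232355\nCompound:dfs\nRhPhase:cvb\n"
)
print(find_parameter(io.StringIO(sample), '232355'))           # cvb
print(find_parameter(io.StringIO(sample), '543'))              # None
print(find_parameter(io.StringIO(sample), '543', 'Compound'))  # lbm
```

Naming the states with an Enum instead of a boolean makes it straightforward to add further states if the file format grows more structure.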
