Parsing a file in Python to first find a string, then parse the following lines until another string is found

Question

I am trying to scroll through a result file that one of our processes prints.

The objective is to look through various blocks and find a specific parameter. I tried to tackle this but can't find an efficient way that would avoid parsing the file multiple times.

This is an example of the output file that I read:

ID:13123
Compound:xyz
... various parameters
RhPhase:abc

ID:543
Compound:lbm
... various parameters

ID:232355
Compound:dfs
... various parameters
RhPhase:cvb

I am looking for a specific ID that has a RhPhase in it, but since the file contains many more entries, I just want that specific ID. It may or may not have an RhPhase in it; if it has one, I get the value.

The only way I have figured out is to go through the whole file (which may be hundreds of blocks, to give an idea of the size) and build a dictionary of every ID that has a RhPhase; then, in a second step, I look up the value for the specific ID in that dictionary.
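The approach described above (collect every ID's RhPhase into a dictionary, then look up the wanted ID) can be sketched as follows. This is a minimal sketch assuming the `Key:value` line format of the sample output and that every block starts with an `ID:` line; the function name `build_rhphase_index` is hypothetical. Note that building the dictionary is itself a single read of the file; the cost only pays off if several IDs are looked up afterwards.

```python
import io
from typing import Dict, Iterable, Optional


def build_rhphase_index(lines: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map every ID to its RhPhase value (None if its block has no RhPhase).

    Hypothetical helper: assumes "ID:<id>" starts each block and that the
    value follows the first colon, as in the sample output.
    """
    index: Dict[str, Optional[str]] = {}
    current_id: Optional[str] = None
    for line in lines:
        key, _, value = line.strip().partition(':')
        if key == 'ID':
            current_id = value
            index[current_id] = None  # no RhPhase seen yet for this block
        elif key == 'RhPhase' and current_id is not None:
            index[current_id] = value
    return index


# In-memory stand-in for datafile.txt, based on the sample in the question.
sample = """ID:13123
Compound:xyz
RhPhase:abc

ID:543
Compound:lbm

ID:232355
Compound:dfs
RhPhase:cvb
"""

index = build_rhphase_index(io.StringIO(sample))
print(index.get('13123'))  # abc
print(index.get('543'))    # None
```

With a real file, pass the open file object instead: `with open('datafile.txt') as f: index = build_rhphase_index(f)`.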

This feels very inefficient. I tried something different, but got stuck on how to keep track of the lines while scrolling through them, so that I can tell Python: read each line -> when you find the ID I want, continue reading -> if you find RhPhase, get the value; otherwise stop at the next ID.

I am stuck here:

with open("datafile.txt", "r") as datafile:
    for line in datafile:  # iterate over lines; readline() returns only the first line
        if "ID:543" in line:
            # stuck here: read more lines, and
            #     if "RhPhase" in a line: rhphase = that line
            #     elif "ID:" in a line:   rhphase = None; break
            pass

Once I find the ID, I don't know how to continue to either look for the RhPhase string or find the next "ID:" string and stop everything (because that means the ID has no associated RhPhase).

This would pass through the file once and only check for the specific ID, instead of parsing the whole thing once and then doing a second pass. Is it possible to do so, or am I stuck with the double parsing?

Answer 1

Usually, you solve this kind of problem with a simple state machine: you read the lines until you find your ID; then you put your reader into a special state that checks for the parameter you want to extract. In your case, you only have two states, "ID not found" and "ID found", so a simple boolean is enough:

foundId = False
with open('datafile.txt', 'r') as datafile:
    for line in datafile:
        if foundId:
            if line.startswith('RhPhase'):
                print('Found RhPhase for ID 543:')
                print(line.rstrip('\n'))

                # stop reading the file
                break
            elif line.startswith('ID:'):
                print('Error: Found another ID without finding RhPhase first')
                break

        # if we haven't found the ID yet, keep looking for it
        # (compare the whole line so that e.g. "ID:5430" does not match)
        elif line.strip() == 'ID:543':
            foundId = True
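The "keep reading once the ID is found" idea from the question also works without a flag: a file object remembers its position, so a second `for` loop over the same iterator resumes at the line right after the matched ID, and returning from it stops reading the file. This is a minimal sketch assuming the `Key:value` format of the sample; the function name `find_rhphase` is hypothetical, and it returns `None` both when the block has no RhPhase and when the ID does not appear at all.

```python
import io
from typing import Iterable, Optional


def find_rhphase(lines: Iterable[str], wanted_id: str) -> Optional[str]:
    """Return the RhPhase value of the block for wanted_id, or None.

    Hypothetical helper: reads the input once and stops as soon as the
    answer is known.
    """
    it = iter(lines)  # a file object is already its own iterator
    for line in it:
        if line.strip() == 'ID:' + wanted_id:
            # The inner loop continues on the same iterator, after the ID line.
            for inner in it:
                if inner.startswith('RhPhase:'):
                    return inner.strip().partition(':')[2]
                if inner.startswith('ID:'):
                    return None  # next block reached: this ID has no RhPhase
            return None  # input ended inside the wanted block
    return None  # the ID never appears


sample = "ID:543\nCompound:lbm\n\nID:232355\nCompound:dfs\nRhPhase:cvb\n"
print(find_rhphase(io.StringIO(sample), '232355'))  # cvb
print(find_rhphase(io.StringIO(sample), '543'))     # None
```

With the real file: `with open('datafile.txt') as f: value = find_rhphase(f, '543')`.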

Answer 1: accepted, 2015-08-03 09:34:29
