Parse a file in Python to find one string, then parse the following lines until another string is found
I am trying to scan through a result file that one of our processes prints.
The objective is to look through various blocks and find a specific parameter. I tried to tackle this but can't find an efficient way that avoids parsing the file multiple times.
This is an example of the output file that I read:
ID:13123
Compound:xyz
... various parameters
RhPhase:abc
ID:543
Compound:lbm
... various parameters
ID:232355
Compound:dfs
... various parameters
RhPhase:cvb
I am looking for a specific ID that has a RhPhase in it, but since the file contains many more entries, I just want that specific ID. It may or may not have an RhPhase in it; if it has one, I get the value.
The only way that I figured out is to actually go through the whole file (which may be hundreds of blocks, to give an idea of the size) and build a dictionary of every ID that has a RhPhase; then, in a second step, I look through the dictionary and retrieve the value for a specific ID.
This feels so inefficient. I tried to do something different, but got stuck on how to keep track of where you are while scanning the lines, so that I can tell Python: read each line -> when you find the ID that I want, continue reading -> if you find RhPhase, get the value; otherwise stop at the next ID.
I am stuck here:
datafile=open("datafile.txt", "r")
for items in datafile.readline():
    if "ID:543" in items:
        [read more lines]
        [if "RhPhase" in lines:]
        [    rhphase=lines ]
        [elif "ID:" in lines ]
        [    rhphase=None ]
        [    break ]
Once I find the ID, I don't know how to continue to either look for the RhPhase string or find the next ID: string and stop everything (because this means that the ID does not have an associated RhPhase).
This would pass through the file once and check only for the specific ID, instead of parsing the whole thing once and then doing a second pass. Is it possible to do so, or am I stuck with the double parsing?
Usually, you solve this kind of thing with a simple state machine: you read the lines until you find your ID; then you put your reader into a special state that checks for the parameter you want to extract. In your case, you only have two states: ID not found, and ID found, so a simple boolean is enough:
foundId = False
with open('datafile.txt', 'r') as datafile:
    for line in datafile:
        if foundId:
            if line.startswith('RhPhase'):
                print('Found RhPhase for ID 543:')
                print(line)
                # end reading the file
                break
            elif line.startswith('ID:'):
                print('Error: Found another ID without finding RhPhase first')
                break
        # if we haven’t found the ID yet, keep looking for it
        elif line.startswith('ID:543'):
            foundId = True
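
If you need the value rather than a printout, the same two-state idea can be wrapped in a function. The following is only a sketch under a few assumptions: the name `find_rhphase` and the `sample` data are made up for illustration, and every relevant line is assumed to have the form `key:value` as in the example file. It compares the whole ID instead of using a prefix test, because `line.startswith('ID:543')` would also match a line such as `ID:5431`. Note also that iterating over `datafile.readline()`, as in the question, walks over the characters of the first line only; iterating over the file object itself yields one line at a time.

```python
from typing import Iterable, Optional


def find_rhphase(lines: Iterable[str], target_id: str) -> Optional[str]:
    """Return the RhPhase value of the block with the given ID, or None.

    Hypothetical helper: assumes each relevant line looks like "key:value".
    """
    in_block = False  # state: have we reached the block we want?
    for raw in lines:
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # not a "key:value" line, ignore it
        if key == "ID":
            if in_block:
                # The next block started, so the target has no RhPhase.
                return None
            in_block = (value == target_id)
        elif in_block and key == "RhPhase":
            return value
    return None  # ID not present, or file ended before any RhPhase


# Illustrative data modelled on the example file in the question.
sample = """ID:13123
Compound:xyz
RhPhase:abc
ID:543
Compound:lbm
ID:232355
Compound:dfs
RhPhase:cvb
""".splitlines()

print(find_rhphase(sample, "543"))
print(find_rhphase(sample, "232355"))
```

Because the function accepts any iterable of lines, an open file can be passed directly, for example `with open('datafile.txt') as datafile: value = find_rhphase(datafile, '543')`. The file is still read only once, and reading stops as soon as the answer is known.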