How to extract the text between two delimiters in Python?

Question

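First of all, I am new to Python. I searched for similar questions but could not find what I want, so please bear with me :-)

So, my problem is:

There is a log file that contains several "show run" outputs, with other unrelated details between them. I only want to extract the show run details between the "show run" and "end" delimiters.

I have managed to get one such block, but unfortunately I can only get the first block (working), which is not what I want. What I want is to extract the last block between my delimiters "show run" and "end" (failing).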

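My working script (working) does the following:

Open the log file and read it line by line.
When "show run" is encountered, break.
Search for a pattern matching the closing delimiter "end".
As long as "end" has not been encountered, print the line. This way, the script prints the lines after "show run" up to the line before "end", which is essentially the show run output block.
Then the script exits.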

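(Failing part) I want to include a counter that counts how many times "show run" is found. Say the counter = 3; then the printing I want should start from the 3rd "show run".

So far, I have failed to combine the counter with the start of the printing. How do I tell Python to remember the counter and start from there?

My script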

import re

# Script to extract show running output from a raw/unformatted(not easily readable) log file
file = input("Please enter the path to the config file: ")


def myCounter():
    counter = 0
    with open(file, "r") as f0:
        for line in f0:
            if line.strip() == "show run": # count number of times the delimiter "show run" appears in the file
                counter+=1
    return(counter) # return those number of times, which is at the same time supposedly the starting position of printing

def extract():
    position = myCounter() # tried to mark the starting point
    with open(file, "r") as f1:
        for line in f1:
            if line.strip() == "show run": # I commented "and position" from "myCounter()" out because it did not work. 
                                           # I wanted the script to remember the last position of "sh run" and start from there
                break

        for line in f1:
            pattern2 = r"\s{3}end"
            match2 = re.findall(pattern2, line)
            if match2: 
                break
            print(line.lstrip(), end="")
        print("end\n")

if __name__=="__main__":
    extract()

Sample log file (the indentation is intentional):

a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
show run
            Building configuration...#1
            
            Current configuration : 5154 bytes
            !
            ! Last configuration change at 10:48:50 UTC Mon Dec 16 2019
            ! a lot of other configuration data
            !
            end
            
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
show run
            Building configuration...#2
            
            Current configuration : 5154 bytes
            !
            ! Last configuration change at 10:48:50 UTC Mon Dec 16 2019
            ! a lot of other configuration data
            !
            end
            
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
show run
            Building configuration...#3
            
            Current configuration : 5154 bytes
            !
            ! Last configuration change at 10:48:50 UTC Mon Dec 16 2019
            ! a lot of other configuration data
            !
            end
            
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
show run                             <---------------------------I want to skip all other 1-3 "show run"...."end" blocks and extract this last one!
            Building configuration...#4
            
            Current configuration : 5154 bytes
            !
            ! Last configuration change at 10:48:50 UTC Mon Dec 16 2019
            ! a lot of other configuration data
            !
            end
            
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data

Answer 1

That should do it:

with open('test.txt') as f:
    s = f.read()
    start = s.rfind("show run") + len("show run")
    end = s.rfind("end")
    substring = s[start:end]
    print(substring)
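
One caveat with the snippet above: `s.rfind("end")` finds the last occurrence of the letters `end` anywhere in the file, so trailing log data that happens to contain them would shift the cut-off. Below is a minimal sketch of a variant that looks for the closing delimiter only after the last `show run`, and only when `end` stands alone on a line. The function name `last_block` and its default arguments are hypothetical, and it assumes the whole log fits in memory.

```python
import re


def last_block(text, start="show run", end="end"):
    """Return the text between the last `start` and the next line that is only `end`."""
    begin = text.rfind(start)
    if begin == -1:
        return None  # no opening delimiter in the text
    begin += len(start)
    # Match `end` only when it is alone on a line, ignoring indentation.
    closing = re.compile(r"^[ \t]*" + re.escape(end) + r"[ \t]*$", re.MULTILINE)
    match = closing.search(text, begin)
    if match is None:
        return None  # opening delimiter without a closing one
    return text[begin:match.start()]


sample = (
    "junk\nshow run\n  cfg 1\n  end\n"
    "show run\n  cfg 2\n  vendor x\n  end\ntrailing end of log\n"
)
print(last_block(sample))
```

Neither the `vendor x` line nor the trailing `end of log` text ends the block here, because neither is a line consisting solely of `end`.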

Answer 2

I am not sure a regular expression is justified here. Assuming your sample log is stored as text in the variable log, my approach is as follows:

# Define boundaries where important log lines start/end
start = "show run"
end = "end"
# Which result you want to print
begin = 3

# Store all findings
findings = []
# Indicate whether to store lines or not
append = False
# List to keep each individual finding
finding = []

# Iterate over each line
for line in log.splitlines():
    # Set append to True if important lines start and skip processing this line
    if line.startswith(start):
        append = True
        continue
    # Set append to False, store the current finding as a string and skip processing of this line
    elif line.endswith(end):
        append = False
        findings.append('\n'.join(finding))
        finding = []
        continue
    
    # Append current line to your current finding if append is set
    if append:
        finding.append(line)

# Print finding at index `begin` (3)
for count, finding in enumerate(findings):
    if count == begin:
        print(finding)
        print("-------")

Answer 3

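Write a generator that yields all the text blocks in the file, and a second function that iterates over that generator until it is exhausted and returns the last value: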

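def text_blocks(file, start, end):
    "Yield text blocks in `file` delimited by `start` and `end`"
    r = []
    in_block = False
    for line in file:
        if line.strip() == start:
            # Opening delimiter: start collecting a fresh block
            in_block = True
            r = []
        elif in_block:
            if line.strip() == end:
                # Closing delimiter: emit the block and stop collecting
                yield r
                r = []
                in_block = False
            else:
                r.append(line)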

def last(it):
    "Return the last item in an iterator"
    for item in it:
        pass
    return item

with open(file) as f:
    print(last(text_blocks(f, "show run", "end")))
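
Note that the `last` helper above raises an UnboundLocalError when the iterator is empty, because `item` is never assigned. As a possible alternative, here is a minimal sketch using `collections.deque` with `maxlen=1`, which keeps only the most recent item while consuming the iterator. The name `last_or_default` and the `default` parameter are hypothetical.

```python
from collections import deque


def last_or_default(it, default=None):
    """Return the last item of an iterable, or `default` if it is empty."""
    tail = deque(it, maxlen=1)  # only the most recent item is retained
    return tail[0] if tail else default


print(last_or_default(iter(["block 1", "block 2", "block 3"])))
print(last_or_default(iter([]), default="no block found"))
```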

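Alternatively, if you specifically want the n-th text block, you can write a parsing function that also counts the blocks, like this (n is zero-based, so n = 3 selects the fourth block):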

def nth_text_block(file, start, end, n):
    r = []
    in_block = False
    count = 0
    for line in file:
        if line.strip() == start:
            in_block = True
        elif in_block and count == n:
            if line.strip() == end:
                return r
            else:
                r.append(line)
        elif in_block and line.strip() == end:
            count += 1
            in_block = False
    raise EOFError

with open(file) as f:
    print(nth_text_block(f, "show run", "end", 3))
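
Another way to reach the n-th block is to keep a single block generator and skip ahead with `itertools.islice`. This is only a sketch: the names `iter_blocks` and `nth_block` are hypothetical, `n` is zero-based to match the function above, and `io.StringIO` stands in for an opened log file so the example runs without one.

```python
import io
from itertools import islice


def iter_blocks(lines, start, end):
    """Yield each block as a list of the lines found between `start` and `end`."""
    block = None  # None means we are currently outside a block
    for line in lines:
        stripped = line.strip()
        if stripped == start:
            block = []
        elif block is not None:
            if stripped == end:
                yield block
                block = None
            else:
                block.append(line)


def nth_block(lines, start, end, n):
    """Return block number `n` (zero-based), or None if there are fewer blocks."""
    return next(islice(iter_blocks(lines, start, end), n, None), None)


log = io.StringIO("x\nshow run\n a\n end\ny\nshow run\n b\n end\n")
print(nth_block(log, "show run", "end", 1))
```

Returning None instead of raising EOFError is a design choice of this sketch; either works as long as the caller handles the missing-block case.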
