
How to extract text between two delimiters in Python?

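First of all, I am new to Python. I searched for similar questions but could not find what I was looking for, so please bear with me :-)

So, my question is:

There is a log file that contains several "show run" outputs, with other unrelated details between them. I want to extract only the show run details between the "show run" and "end" delimiters.

I have managed to get one such block, but unfortunately I can only get the first one (working), which is not what I want. What I want is to extract the last block between my delimiters "show run" and "end" (failing).

My working script (working) does the following: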

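  • Open the log file and read it line by line.
  • When "show run" is encountered, break out of the loop.
  • Search each following line for the closing delimiter "end".
  • As long as "end" has not been encountered, print the line. This way the script prints the lines after "show run" up to the line before "end", which is essentially the show run output block.
  • The script then exits.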

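(Failing part) I wanted to include a counter that counts how many times "show run" is found. Say the counter is 3; then the printing should start from the 3rd "show run".

So far I have failed to combine the counter with the start of printing. How do I tell Python to remember the counter and start doing its work from there?

My script: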

import re

# Script to extract show running output from a raw/unformatted(not easily readable) log file
file = input("Please enter the path to the config file: ")


def myCounter():
    counter = 0
    with open(file, "r") as f0:
        for line in f0:
            if line.strip() == "show run": # count how many times the delimiter "show run" appears in the file
                counter+=1
    return(counter) # return those number of times, which is at the same time supposedly the starting position of printing

def extract():
    position = myCounter() # tried to mark the starting point
    with open(file, "r") as f1:
        for line in f1:
            if line.strip() == "show run": # I commented "and position" from "myCounter()" out because it did not work. 
                                           # I wanted the script to remember the last position of "sh run" and start from there
                break

        for line in f1:
            pattern2 = r"\s{3}end"
            match2 = re.findall(pattern2, line)
            if match2: 
                break
            print(line.lstrip(), end="")
        print("end\n")

if __name__=="__main__":
    extract()

Sample log file (the indentation is intentional):

a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
show run
            Building configuration...#1
            
            Current configuration : 5154 bytes
            !
            ! Last configuration change at 10:48:50 UTC Mon Dec 16 2019
            ! a lot of other configuration data
            !
            end
            
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
show run
            Building configuration...#2
            
            Current configuration : 5154 bytes
            !
            ! Last configuration change at 10:48:50 UTC Mon Dec 16 2019
            ! a lot of other configuration data
            !
            end
            
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
show run
            Building configuration...#3
            
            Current configuration : 5154 bytes
            !
            ! Last configuration change at 10:48:50 UTC Mon Dec 16 2019
            ! a lot of other configuration data
            !
            end
            
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
show run                             <---------------------------I want to skip all other 1-3 "show run"...."end" blocks and extract this last one!
            Building configuration...#4
            
            Current configuration : 5154 bytes
            !
            ! Last configuration change at 10:48:50 UTC Mon Dec 16 2019
            ! a lot of other configuration data
            !
            end
            
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data

This should do it:

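with open('test.txt') as f:
    s = f.read()
    # Position just past the last "show run" in the file
    start = s.rfind("show run") + len("show run")
    # Position of the last "end" in the file; this assumes no later text contains "end"
    end = s.rfind("end")
    substring = s[start:end]
    print(substring)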
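If a bare substring search turns out to be too loose (for instance, `rfind("end")` also matches "end" inside a longer word), one possible variation is to anchor both delimiters to whole lines with a regular expression. This is only a sketch: the function name `last_block` is made up for illustration, and it assumes each delimiter sits alone on its line apart from surrounding spaces or tabs.

```python
import re

def last_block(text, start="show run", end="end"):
    """Return the body of the last start...end block in `text`, or None."""
    pattern = re.compile(
        r"^[ \t]*" + re.escape(start) + r"[ \t]*\n"   # line holding only the start delimiter
        r"(.*?)"                                      # block body, as short as possible
        r"^[ \t]*" + re.escape(end) + r"[ \t]*$",     # line holding only the end delimiter
        re.MULTILINE | re.DOTALL,
    )
    blocks = pattern.findall(text)
    # findall returns the captured bodies in file order, so the last one wins
    return blocks[-1] if blocks else None
```

Called as `last_block(s)` on the file contents read above, it would return the lines between the final pair of delimiters, with their original indentation.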

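I am not sure whether using a regular expression is justified here. Assuming your sample log is stored as text in the variable log, my approach would be as follows: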

# Define boundaries where important log lines start/end
start = "show run"
end = "end"
# Zero-based index of the block to print (3 is the fourth block)
begin = 3

# Store all findings
findings = []
# Indicate whether to store lines or not
append = False
# List to keep each individual finding
finding = []

# Iterate over each line
for line in log.splitlines():
    # Set append to True if important lines start and skip processing this line
    if line.startswith(start):
        append = True
        continue
    # Set append to False, store the current finding as a string and skip processing of this line
    elif append and line.endswith(end):
        append = False
        findings.append('\n'.join(finding))
        finding = []
        continue
    
    # Append current line to your current finding if append is set
    if append:
        finding.append(line)

# Print finding at index `begin` (3)
for count, finding in enumerate(findings):
    if count == begin:
        print(finding)
        print("-------")
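Since the goal in the question is the last block, the hardcoded index can be avoided by collecting the blocks and indexing from the end. The following is a minimal sketch of that idea, not a drop-in replacement: `collect_blocks` and the `sample` string are hypothetical names introduced here, and the end delimiter is compared against the stripped line rather than with `endswith`.

```python
def collect_blocks(log, start="show run", end="end"):
    """Return every start...end block in `log` as a list of strings."""
    findings = []      # completed blocks
    finding = None     # lines of the block being read, or None when outside a block
    for line in log.splitlines():
        if line.startswith(start):
            finding = []                          # a new block begins
        elif finding is not None and line.strip() == end:
            findings.append("\n".join(finding))   # block closed, store it
            finding = None
        elif finding is not None:
            finding.append(line)                  # line inside the current block
    return findings

# Hypothetical miniature log, only for demonstration
sample = "junk\nshow run\n  first\n  end\njunk\nshow run\n  second\n  end\njunk"
blocks = collect_blocks(sample)
print(len(blocks))   # how many "show run" blocks were found
print(blocks[-1])    # the last block, however many there are
```

Negative indexing makes a separate counting pass over the file unnecessary, although an empty list still needs to be handled before indexing.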

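Write a generator that yields all the text blocks in the file, and a helper that iterates over the generator's results until it is exhausted and returns the last value: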

def text_blocks(file, start, end):
    "Yield text blocks in `file` delimited by `start` and `end`"
    r = []
    in_block = False
    for line in file:
        if line.strip() == start:
            in_block = True
        elif in_block:
            if line.strip() == end:
                yield r
                r = []
                # Leave the block so lines up to the next `start` are skipped
                in_block = False
            else:
                r.append(line)

def last(it):
    "Return the last item in an iterator, or None if it is empty"
    item = None
    for item in it:
        pass
    return item

with open(file) as f:
    print(last(text_blocks(f, "show run", "end")))
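As an aside, the standard library can do the "keep only the last item" step: a `collections.deque` with `maxlen=1` discards everything but the most recent element as it consumes an iterator. The helper name `last_or_default` below is an assumption made for this sketch.

```python
from collections import deque

def last_or_default(iterable, default=None):
    """Return the final item of `iterable`, or `default` if it yields nothing."""
    tail = deque(iterable, maxlen=1)   # consumes the iterable, keeping only the newest item
    return tail[0] if tail else default
```

It could stand in for `last` in a call such as `last_or_default(text_blocks(f, "show run", "end"))`, and it returns the default instead of failing when the file contains no block.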

Alternatively, if you specifically want the nth text block, you can write a parsing function that also counts the blocks, like this:

def nth_text_block(file, start, end, n):
    "Return the lines of the block at zero-based index `n`; raise EOFError if there is none"
    r = []
    in_block = False
    count = 0
    for line in file:
        if line.strip() == start:
            in_block = True
        elif in_block and count == n:
            if line.strip() == end:
                return r
            else:
                r.append(line)
        elif in_block and line.strip() == end:
            count += 1
            in_block = False
    raise EOFError

with open(file) as f:
    print(nth_text_block(f, "show run", "end", 3))
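Another way to pick the nth block, sketched here under the assumption that a block generator is available, is to let `itertools.islice` do the counting. `iter_blocks` and `nth_block` are illustrative names, not part of the answer above; `iter_blocks` is redefined so the example is self-contained.

```python
from itertools import islice

def iter_blocks(lines, start, end):
    """Yield each start...end block in `lines` as a list of lines."""
    block = None                      # None means we are outside a block
    for line in lines:
        stripped = line.strip()
        if stripped == start:
            block = []                # a new block begins
        elif block is not None:
            if stripped == end:
                yield block           # block complete
                block = None
            else:
                block.append(line)

def nth_block(lines, start, end, n, default=None):
    """Return the block at zero-based index `n`, or `default` if there are too few."""
    # islice skips the first n blocks; next() takes the following one, if any
    return next(islice(iter_blocks(lines, start, end), n, None), default)
```

With an open file `f`, `nth_block(f, "show run", "end", 3)` would return the fourth block, and because the generator is lazy the file is read only as far as needed.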
