How to extract text between two delimiters in Python?
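First of all, I'm new to Python. I searched for similar questions but could not find what I want, so I beg your pardon :-)
So, my question is:
There is a log file containing several "show run" outputs, with other irrelevant details in between. I want to extract only the show run details between the "show run" and "end" delimiters.
I have managed to get such a block, but unfortunately I can only get the first one (working), which is not what I want. What I want is to extract the last block between my delimiters "show run" and "end" (failing).
My working script (working) is shown below.
(Failing part) I want to include a counter that counts how many times "show run" is found. Say counter = 3; then the printing should start from the 3rd "show run".
So far I have failed to combine the counter with the start of printing. How can I tell Python to remember the counter and start from there?
My script: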
import re

# Script to extract show running output from a raw/unformatted (not easily readable) log file
file = input("Please enter the path to the config file: ")

def myCounter():
    counter = 0
    with open(file, "r") as f0:
        for line in f0:
            if line.strip() == "show run":  # count number of times the delimiter "sh run" appears in the file
                counter += 1
    return counter  # return that number of times, which is at the same time supposedly the starting position of printing

def extract():
    position = myCounter()  # tried to mark the starting point
    with open(file, "r") as f1:
        for line in f1:
            if line.strip() == "show run":  # I commented "and position" from "myCounter()" out because it did not work.
                # I wanted the script to remember the last position of "sh run" and start from there
                break
        for line in f1:
            pattern2 = r"\s{3}end"
            match2 = re.findall(pattern2, line)
            if match2:
                break
            print(line.lstrip(), end="")
        print("end\n")

if __name__ == "__main__":
    extract()
Sample log file (the indentation is intentional):
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
show run
Building configuration...#1
Current configuration : 5154 bytes
!
! Last configuration change at 10:48:50 UTC Mon Dec 16 2019
! a lot of other configuration data
!
end
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
show run
Building configuration...#2
Current configuration : 5154 bytes
!
! Last configuration change at 10:48:50 UTC Mon Dec 16 2019
! a lot of other configuration data
!
end
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
show run
Building configuration...#3
Current configuration : 5154 bytes
!
! Last configuration change at 10:48:50 UTC Mon Dec 16 2019
! a lot of other configuration data
!
end
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
show run <---------------------------I want to skip all other 1-3 "show run"...."end" blocks and extract this last one!
Building configuration...#4
Current configuration : 5154 bytes
!
! Last configuration change at 10:48:50 UTC Mon Dec 16 2019
! a lot of other configuration data
!
end
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
a lot irrelevant data
This should do it:
with open('test.txt') as f:
    s = f.read()
start = s.rfind("show run") + len("show run")
end = s.rfind("end")
substring = s[start:end]
print(substring)
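One caveat with the `rfind` approach: `s.rfind("end")` matches the last occurrence of the substring anywhere in the text, including inside a word on a trailing line. Below is a minimal sketch that instead requires each delimiter to stand alone on its line. The function name `last_block` and the sample text are illustrative assumptions, not part of the answer above.

```python
import re

def last_block(text, start="show run", end="end"):
    """Return the text between the last complete start/end pair of delimiter lines.

    Each delimiter must be the only non-whitespace content on its line.
    Returns None if no complete block is found.
    """
    start_line = r"^[ \t]*" + re.escape(start) + r"[ \t]*\n"
    end_line = r"^[ \t]*" + re.escape(end) + r"[ \t]*$"
    # MULTILINE lets ^ and $ match at line boundaries; DOTALL lets . span newlines.
    pattern = re.compile(start_line + r"(.*?)" + end_line, re.MULTILINE | re.DOTALL)
    blocks = pattern.findall(text)
    return blocks[-1] if blocks else None


# Hypothetical sample: note the words "legend" and "end" outside the blocks.
sample = (
    "noise\n"
    "show run\n"
    "config #1\n"
    "   end\n"
    "trailing legend\n"
    "show run\n"
    "config #2\n"
    "   end\n"
    "more noise at the end\n"
)
print(last_block(sample))
```

Because the pattern is anchored to whole lines, the trailing "more noise at the end" line does not affect the result.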
I'm not sure regular expressions are warranted here. Assuming your sample log is stored as text in the variable log, my approach would be as follows:
# Define boundaries where important log lines start/end
start = "show run"
end = "end"
# Which result you want to print (zero-based index, so 3 is the fourth block)
begin = 3
# Store all findings
findings = []
# Indicate whether to store lines or not
append = False
# List to keep each individual finding
finding = []
# Iterate over each line
for line in log.splitlines():
    # Set append to True if important lines start and skip processing this line
    if line.startswith(start):
        append = True
        continue
    # Set append to False, store the current finding as a string and skip processing of this line
    elif line.endswith(end):
        append = False
        findings.append('\n'.join(finding))
        finding = []
        continue
    # Append current line to your current finding if append is set
    if append:
        finding.append(line)
# Print finding at index `begin` (3)
for count, finding in enumerate(findings):
    if count == begin:
        print(finding)
        print("-------")
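Write a generator that yields all the text blocks in the file, and a helper that iterates over the results of that generator until it is exhausted, returning the last value: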
def text_blocks(file, start, end):
    "Yield text blocks in `file` delimited by `start` and `end`"
    r = []
    in_block = False
    for line in file:
        if line.strip() == start:
            in_block = True
        elif in_block:
            if line.strip() == end:
                yield r
                r = []
                in_block = False  # stop collecting until the next start line
            else:
                r.append(line)

def last(it):
    "Return the last item in an iterator, or None if it is empty"
    item = None
    for item in it:
        pass
    return item

with open(file) as f:
    print(last(text_blocks(f, "show run", "end")))
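The "keep only the last item" step can also be expressed with `collections.deque`, which discards older items when created with `maxlen=1`. The sketch below is self-contained, so it carries its own simplified block generator (`iter_blocks`, a hypothetical name) and reads from an in-memory `io.StringIO` rather than a real file; the sample text is made up for illustration.

```python
from collections import deque
import io

def iter_blocks(lines, start, end):
    """Yield each list of lines found between a `start` line and an `end` line."""
    block = None  # None means "currently outside a block"
    for line in lines:
        stripped = line.strip()
        if stripped == start:
            block = []  # a new block begins; any unfinished one is dropped
        elif block is not None:
            if stripped == end:
                yield block
                block = None
            else:
                block.append(line)

def last_item(iterable, default=None):
    """Return the final item of `iterable`, or `default` if it is empty."""
    tail = deque(iterable, maxlen=1)  # consumes everything, keeps only the last item
    return tail[0] if tail else default

# Hypothetical in-memory log standing in for the real file.
log = io.StringIO("x\nshow run\ncfg #1\n   end\ny\nshow run\ncfg #2\n   end\nz\n")
print(last_item(iter_blocks(log, "show run", "end")))
```

Either way the whole file is read once, and only one block is held in memory at a time.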
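Or, if you specifically want to get the n-th text block, you could write a parsing function that also counts blocks, like this (note that n is zero-based, so 3 selects the fourth block):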
def nth_text_block(file, start, end, n):
    r = []
    in_block = False
    count = 0
    for line in file:
        if line.strip() == start:
            in_block = True
        elif in_block and count == n:
            if line.strip() == end:
                return r
            else:
                r.append(line)
        elif in_block and line.strip() == end:
            count += 1
            in_block = False
    raise EOFError

with open(file) as f:
    print(nth_text_block(f, "show run", "end", 3))
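If raising `EOFError` is not desired when the file holds fewer blocks than requested, one alternative is to slice a block generator with `itertools.islice` and fall back to a default. This is only a sketch under assumptions: `iter_blocks` and `nth_block` are hypothetical names, the generator is a `takewhile`-based variant rather than the code above, and the sample text is invented.

```python
import io
from itertools import islice, takewhile

def iter_blocks(lines, start, end):
    """Yield the lines of each block that follows a `start` line, up to the next `end` line."""
    lines = iter(lines)
    while True:
        # Advance past everything up to and including the next start line.
        for line in lines:
            if line.strip() == start:
                break
        else:
            return  # input exhausted, no further blocks
        # Collect lines until the end line; takewhile also consumes that end line.
        # Caveat: a block cut off by end-of-input is still yielded.
        yield list(takewhile(lambda l: l.strip() != end, lines))

def nth_block(lines, start, end, n, default=None):
    """Return the n-th block (zero-based), or `default` if there are fewer blocks."""
    return next(islice(iter_blocks(lines, start, end), n, None), default)

# Hypothetical in-memory log standing in for the real file.
text = "x\nshow run\ncfg #1\n   end\ny\nshow run\ncfg #2\n   end\nz\n"
print(nth_block(io.StringIO(text), "show run", "end", 1))
print(nth_block(io.StringIO(text), "show run", "end", 5))
```

The second call asks for a block that does not exist and gets the default back instead of an exception.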