Python regex: how to extract text between matching strings, including the matching strings and lines
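I am using Python to extract certain strings that lie between matching strings. The strings come from a list, which is itself generated dynamically by a separate Python function. The list I am working with looks like this: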
sample_list = ['line1 this line a first line',
'line1 this line is also considered as line one...',
'line1 this line is the first line',
'line2 this line is second line to be included in output',
'line3 this should also be included in output',
'line1 this contain other strings',
'line1 this may contain other strings as well',
'line2 this line is second line to be included in output',
'line3 this should also be included in output',
'line1 what the heck is it...'
]
The output I want looks like this:
line1 this line is the first line
line2 this line is second line to be included in output
line3 this should also be included in output
line1 this may contain other strings as well
line2 this line is second line to be included in output
line3 this should also be included in output
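As you can see, I want to extract the text/lines that start with line1 and end with line3 (through the end of that line). The final output should include the matching words themselves (i.e. line1 and line3).
The code I tried is: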
import re

# Convert list to string first
list_to_str = '\n'.join(sample_list)
# Get desired output
print(re.findall('\nline1(.*?)\nline2(.*?)\nline3($)', list_to_str, re.DOTALL))
This is what I get as output:
[]
Any help is appreciated.
Edit 1: I did some more work and found a solution that comes close:
matches = (re.findall(r"^line1(.*)\nline2(.*)\nline3(.*)$", list_to_str, re.MULTILINE))
for match in matches:
    print('\n'.join(match))
It gives me this output:
this line is the first line
this line is second line to be included in output
this is the third and it should also be included in output
this may contain other strings as well
this line is second line to be included in output...
this is the third should also be included in output
The output is almost correct, but it does not include the matched text.
If you are looking for a sequence of lines 1, 2 and 3 with no repeats, this is the pattern:
line1.*\s*(?!\s|line[13])line2.*\s*(?!\s|line[12])line3.*
Explanation
line1 .* \s* # line 1 plus newline(s)
(?! \s | line [13] ) # Next cannot be line 1 or 3 (or whitespace)
line2 .* \s* # line 2 plus newline(s)
(?! \s | line [12] ) # Next cannot be line 1 or 2 (or whitespace)
line3 .* # line 3
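As a minimal sketch of how the annotated form above might be used from Python: `re.VERBOSE` ignores whitespace outside character classes and treats `#` as a comment marker, so the explanation can be pasted in almost verbatim. The names `pattern`, `text` and `blocks` are illustrative, and `sample_list` is the list from the question.

```python
import re

sample_list = ['line1 this line a first line',
               'line1 this line is also considered as line one...',
               'line1 this line is the first line',
               'line2 this line is second line to be included in output',
               'line3 this should also be included in output',
               'line1 this contain other strings',
               'line1 this may contain other strings as well',
               'line2 this line is second line to be included in output',
               'line3 this should also be included in output',
               'line1 what the heck is it...'
               ]

# The pattern from the answer, written in free-spacing form.
pattern = re.compile(r"""
    line1 .* \s*            # line 1 plus newline(s)
    (?! \s | line [13] )    # next cannot be line 1 or 3 (or whitespace)
    line2 .* \s*            # line 2 plus newline(s)
    (?! \s | line [12] )    # next cannot be line 1 or 2 (or whitespace)
    line3 .*                # line 3
""", re.VERBOSE)

text = '\n'.join(sample_list)

# No capture groups, so findall returns each whole three-line match.
blocks = pattern.findall(text)
for block in blocks:
    print(block)
```

Because `.` does not match a newline by default, each `.*` stops at the end of its own line and `\s*` consumes the line break, so every match spans exactly three consecutive lines.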
If you want to capture the line contents, just wrap each .* in a capture group, i.e. (.*)
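A hedged sketch of the capture-group variant follows. It is an adaptation rather than the answerer's exact code: the groups are placed around `line1.*`, `line2.*` and `line3.*` (not just `.*`) on the assumption that the `lineN` prefixes should appear in the output, as the question asks. The names `list_to_str` and `matches` are borrowed from the question.

```python
import re

sample_list = ['line1 this line a first line',
               'line1 this line is also considered as line one...',
               'line1 this line is the first line',
               'line2 this line is second line to be included in output',
               'line3 this should also be included in output',
               'line1 this contain other strings',
               'line1 this may contain other strings as well',
               'line2 this line is second line to be included in output',
               'line3 this should also be included in output',
               'line1 what the heck is it...'
               ]

list_to_str = '\n'.join(sample_list)

# Each group wraps a whole line, prefix included (assumed requirement).
pattern = r"(line1.*)\s*(?!\s|line[13])(line2.*)\s*(?!\s|line[12])(line3.*)"

# With three groups, findall returns a list of 3-tuples, one per match.
matches = re.findall(pattern, list_to_str)
for match in matches:
    print('\n'.join(match))
```

With groups present, `re.findall` returns the captured lines individually, which is convenient if each line needs further processing.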
This may not be the cleanest way (you may prefer to use a regex), but it outputs what you want:
sample_list = ['line1 this line a first line',
'line1 this line is also considered as line one...',
'line1 this line is the first line',
'line2 this line is second line to be included in output',
'line3 this should also be included in output',
'line1 this contain other strings',
'line1 this may contain other strings as well',
'line2 this line is second line to be included in output',
'line3 this should also be included in output',
'line1 what the heck is it...'
]
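output = []
line1 = ""
line2 = ""
line3 = ""
prevStart = ""  # prefix of the last line accepted into the current sequence
for text in sample_list:
    if prevStart == "":
        # Waiting for a sequence to start
        if text.startswith("line1"):
            prevStart = "line1"
            line1 = text
    elif prevStart == "line1":
        if text.startswith("line2"):
            prevStart = "line2"
            line2 = text
        elif text.startswith("line1"):
            # Repeated line1: keep only the most recent one
            line1 = text
            prevStart = "line1"
        else:
            prevStart = ""
    elif prevStart == "line2":
        if text.startswith("line3"):
            prevStart = ""
            line3 = text
        elif text.startswith("line1"):
            # A new sequence may start right after an unfinished one
            line1 = text
            prevStart = "line1"
        else:
            prevStart = ""
    # A complete line1/line2/line3 run was found: store it and reset
    if line1 != "" and line2 != "" and line3 != "":
        output.append(line1)
        output.append(line2)
        output.append(line3)
        line1 = ""
        line2 = ""
        line3 = ""
for line in output:
    print(line)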
The output of this code is:
line1 this line is the first line
line2 this line is second line to be included in output
line3 this should also be included in output
line1 this may contain other strings as well
line2 this line is second line to be included in output
line3 this should also be included in output
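For comparison, here is a shorter sketch of the same idea that swaps the state machine for a sliding window over three consecutive list items. This is a different technique from the answer's code, and it assumes that the three lines must be directly adjacent in `sample_list`. The names `first`, `second` and `third` are illustrative.

```python
sample_list = ['line1 this line a first line',
               'line1 this line is also considered as line one...',
               'line1 this line is the first line',
               'line2 this line is second line to be included in output',
               'line3 this should also be included in output',
               'line1 this contain other strings',
               'line1 this may contain other strings as well',
               'line2 this line is second line to be included in output',
               'line3 this should also be included in output',
               'line1 what the heck is it...'
               ]

output = []
# zip over three offset views yields every run of three adjacent lines.
for first, second, third in zip(sample_list, sample_list[1:], sample_list[2:]):
    if (first.startswith("line1")
            and second.startswith("line2")
            and third.startswith("line3")):
        output.extend([first, second, third])

for line in output:
    print(line)
```

Like the loop above, this relies on `str.startswith`, so a prefix such as `line10` would also count as `line1`; a stricter check would be needed if such lines can occur.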