Backtracking issue in lookahead regular expression
I am trying to match the following text using this regular expression:
ABC: ((?:.+\n?)+|.+)(?=DE:)
The sample text is:
ABC: Lorem ipsum dolor
sit amet. Lorem ipsum dolor DE: ** Lorem
Other Text1: 1Lorem ipsum dolor sit amet
Other Text2: 2Lorem ipsum dolor sit amet
Other Text3: 3Lorem ipsum dolor sit amet
Other Text4: 4Lorem ipsum dolor sit amet
But the number of backtracking iterations explodes, so the search gets stuck forever.
Here is the full code if you want to test it:
import re
text = """ABC: Lorem ipsum dolor
sit amet. Lorem ipsum dolor DE: Lorem
Other Text1: 1Lorem ipsum dolor sit amet
Other Text2: 2Lorem ipsum dolor sit amet
Other Text3: 3Lorem ipsum dolor sit amet
Other Text4: 4Lorem ipsum dolor sit amet
"""
aux = re.search(r"ABC: ((?:.+\n?)+(?=DE:)|.+)",text,re.M|re.U)
if aux:
    print(aux.group(1))
else:
    print("Could not be found")
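To illustrate why the search can appear to hang, here is a minimal sketch that times the nested-quantifier part of the pattern on short inputs that never contain "DE:". The helper name `time_failed_match` and the test strings are hypothetical and only serve the illustration. The timings depend on the machine, so none are shown here.

```python
import re
import time

# Nested quantifiers: the inner .+ and the outer (...)+ can split the same run
# of characters in many different ways. When the lookahead never succeeds, the
# engine tries those splits one after another before giving up.
pattern = re.compile(r"(?:.+\n?)+(?=DE:)")


def time_failed_match(n):
    """Return the seconds spent failing to match n characters without 'DE:'."""
    start = time.perf_counter()
    result = pattern.match("a" * n)
    elapsed = time.perf_counter() - start
    assert result is None  # there is no 'DE:' in the input, so no match
    return elapsed


# Hypothetical input sizes, kept small on purpose so the script finishes.
for n in (10, 14, 18):
    print(n, f"{time_failed_match(n):.4f}s")
```

The time should grow quickly as `n` increases, which is consistent with the behaviour described in the question on a text of a few hundred characters.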
Maybe you could try:
aux = re.findall(r'\bABC:\s*(.+?)\s*\bDE:', text, re.DOTALL)[0]
Or:
aux = re.findall(r'\bABC:\s*([\w\W]+?)\s*\bDE:', text)[0]
Both print:
Lorem ipsum dolor
sit amet. Lorem ipsum dolor
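Note that `re.findall(...)[0]` raises an `IndexError` when nothing matches. As a minimal sketch, the same lazy pattern can be used with `re.search` so that the "not found" case from the question is still handled. The function name `extract_between` is hypothetical and not part of the original answer.

```python
import re

text = """ABC: Lorem ipsum dolor
sit amet. Lorem ipsum dolor DE: Lorem
Other Text1: 1Lorem ipsum dolor sit amet
Other Text2: 2Lorem ipsum dolor sit amet
Other Text3: 3Lorem ipsum dolor sit amet
Other Text4: 4Lorem ipsum dolor sit amet
"""

# A single lazy quantifier with re.DOTALL replaces the nested (?:.+\n?)+ group:
# .+? grows one character at a time and stops at the first "DE:".
PATTERN = re.compile(r"\bABC:\s*(.+?)\s*\bDE:", re.DOTALL)


def extract_between(s):
    """Return the text between 'ABC:' and the first 'DE:', or None if absent."""
    match = PATTERN.search(s)
    return match.group(1) if match else None


result = extract_between(text)
if result is not None:
    print(result)
else:
    print("Could not be found")
```

Because there is only one quantifier over the captured text, the engine has a single way to consume each character, so the search does not suffer from the blow-up seen with the nested group.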