
Backtracking issue in LookAhead regular expression

I am trying to match the following text using this regular expression: ABC: ((?:.+\n?)+|.+)(?=DE:) The sample text I have is:

ABC: Lorem ipsum dolor
sit amet. Lorem ipsum dolor DE: ** Lorem
Other Text1: 1Lorem ipsum dolor sit amet
Other Text2: 2Lorem ipsum dolor sit amet
Other Text3: 3Lorem ipsum dolor sit amet
Other Text4: 4Lorem ipsum dolor sit amet

But the number of backtracking iterations grows so large that the search gets stuck forever.

Here is the full code if you want to test it:

import re

text = """ABC: Lorem ipsum dolor
sit amet. Lorem ipsum dolor DE: Lorem
Other Text1: 1Lorem ipsum dolor sit amet
Other Text2: 2Lorem ipsum dolor sit amet
Other Text3: 3Lorem ipsum dolor sit amet
Other Text4: 4Lorem ipsum dolor sit amet
"""


aux = re.search(r"ABC: ((?:.+\n?)+(?=DE:)|.+)",text,re.M|re.U)
if aux:
    print(aux.group(1))
else:
    print("Could not be found")
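
The hang most likely comes from the nested quantifier in (?:.+\n?)+: because \n? is optional, the inner .+ and the outer + can split a single line into pieces in exponentially many ways, and every split is retried each time the (?=DE:) lookahead fails. A minimal sketch to observe the growth on short inputs, assuming a one-line subject with no DE: in it (the helper name time_failed_lookahead and the chosen lengths are only for illustration):

```python
import re
import time

# The pattern from the question. A quantified ".+" sits inside a quantified
# group, so one line can be carved up in exponentially many ways.
NESTED = re.compile(r"ABC: ((?:.+\n?)+(?=DE:)|.+)")

def time_failed_lookahead(n):
    """Time a search on one line of n characters that never contains 'DE:'."""
    subject = "ABC: " + "x" * n
    start = time.perf_counter()
    match = NESTED.search(subject)
    elapsed = time.perf_counter() - start
    return match, elapsed

# Lengths are kept small on purpose: each extra character roughly doubles
# the number of splits the first alternative has to try before giving up.
for n in (14, 16, 18, 20):
    match, elapsed = time_failed_lookahead(n)
    # The search only succeeds through the fallback "|.+" alternative.
    print(n, match.group(1) == "x" * n, f"{elapsed:.4f}s")
```

The timings should climb quickly as n grows. In the sample text every line after the DE: line has to be explored this way before the engine backtracks far enough to find the match, which is why the search never seems to finish.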

Maybe you could try:

aux = re.findall(r'\bABC:\s*(.+?)\s*\bDE:', text, re.DOTALL)[0]

Or:

aux = re.findall(r'\bABC:\s*([\w\W]+?)\s*\bDE:', text)[0]

With either one, print(aux) gives:

Lorem ipsum dolor
sit amet. Lorem ipsum dolor
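
Note that re.findall(...)[0] raises IndexError when nothing matches. A small sketch of the same lazy pattern wrapped with re.search, so a missing match returns None instead (the function name extract_block is made up for this example, and the sample text is shortened):

```python
import re

text = """ABC: Lorem ipsum dolor
sit amet. Lorem ipsum dolor DE: Lorem
Other Text1: 1Lorem ipsum dolor sit amet
"""

# With DOTALL, the lazy ".+?" grows one character at a time (newlines
# included) until "DE:" can follow. No quantifier is nested inside another,
# so there is no exponential set of splits to backtrack through.
BLOCK = re.compile(r"\bABC:\s*(.+?)\s*\bDE:", re.DOTALL)

def extract_block(source):
    """Return the text between 'ABC:' and the next 'DE:', or None if absent."""
    match = BLOCK.search(source)
    return match.group(1) if match else None

print(extract_block(text))
print(extract_block("ABC: no terminator here"))
```

The first call prints the same two lines as above, and the second prints None rather than raising.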
