Regex (python) to match same group several times only when preceded or followed by specific pattern
Suppose I have the following text:
Products to be destroyed: «Prabo», «Palox 2000», «Remadon strong» (Rule). The customers «Dilora» and «Apple» has to be notified.
I need to match every string within the «» quotes, but ONLY in the sentence that starts with the "Products to be destroyed:" pattern or ends with the (Rule) pattern.
In other words, in this example I do NOT want to match Dilora or Apple.
The regex to get the quoted contents in the capturing group is:
«(.+?)»
Is it possible to "anchor" it to either a following pattern (such as (Rule)) or a preceding pattern (such as "Products to be destroyed:")?
This is my saved attempt on regex101.
Thank you very much.
You can first match the whole sentence, requiring at least one «» part in it, and when there is a match, extract all the parts from it, using re.findall for example.
The example data seems to sit within a single sentence that ends with a dot. In that case you can use a negated character class, [^.], to match any char except a dot around at least one «» part.
Regex demo for at least a single match, and another demo to match the separate parts afterwards
import re
regex = r"\bProducts to be destroyed:[^.]*«[^«»]*»[^.]*\."
s = 'Products to be destroyed: «Prabo», «Palox 2000», «Remadon strong» (Rule). The customers «Dilora» and «Apple» has to be notified.'
result = re.search(regex, s)
if result:
    print(re.findall(r"«([^«»]*)»", result.group()))
Output
['Prabo', 'Palox 2000', 'Remadon strong']
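The question also asks about anchoring on the trailing (Rule) marker. A minimal sketch of how the same two-step idea could cover both cases, assuming a sentence never contains a dot before its final one (so a name like «Palox 2.0» would break it). The names SENTENCE, QUOTED and quoted_in_anchored_sentences are made up for this illustration:

```python
import re

# Hypothetical pattern: a sentence qualifies if it starts with the prefix
# OR ends with "(Rule)" right before the closing dot.
SENTENCE = re.compile(
    r"\bProducts to be destroyed:[^.]*\."  # anchored on the leading phrase
    r"|[^.]*\(Rule\)\s*\."                 # anchored on the trailing marker
)
# Same extraction pattern as above: the text between « and ».
QUOTED = re.compile(r"«([^«»]*)»")

def quoted_in_anchored_sentences(text):
    """Collect the «...» contents from every qualifying sentence."""
    found = []
    for sentence in SENTENCE.finditer(text):
        found.extend(QUOTED.findall(sentence.group()))
    return found

s = ('Products to be destroyed: «Prabo», «Palox 2000», «Remadon strong» (Rule). '
     'The customers «Dilora» and «Apple» has to be notified.')
print(quoted_in_anchored_sentences(s))
```

With the sample text this should print the same three names as before. A sentence that lacks the leading phrase but ends in (Rule). is picked up by the second alternative.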