Regex (Python) to match the same group several times only when preceded or followed by a specific pattern

Suppose I have the following text:

Products to be destroyed: «Prabo», «Palox 2000», «Remadon strong» (Rule). The customers «Dilora» and «Apple» has to be notified.

I need to match every string within the «» quotes, but ONLY in the sentence that starts with the "Products to be destroyed:" pattern or ends with the (Rule) pattern.

In other words, in this example I do NOT want to match Dilora or Apple.

The regex to get the quoted contents in a capturing group is:

«(.+?)»
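
To make the problem concrete, here is a minimal sketch of what the unanchored pattern returns on the sample text. The variable names are only illustrative; the pattern and the text are the ones from above.

```python
import re

# Sample text from the question, unchanged
s = ('Products to be destroyed: «Prabo», «Palox 2000», «Remadon strong» (Rule). '
     'The customers «Dilora» and «Apple» has to be notified.')

# The unanchored pattern captures every «...» part, wanted or not
all_quoted = re.findall(r"«(.+?)»", s)
print(all_quoted)
# ['Prabo', 'Palox 2000', 'Remadon strong', 'Dilora', 'Apple']
```

The last two entries are the ones that should be excluded.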

Is it possible to "anchor" it to either a following pattern (such as (Rule)) or a preceding pattern (such as "Products to be destroyed:")?

This is my saved attempt on regex101.

Thank you very much.

You can first match the whole sentence, requiring at least one «...» part in it, and when there is a match, extract all the parts from that match, for example with re.findall.

The example data seems to sit within a single sentence that ends with a dot. In that case you can require at least one «...» part and use the negated character class [^.] around it to match any character except a dot, so the match cannot run into the next sentence.
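
As a rough illustration of that idea, the sentence-level pattern can be written out piece by piece with re.VERBOSE. This is only a sketch: the layout and comments are mine, and the spaces in the literal prefix are escaped because VERBOSE mode ignores unescaped whitespace.

```python
import re

sentence_pattern = re.compile(r"""
    \bProducts\ to\ be\ destroyed:   # literal prefix, spaces escaped for VERBOSE mode
    [^.]*                            # any characters except a dot
    «[^«»]*»                         # at least one quoted part
    [^.]*                            # rest of the sentence, still no dot
    \.                               # the dot that closes the sentence
""", re.VERBOSE)

s = ('Products to be destroyed: «Prabo», «Palox 2000», «Remadon strong» (Rule). '
     'The customers «Dilora» and «Apple» has to be notified.')

m = sentence_pattern.search(s)
if m:
    # The match stops at the first dot, so the second sentence is left out
    print(m.group())
```

Because [^.] can never consume a dot, the match ends at the first sentence; this also means the approach assumes the quoted names themselves contain no dots.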

Regex demo for at least a single match, and another demo to match the separate parts afterwards.

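import re

# Match the whole sentence that starts with the prefix and holds at least one «...» part
regex = r"\bProducts to be destroyed:[^.]*«[^«»]*»[^.]*\."
s = 'Products to be destroyed: «Prabo», «Palox 2000», «Remadon strong» (Rule). The customers «Dilora» and «Apple» has to be notified.'
result = re.search(regex, s)

# Only when that sentence is found, extract every «...» part from it
if result:
    print(re.findall(r"«([^«»]*)»", result.group()))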

Output

['Prabo', 'Palox 2000', 'Remadon strong']
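
The question also asks about anchoring on the trailing (Rule) marker. One possible way to cover both cases, sketched here under the assumption that a sentence is any run of non-dot characters followed by a dot, is to walk over the sentences and keep only those that start with the prefix or end with the marker. The function name and its parameters are hypothetical, not part of the original answer.

```python
import re

# Assumption: a sentence is a run of non-dot characters ending in a dot
SENTENCE = re.compile(r"[^.]*\.")
QUOTED = re.compile(r"«([^«»]*)»")

def quoted_in_anchored_sentences(text, prefix="Products to be destroyed:", suffix="(Rule)"):
    """Collect «...» contents from sentences that start with prefix or end with suffix."""
    found = []
    for match in SENTENCE.finditer(text):
        sentence = match.group().strip()
        body = sentence[:-1].rstrip()  # sentence without its closing dot
        if sentence.startswith(prefix) or body.endswith(suffix):
            found.extend(QUOTED.findall(sentence))
    return found

s = ('Products to be destroyed: «Prabo», «Palox 2000», «Remadon strong» (Rule). '
     'The customers «Dilora» and «Apple» has to be notified.')
print(quoted_in_anchored_sentences(s))
# ['Prabo', 'Palox 2000', 'Remadon strong']
```

Doing the check in two steps keeps each regex simple, at the cost of the same no-dots-inside-a-sentence assumption as the pattern above.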
