Capture multiple named groups in any order with regex
I have some regexes in named groups, such as (?P<a>A), (?P<b>B) and (?P<c>C). Then I have a sentence like some_word A C B, where A, B and C appear in random order. I need to match those groups only if some_word appears in front of them. If that is the case, I would like to get an output like this: {a: "A", b: "B", c: "C"}.
I tried the regex some_word ((?P<a>A)\s|(?P<b>B)\s|(?P<c>C)\s){3}, but it does not work, as the group names have to be unique.
The only solution I have found is the regex some_word (?P<a>A|B|C)\s(?P<b>A|B|C)\s(?P<c>A|B|C). It handles the permutations of A, B and C, but I lose the link {a: "A", b: "B", c: "C"}.
Thank you for your help!
You can use this pattern:
(?<=some_word)(?=.*(?P<a>A).*)(?=.*(?P<b>B).*).*(?P<c>C).*
Code:
import re
pattern = "(?<=some_word)(?=.*(?P<a>A).*)(?=.*(?P<b>B).*).*(?P<c>C).*"
text = "some_word A C B"
matches = re.search(pattern, text)
print(matches.groupdict())
Output:
{'a': 'A', 'b': 'B', 'c': 'C'}
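As a rough check that the lookahead pattern above does not depend on the order of A, B and C, the sketch below runs it over every permutation. The helper name capture is made up for illustration and is not part of the original answer.

```python
import re
from itertools import permutations

# Same pattern as in the answer: each lookahead scans ahead from just after
# "some_word", so the order in which A, B and C appear does not matter.
pattern = re.compile(r"(?<=some_word)(?=.*(?P<a>A).*)(?=.*(?P<b>B).*).*(?P<c>C).*")

def capture(text):
    """Hypothetical helper: return the named groups, or None if there is no match."""
    m = pattern.search(text)
    return m.groupdict() if m else None

for order in permutations("ABC"):
    text = "some_word " + " ".join(order)
    print(text, capture(text))

# No match without the "some_word" prefix, or when one of the letters is missing.
print(capture("other A C B"))
print(capture("some_word A B"))
```

Note that the lookaheads accept A, B and C anywhere after some_word, not only in the three positions directly following it, so real inputs may need a tighter pattern.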
You can use your second approach, but restrict each group pattern with a negative lookahead so that it cannot repeat content that was already matched:
import re
text = 'some_word B C A'
# Each negative lookahead stops a group from repeating an earlier capture
pattern = r'some_word\s+(?:(?P<a>A|B|C)\s+(?!(?P=a))(?P<b>A|B|C)\s+(?!(?P=a)|(?P=b))(?P<c>A|B|C))'
for x in re.finditer(pattern, text):
    print(x.group("a"))
    print(x.group("b"))
    print(x.group("c"))
See the Python demo, output:
B
C
A
See the regex demo. The (?:(?P<a>A|B|C)\s+(?!(?P=a))(?P<b>A|B|C)\s+(?!(?P=a)|(?P=b))(?P<c>A|B|C)) part matches A, B or C into Group "a"; (?P<b>A|B|C) matches the same alternatives and captures into Group "b", but this value cannot start with the value captured in Group "a". Group "c" is restricted in the same way against both Group "a" and Group "b".
To make sure the values are not equal as whole tokens (rather than merely not starting with an earlier value), you can add whitespace boundaries to the lookaheads:
r'some_word\s+(?:(?P<a>A|B|C)\s+(?!(?P=a)(?!\S))(?P<b>A|B|C)\s+(?!(?:(?P=a)|(?P=b))(?!\S))(?P<c>A|B|C))'
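The difference between the two lookahead forms only shows up when one value is a prefix of another. The sketch below assumes hypothetical token values A, AB and C (the original question only uses single letters) and cuts the pattern down to two groups to keep it short; the names TOKEN, loose, strict and pair are illustrative.

```python
import re

# Assumed token values for illustration: "AB" starts with "A".
TOKEN = r'(?:AB|A|C)'

# Plain lookahead: group "b" may not even START with the value of group "a".
loose = re.compile(rf'some_word\s+(?P<a>{TOKEN})\s+(?!(?P=a))(?P<b>{TOKEN})')

# With (?!\S): group "b" is rejected only if it repeats group "a" as a whole token.
strict = re.compile(rf'some_word\s+(?P<a>{TOKEN})\s+(?!(?P=a)(?!\S))(?P<b>{TOKEN})')

def pair(regex, text):
    """Hypothetical helper: return (a, b) or None if there is no match."""
    m = regex.search(text)
    return (m.group('a'), m.group('b')) if m else None

for text in ('some_word A AB', 'some_word A A', 'some_word A C'):
    print(text, '| loose:', pair(loose, text), '| strict:', pair(strict, text))
```

With single-letter values such as A, B and C the two forms behave the same, so the boundary only matters once the alternatives can share a prefix.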
If you are looking to match from some_word up to the last of A, B and C, in any order, something like this works.
It matches the minimum string after some_word, up to the first point at which A, B and C have each appeared at least once.
some_word(?:(?=(?P<a>A)()|(?P<b>B)()|(?P<c>C)()|.).)+?(?=\2\4\6)
https://regex101.com/r/Gu5TnB/1
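For comparison with the other answers, here is a minimal sketch of how this pattern might be run with Python's re module. It assumes that re keeps captures made inside a lookahead and lets a backreference to an unset group fail, which is what the empty "flag" groups 2, 4 and 6 rely on. The helper name capture and the sample inputs are illustrative.

```python
import re

# Pattern from the answer, unchanged. Groups 2, 4 and 6 are empty groups that
# are set only once A, B and C (respectively) have been seen, so the final
# lookahead (?=\2\4\6) succeeds only after all three have appeared.
pattern = re.compile(
    r"some_word(?:(?=(?P<a>A)()|(?P<b>B)()|(?P<c>C)()|.).)+?(?=\2\4\6)"
)

def capture(text):
    """Hypothetical helper: return (matched text, named groups) or None."""
    m = pattern.search(text)
    return (m.group(0), m.groupdict()) if m else None

print(capture("some_word A C B and more"))
print(capture("some_word A C"))  # B never appears, so no match is expected
```

Because the quantifier is lazy, the match should stop right after the last of the three letters rather than running to the end of the line.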