Capturing multiple named groups in any order with a regular expression

Question

I have some regexes in named groups, such as (?P<a>A), (?P<b>B), (?P<c>C). Then I have a sentence like some_word A C B, where A, B and C appear in random order. I need to match those groups only if some_word appears in front of them. If it does, I would like output like this: {a: "A", b: "B", c: "C"}.

I tried the regex some_word ((?P<a>A)\s|(?P<b>B)\s|(?P<c>C)\s){3}, but it does not work, as the group names have to be unique.

The only solution I have found is the regex some_word (?P<a>A|B|C)\s(?P<b>A|B|C)\s(?P<c>A|B|C). It handles the permutations of A, B and C, but I lose the link {a: "A", b: "B", c: "C"}.

Thank you for your help!

Answer 1

You can use this pattern: (?<=some_word)(?=.*(?P<a>A).*)(?=.*(?P<b>B).*).*(?P<c>C).*

See the regex demo.

Code:

import re

# The lookbehind anchors on some_word; each lookahead finds its group anywhere after it.
pattern = r"(?<=some_word)(?=.*(?P<a>A).*)(?=.*(?P<b>B).*).*(?P<c>C).*"
text = "some_word A C B"
match = re.search(pattern, text)
print(match.groupdict())

Output:

{'a': 'A', 'b': 'B', 'c': 'C'}
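
As a rough sketch of how the lookahead idea generalizes, the snippet below builds one lookahead per named group and checks every ordering of A, B and C. The helper names build_any_order_pattern and capture are hypothetical, not part of the answer, and the sketch assumes the prefix is a literal string, since Python's lookbehind requires a fixed width.

```python
import re
from itertools import permutations


def build_any_order_pattern(prefix, groups):
    """Compile a pattern requiring `prefix`, then every named group somewhere after it."""
    # One lookahead per group: each scans ahead independently, so order does not matter.
    lookaheads = "".join(
        f"(?=.*(?P<{name}>{sub}))" for name, sub in groups.items()
    )
    # An escaped literal prefix has a fixed width, which the lookbehind needs.
    return re.compile(f"(?<={re.escape(prefix)}){lookaheads}")


pattern = build_any_order_pattern("some_word", {"a": "A", "b": "B", "c": "C"})


def capture(text):
    """Return the named groups as a dict, or None when the text does not match."""
    match = pattern.search(text)
    return match.groupdict() if match else None


# Every ordering of A, B and C yields the same name-to-value mapping.
for order in permutations("ABC"):
    print(capture("some_word " + " ".join(order)))

print(capture("other_word A B C"))  # None: the prefix is missing
print(capture("some_word A B"))     # None: C never appears
```

Because each lookahead uses .*, a group may match anywhere on the same line after some_word, including inside a longer word; tighten the sub-patterns (for example with \b) if that matters for your data.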

Answer 2

You can use the second approach, but restrict each group pattern with a negative lookahead to avoid matching repeated contents:

import re

text = 'some_word B C A'
pattern = r'some_word\s+(?:(?P<a>A|B|C)\s+(?!(?P=a))(?P<b>A|B|C)\s+(?!(?P=a)|(?P=b))(?P<c>A|B|C))'
for x in re.finditer(pattern, text):
    print(x.group("a"))
    print(x.group("b"))
    print(x.group("c"))

See the Python demo; output:

B
C
A

See the regex demo. In the (?:(?P<a>A|B|C)\s+(?!(?P=a))(?P<b>A|B|C)\s+(?!(?P=a)|(?P=b))(?P<c>A|B|C)) part, (?P<a>A|B|C) matches A, B or C into Group "a". (?P<b>A|B|C) matches the same alternatives and captures into Group "b", but the preceding lookahead (?!(?P=a)) means this value cannot start with the value held in Group "a". Likewise, (?!(?P=a)|(?P=b)) keeps Group "c" from starting with the value of Group "a" or Group "b".

These lookaheads only compare the start of the next value. To make sure the values are not equal as a whole, you can add whitespace boundaries to the lookaheads:

r'some_word\s+(?:(?P<a>A|B|C)\s+(?!(?P=a)(?!\S))(?P<b>A|B|C)\s+(?!(?:(?P=a)|(?P=b))(?!\S))(?P<c>A|B|C))'
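
To illustrate why the boundary matters, here is a small sketch with multi-character values. The values foobar, foo and baz are made up for this example (the question only uses A, B and C); what matters is that one value is a prefix of another.

```python
import re

# Hypothetical values for illustration; "foo" is a prefix of "foobar".
VALUES = r"foobar|foo|baz"

# Without a boundary, the lookahead rejects any value that merely starts
# with an already captured one.
prefix_only = re.compile(
    rf"some_word\s+(?P<a>{VALUES})\s+(?!(?P=a))(?P<b>{VALUES})"
    rf"\s+(?!(?P=a)|(?P=b))(?P<c>{VALUES})"
)

# With (?!\S), a value is rejected only when the captured text is repeated
# as a whole token (followed by whitespace or the end of the string).
bounded = re.compile(
    rf"some_word\s+(?P<a>{VALUES})\s+(?!(?P=a)(?!\S))(?P<b>{VALUES})"
    rf"\s+(?!(?:(?P=a)|(?P=b))(?!\S))(?P<c>{VALUES})"
)

text = "some_word foo foobar baz"
print(prefix_only.search(text))          # None: "foobar" starts with "foo"
print(bounded.search(text).groupdict())  # all three values captured

# A genuine repetition is still rejected by the bounded version.
print(bounded.search("some_word foo foo baz"))  # None
```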

Answer 3

If you want to match from some_word up to the last of A, B and C, in any order, something like this works. It matches the shortest string after some_word in which A, B and C have each appeared at least once.

some_word(?:(?=(?P<a>A)()|(?P<b>B)()|(?P<c>C)()|.).)+?(?=\2\4\6)

https://regex101.com/r/Gu5TnB/1
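
A minimal sketch of running this pattern with Python's re module follows. It assumes Python's behaviour, where a group captured in an earlier iteration of the repeat keeps its value and a backreference to a group that never matched fails. Other regex engines may handle this differently, so treat the result as specific to this setup. The sample text is made up.

```python
import re

# Groups 2, 4 and 6 are empty "flag" groups: each is set only once its
# neighbouring named group (a, b or c) has matched. The final lookahead
# (?=\2\4\6) succeeds only when all three flags are set.
pattern = re.compile(
    r"some_word(?:(?=(?P<a>A)()|(?P<b>B)()|(?P<c>C)()|.).)+?(?=\2\4\6)"
)

match = pattern.search("some_word A C B A")
print(match.group(0))     # the lazy quantifier stops once A, B and C have all been seen
print(match.groupdict())  # the named groups hold the values captured along the way

# When one of the three values is missing, there is no match at all.
print(pattern.search("some_word A C"))
```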

3 answers

Answer 1
1 2021-08-25 12:55:01

Answer 2
1 2021-08-25 13:25:18

Answer 3
0 2021-08-26 00:00:55
