
Capture multiple named groups in any order with regex

I have some regexes in named groups such as (?P<a>A), (?P<b>B), (?P<c>C). Then I have a sentence like some_word A C B, where A, B and C can appear in any order. I need to match those groups only if some_word appears in front of them. If this is the case, I would like to have an output like this: {a: "A", b: "B", c: "C"}.

I tried the regex some_word ((?P<a>A)\s|(?P<b>B)\s|(?P<c>C)\s){3}, but it does not work, as the group names have to be unique.

The only solution I have found is the regex some_word (?P<a>A|B|C)\s(?P<b>A|B|C)\s(?P<c>A|B|C). It handles the permutations of A, B and C, but I lose the link {a: "A", b: "B", c: "C"}.

Thank you for your help!

You can use this pattern: (?<=some_word)(?=.*(?P<a>A).*)(?=.*(?P<b>B).*).*(?P<c>C).*

See the regex demo.

Code:

import re

# Lookbehind anchors the match right after some_word; each lookahead captures one token.
pattern = r"(?<=some_word)(?=.*(?P<a>A).*)(?=.*(?P<b>B).*).*(?P<c>C).*"
text = "some_word A C B"
matches = re.search(pattern, text)
print(matches.groupdict())

Output:

{'a': 'A', 'b': 'B', 'c': 'C'}
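
As a quick check of the lookahead pattern above, the following sketch runs it against every ordering of A, B and C. The helper name `capture_groups` and the sample strings are assumptions made for illustration; they are not part of the original answer.

```python
import re
from itertools import permutations

# Same pattern as in the answer above, compiled once for reuse.
PATTERN = re.compile(
    r"(?<=some_word)(?=.*(?P<a>A).*)(?=.*(?P<b>B).*).*(?P<c>C).*"
)

def capture_groups(text):
    """Return the named groups as a dict, or None when the pattern does not match."""
    match = PATTERN.search(text)
    return match.groupdict() if match else None

# Try all six orderings of the three tokens.
for order in permutations("ABC"):
    text = "some_word " + " ".join(order)
    print(text, "->", capture_groups(text))

# A missing token or a missing some_word prefix gives no match.
print(capture_groups("some_word A B"))
print(capture_groups("other_word A B C"))
```

Note that the `.*` inside each lookahead lets the tokens sit anywhere after some_word, so a string such as "some_word xxAyyCzzB" is also accepted. If the tokens must directly follow some_word as whitespace-separated words, a stricter pattern is probably needed.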

You can use the second approach but restrict each group pattern with a negative lookahead to avoid matching repeated contents:

import re

text = 'some_word B C A'
# Each later group is barred by a negative lookahead from repeating an earlier capture.
pattern = r'some_word\s+(?:(?P<a>A|B|C)\s+(?!(?P=a))(?P<b>A|B|C)\s+(?!(?P=a)|(?P=b))(?P<c>A|B|C))'
for x in re.finditer(pattern, text):
    print(x.group("a"))
    print(x.group("b"))
    print(x.group("c"))

See the Python demo, output:

B
C
A

See the regex demo. The (?:(?P<a>A|B|C)\s+(?!(?P=a))(?P<b>A|B|C)\s+(?!(?P=a)|(?P=b))(?P<c>A|B|C)) part captures A, B or C into Group "a". (?P<b>A|B|C) matches the same alternatives and captures into Group "b", but the (?!(?P=a)) lookahead before it means this value cannot start with the value held in Group "a". Likewise, (?!(?P=a)|(?P=b)) keeps Group "c" from starting with the value in Group "a" or Group "b".
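
In this pattern the group names a, b and c are positional: they record the first, second and third token rather than which token was seen. One possible way to recover the {a: "A", b: "B", c: "C"} shape the question asks for is to re-key the captures afterwards. This is only a sketch; the `NAME_FOR_TOKEN` lookup and the `tokens_by_name` helper are hypothetical names introduced here.

```python
import re

PATTERN = re.compile(
    r"some_word\s+(?P<a>A|B|C)\s+(?!(?P=a))(?P<b>A|B|C)"
    r"\s+(?!(?P=a)|(?P=b))(?P<c>A|B|C)"
)

# Hypothetical lookup from each matched token to the key wanted in the result.
NAME_FOR_TOKEN = {"A": "a", "B": "b", "C": "c"}

def tokens_by_name(text):
    """Re-key the positional captures by token; return None when there is no match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    # match.group("a", "b", "c") yields the tokens in the order they appeared.
    return {NAME_FOR_TOKEN[token]: token for token in match.group("a", "b", "c")}

print(tokens_by_name("some_word B C A"))
print(tokens_by_name("some_word A A B"))  # repeated token, so no match
```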

The plain lookaheads only check how the next value starts. To reject a value only when it is equal to an earlier one as a whole, you can add whitespace boundaries to the lookaheads:

r'some_word\s+(?:(?P<a>A|B|C)\s+(?!(?P=a)(?!\S))(?P<b>A|B|C)\s+(?!(?:(?P=a)|(?P=b))(?!\S))(?P<c>A|B|C))'
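
The difference only shows up when one alternative is a prefix of another, which is not the case for the single letters A, B and C. The sketch below swaps in hypothetical multi-character alternatives (foo, foobar, baz) to illustrate it; these tokens and the names `PLAIN` and `BOUNDED` are assumptions, not taken from the question.

```python
import re

# Hypothetical alternatives where "foo" is a prefix of "foobar".
ALT = r"(?:foobar|foo|baz)"

# Lookaheads without a boundary: they reject anything that merely starts with an earlier value.
PLAIN = re.compile(
    rf"some_word\s+(?P<a>{ALT})\s+(?!(?P=a))(?P<b>{ALT})"
    rf"\s+(?!(?P=a)|(?P=b))(?P<c>{ALT})"
)

# Lookaheads with (?!\S): they reject only a whole token equal to an earlier value.
BOUNDED = re.compile(
    rf"some_word\s+(?P<a>{ALT})\s+(?!(?P=a)(?!\S))(?P<b>{ALT})"
    rf"\s+(?!(?:(?P=a)|(?P=b))(?!\S))(?P<c>{ALT})"
)

text = "some_word foo foobar baz"
print(PLAIN.search(text))                # no match: "foobar" starts with "foo"
print(BOUNDED.search(text).groupdict())  # all three tokens are captured

# A genuinely repeated token is still rejected by the bounded version.
print(BOUNDED.search("some_word foo foo baz"))
```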

If you are looking to match from some_word up to the last of A, B and C, in any order, something like this works. It matches the shortest string after some_word that contains A, B and C at least once each.

some_word(?:(?=(?P<a>A)()|(?P<b>B)()|(?P<c>C)()|.).)+?(?=\2\4\6)

https://regex101.com/r/Gu5TnB/1
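
For reference, here is a minimal sketch of how that pattern could be used from Python's re module, assuming it behaves there as it does in the linked demo. Each empty group () after a named group acts as a flag: the backreferences \2, \4 and \6 in the final lookahead can only succeed once all three flags have been set. The sample strings are made up for illustration.

```python
import re

PATTERN = re.compile(
    r"some_word(?:(?=(?P<a>A)()|(?P<b>B)()|(?P<c>C)()|.).)+?(?=\2\4\6)"
)

# The lazy loop stops as soon as A, B and C have all been seen.
match = PATTERN.search("some_word A C B and more")
print(match.group(0))
print(match.groupdict())  # only the named groups appear here

# With one token missing, the final lookahead can never succeed.
print(PATTERN.search("some_word A C"))
```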
