
How to use a named regex capture before OR after a non-capturing group when named capture groups cannot be used twice?

I'm using regex in a Python script to capture a named group. The group occurs before OR after a delimiter string "_S_". My confusion comes from an inability to use the same named capturing group twice in one regex.

I'd like to use the following invalid (named group used twice) regex:

(?:^STD_S_)(?P<important>.+?)$|^(?P<important>.+?)(?:_S_STD)$
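As a quick check of why this is rejected, the sketch below compiles a pattern of the same shape with Python's built-in re module and catches the resulting error. The variable names are illustrative only.

```python
import re

# Same shape as the pattern above: the name "important" appears in both alternatives.
pattern = r"(?:^STD_S_)(?P<important>.+?)$|^(?P<important>.+?)(?:_S_STD)$"

try:
    re.compile(pattern)
    error_message = None
except re.error as exc:
    # The built-in re module refuses to define the same group name twice.
    error_message = str(exc)

print(error_message)
```

The error is raised at compile time, so it does not depend on the input string.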

Description:

?: non-capturing group

^STD_S_ start with the "STD_S_" string, which is a standard string plus a delimiter

?P<important> the named important string I want

| OR

^?P<important> start with the important string

_S_STD$ end with the standard string

I would really like the important group I capture to be named. I can remove the names and get this to work. I can also split the single expression into two expressions (one from each side of the OR) and choose which one to use with some logic in the Python script.
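The two-expression fallback mentioned above could look roughly like the following sketch. The helper name extract_important and the pattern variable names are made up for illustration. Each pattern defines the important group only once, so both compile with the built-in re module.

```python
import re

# One pattern per side of the OR; each defines "important" exactly once.
PREFIX_FORM = re.compile(r"^STD_S_(?P<important>.+)$")
SUFFIX_FORM = re.compile(r"^(?P<important>.+)_S_STD$")


def extract_important(text):
    """Return the named capture from whichever pattern matches, else None."""
    for pattern in (PREFIX_FORM, SUFFIX_FORM):
        match = pattern.search(text)
        if match:
            return match.group("important")
    return None


print(extract_important("STD_S_important"))
print(extract_important("important_S_STD"))
```

This keeps the group named, at the cost of trying the patterns one after another.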

Thanks!

EXAMPLE INPUTS

STD_S_important
important_S_STD

EXAMPLE OUTPUTS

important #returned by calling the important named group
important

Regex based on the comments, which doesn't match the second case:

(?:(?:^STD_S_)(?P<important>.+?)$)|(?:^(?P=important)(?:_S_STD)$)
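One likely reason the second case fails: (?P=important) is a backreference, not a second definition of the group. In the second alternative the group has not captured anything, so the backreference cannot match. A short check with the built-in re module, using the example inputs from above:

```python
import re

attempt = re.compile(
    r"(?:(?:^STD_S_)(?P<important>.+?)$)|(?:^(?P=important)(?:_S_STD)$)"
)

# First case: the named group is defined and captures in this alternative.
first = attempt.search("STD_S_important")
# Second case: the backreference points at a group that never captured.
second = attempt.search("important_S_STD")

print(first.group("important"))
print(second)
```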

Note the general form of the regex is: A(?P<name>B)|(?P<name>B)C. Since a name can't be repeated for named groups, the named group must go around the whole expression. This causes another issue: it captures the prefix and suffix in the named group. To resolve this, you can use lookarounds to prevent the prefix and suffix from being captured within the group.

(?P<name>(?<=A)B|B(?=C))

Note that this only works when the prefix is of fixed length, because Python's re module requires fixed-width lookbehinds. If part of the prefix or suffix themselves should be captured, you can add capturing groups to the lookarounds. An anchor on the same side as a lookaround cannot be placed next to it but must instead be put inside it, else the two will create mutually exclusive requirements.

# can succeed:
(?P<name>(?<=^A)B$|^B(?=C$))

# always fails:
(?P<name>^(?<=^A)B$|^B(?=C)$)
^(?P<name>(?<=^A)B|B(?=C))$
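The three anchor placements above can be checked with the literal stand-ins A, B and C for the prefix, the wanted text and the suffix. This is only a sketch; the inputs "AB" and "BC" and the dictionary labels are chosen for illustration.

```python
import re

inputs = ["AB", "BC"]  # prefix form and suffix form

patterns = {
    "anchors inside lookarounds": r"(?P<name>(?<=^A)B$|^B(?=C$))",
    "anchors beside lookarounds, inside group": r"(?P<name>^(?<=^A)B$|^B(?=C)$)",
    "anchors beside lookarounds, outside group": r"^(?P<name>(?<=^A)B|B(?=C))$",
}

results = {}
for label, pattern in patterns.items():
    matches = [re.search(pattern, text) for text in inputs]
    # Record the named capture, or None when the pattern did not match.
    results[label] = [m.group("name") if m else None for m in matches]
    print(label, results[label])
```

In the failing variants, ^ demands the start of the string at the same position where the lookbehind demands a preceding A, and $ demands the end of the string at the same position where the lookahead demands a following C.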

For the regex in question, this gives:

(?P<important>(?<=^STD_S_).+$|^.+(?=_S_STD$))
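Applied to the example inputs with Python's built-in re module, the pattern can be exercised as in the sketch below (the constant name IMPORTANT is illustrative). Note that re.search is used rather than re.match: for the prefix form the match starts after STD_S_, not at position 0, so re.match would find nothing.

```python
import re

IMPORTANT = re.compile(r"(?P<important>(?<=^STD_S_).+$|^.+(?=_S_STD$))")

for text in ("STD_S_important", "important_S_STD"):
    match = IMPORTANT.search(text)
    print(text, "->", match.group("important"))

# re.match only tries position 0, where the lookbehind cannot succeed yet.
print(IMPORTANT.match("STD_S_important"))
```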

(RegEx101 demo)

Alternatively, the third-party regex module allows the same group name to be used for multiple groups, with the last capture taking precedence.
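A sketch of that alternative, assuming the third-party regex package is installed (for example via pip install regex); it is not part of the standard library. The import is guarded so the snippet still runs when the package is absent, and the helper name extract_important is made up for illustration.

```python
try:
    import regex  # third-party package, not the built-in re module
except ImportError:
    regex = None

# The same group name appears in both alternatives; the regex package accepts this.
PATTERN = r"^STD_S_(?P<important>.+)$|^(?P<important>.+)_S_STD$"


def extract_important(text):
    """Return the named capture, or None if nothing matched or regex is missing."""
    if regex is None:
        return None
    match = regex.search(PATTERN, text)
    return match.group("important") if match else None


print(extract_important("STD_S_important"))
print(extract_important("important_S_STD"))
```

With this approach no lookarounds are needed, so the fixed-length restriction on the prefix does not apply.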
