
Python regular expressions: named groups and the “logical or” operator

In Python regular expressions, named and unnamed groups are both delimited by '(' and ')'. This leads to some surprising behavior. The regexp

"(?P<a>1)=(?P<b>2)"

used with the text "1=2" will find the named group "a" with value "1" and the named group "b" with value "2". But if I want to use the "logical or" operator to combine multiple rules, the following regexp:

"((?P<a>1)=(?P<b>2))|(?P<c>3)"

used with the same text "1=2" will also find an unnamed group with value "1=2". I understand that the regexp engine treats the "(" and ")" enclosing groups "a" and "b" as an unnamed group and reports that it matched. But I don't want unnamed groups to be reported; I just want to use "|" to "glue" multiple regexps together, without creating any parasitic unnamed groups. Is there a way to do this in Python?
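For reference, the behavior described above can be reproduced with a short script. This is only a sketch of the situation in the question; the variable names are chosen for illustration.

```python
import re

# The outer parentheses form an ordinary (unnamed) capturing group,
# so it is numbered and reported alongside the named groups.
m = re.search(r"((?P<a>1)=(?P<b>2))|(?P<c>3)", "1=2")

all_groups = m.groups()       # every capturing group, named or not
named_only = m.groupdict()    # only the named groups

print(all_groups)   # ('1=2', '1', '2', None)
print(named_only)   # {'a': '1', 'b': '2', 'c': None}
```

Note that groupdict() already leaves the unnamed group out; it only shows up in groups() and in the numeric indices.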

Use a non-capturing group, (?:...), to get rid of the unnamed group:

r"(?:(?P<a>1)=(?P<b>2))|(?P<c>3)"

From the documentation of the re module:

(?:...) A non-grouping version of regular parentheses. Matches whatever regular expression is inside the parentheses, but the substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern.
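A minimal sketch of the effect, using the pattern from this answer. The helper name matched_names is made up for this example and is not part of re.

```python
import re

pattern = re.compile(r"(?:(?P<a>1)=(?P<b>2))|(?P<c>3)")

def matched_names(match):
    """Return only the named groups that took part in the match."""
    return {name: value for name, value in match.groupdict().items()
            if value is not None}

m1 = pattern.search("1=2")
m2 = pattern.search("3")

print(m1.groups())         # ('1', '2', None)
print(matched_names(m1))   # {'a': '1', 'b': '2'}
print(matched_names(m2))   # {'c': '3'}
```

The (?:...) wrapper still groups the two sub-patterns for the purposes of "|", but it no longer adds an entry to groups().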

By the way, the alternation operator | has very low precedence, which makes the parentheses unnecessary in cases like yours. You can drop the extra parentheses in your regex and it will continue to work as expected:

r"(?P<a>1)=(?P<b>2)|(?P<c>3)"
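Since "|" binds more loosely than concatenation, "gluing" rules together can be as simple as joining them with "|". The sketch below assumes each rule is a plain string with no top-level "|" of its own; the rules list and variable names are hypothetical.

```python
import re

# Hypothetical list of independent rules to be glued together.
rules = [r"(?P<a>1)=(?P<b>2)", r"(?P<c>3)"]

combined = re.compile("|".join(rules))

print(combined.pattern)   # (?P<a>1)=(?P<b>2)|(?P<c>3)
print(combined.groups)    # 3 -> only the named groups exist

found = [m.group(0) for m in combined.finditer("1=2 3")]
print(found)              # ['1=2', '3']
```

If a rule may itself contain a top-level "|", wrapping each rule in (?:...) before joining keeps the alternatives separate without introducing extra capturing groups.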
