
Regex to find named capturing groups with Go programming language

I'm looking for a regex to find named capturing groups in (other) regex strings.

Example: I want to find (?P<country>m((a|b).+)n), (?P<city>.+) and (?P<street>(5|6)\. .+) in the following regex:

/(?P<country>m((a|b).+)n)/(?P<city>.+)/(?P<street>(5|6)\. .+)

I tried the following regex to find the named capturing groups:

// Zero or more parenthesized sub-groups (the part that fails on nesting).
var subGroups string = `(\(.+\))*?`
var prefixedSubGroups string = `.+` + subGroups
var postfixedSubGroups string = subGroups + `.+`
var surroundedSubGroups string = `.+` + subGroups + `.+`

// regexp.MustCompile returns *regexp.Regexp.
var capturingGroupNameRegex *regexp.Regexp = regexp.MustCompile(
    `(?U)` +
    `\(\?P<.+>` +
    `(` + prefixedSubGroups + `|` + postfixedSubGroups + `|` + surroundedSubGroups + `)` +
    `\)`)

The (?U) flag makes greedy quantifiers (+ and *) non-greedy, and non-greedy quantifiers (such as *?) greedy. Details are in the Go regexp documentation.
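To make the effect of the flag concrete, here is a minimal sketch. The helper name firstMatch and the sample input are made up for illustration; only regexp.MustCompile and FindString come from the standard library.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch is a hypothetical helper that returns the leftmost match of
// pattern in input, or "" if there is none.
func firstMatch(pattern, input string) string {
	return regexp.MustCompile(pattern).FindString(input)
}

func main() {
	input := "<a><b>"

	// Default: + is greedy and takes as much as it can.
	fmt.Println(firstMatch(`<.+>`, input)) // <a><b>

	// With (?U), the same + becomes non-greedy.
	fmt.Println(firstMatch(`(?U)<.+>`, input)) // <a>

	// With (?U), the normally non-greedy +? becomes greedy.
	fmt.Println(firstMatch(`(?U)<.+?>`, input)) // <a><b>
}
```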

But it doesn't work because the parentheses are not matched correctly.

Matching arbitrarily nested parentheses correctly is not possible with regular expressions because arbitrary (recursive) nesting cannot be described by a regular language.

Some modern regex flavors do support recursion (Perl, PCRE) or balanced matching (.NET), but Go is not one of them: the RE2 syntax documentation, which Go's regexp package follows, explicitly lists Perl's (?R) construct as not supported. You need to build a recursive descent parser, not a regex.
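As a rough sketch of the non-regex approach, the code below tracks nesting with an explicit stack rather than a full recursive descent parser, which is enough to pull out the groups from the question. The function name findNamedGroups is made up for illustration. It assumes the pattern does not use \Q...\E quoting, handles only simple [...] character classes, and recognizes only the (?P<name>...) spelling.

```go
package main

import (
	"fmt"
	"strings"
)

// findNamedGroups scans a regex pattern and returns the text of every
// named capturing group "(?P<name>...)", in the order the groups close.
// Hypothetical helper: nesting is tracked with a stack, not with a regex.
// Assumption: the pattern does not use \Q...\E quoting.
func findNamedGroups(pattern string) []string {
	var groups []string
	var open []int   // stack of indexes of unmatched '('
	inClass := false // true while inside a [...] character class

	for i := 0; i < len(pattern); i++ {
		switch c := pattern[i]; {
		case c == '\\':
			i++ // skip the escaped character, e.g. \( or \.
		case inClass:
			if c == ']' {
				inClass = false
			}
		case c == '[':
			inClass = true
			// A leading ']' (optionally after '^') is a literal, not the end.
			if strings.HasPrefix(pattern[i+1:], "^") {
				i++
			}
			if strings.HasPrefix(pattern[i+1:], "]") {
				i++
			}
		case c == '(':
			open = append(open, i)
		case c == ')' && len(open) > 0:
			start := open[len(open)-1]
			open = open[:len(open)-1]
			if strings.HasPrefix(pattern[start:], "(?P<") {
				groups = append(groups, pattern[start:i+1])
			}
		}
	}
	return groups
}

func main() {
	pattern := `/(?P<country>m((a|b).+)n)/(?P<city>.+)/(?P<street>(5|6)\. .+)`
	for _, g := range findNamedGroups(pattern) {
		fmt.Println(g)
	}
	// Prints:
	// (?P<country>m((a|b).+)n)
	// (?P<city>.+)
	// (?P<street>(5|6)\. .+)
}
```

Because a group is recorded when its closing parenthesis is seen, a named group nested inside another named group would be listed before its parent.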

