
Unexpected behavior around recursive regex

I am trying to match a C++ argument type that can contain balanced < and > characters.

With this regex: (\<(?>[^<>]|(?R))*\>)

On this string: QMap<QgsFeatureId, QPair<QMap<Something, Complex> >>

It matches everything except the first 4 characters (QMap).

Now, if I add \w+ at the start of my regex, it only matches the end of the string (QPair<QMap<Something, Complex> >>) and not the whole string.

What is the explanation, and how can I solve this?

You can try it online here.

This is intended for use in Perl 5.10+ (5.24).

The (?R) construct recurses the entire pattern. When you add \w+ at the start, it becomes part of what is recursed, so each nested level must itself begin with \w+ before its <. What you actually want to recurse is only the capturing group subpattern (Group 1 in your regex).

You need a subroutine call that will recurse the capturing group subpattern:

(\w+)(<(?:[^<>]++|(?2))*>)

See the regex demo

Details

  • (\w+) - Group 1 capturing the identifier (you may change it to [a-zA-Z]\w*)
  • (<(?:[^<>]++|(?2))*>) - Group 2 (that will be recursed)
    • < - a literal <
    • (?:[^<>]++|(?2))* - zero or more repetitions of either 1+ chars other than < and > (possessively, to make it faster) or (|) the whole Group 2 pattern ((?2)).
    • > - a literal >
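
The same breakdown can be written directly into the pattern with the /x flag and named groups. The following is only a sketch: the group names name and args, the variable names, and the shorter sample string are illustrative assumptions and not part of the original answer. (?&args) is the named counterpart of (?2) and needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical names: "name" and "args" are chosen for illustration only.
my $type_re = qr{
    (?<name> \w+ )            # identifier before the template arguments
    (?<args>                  # balanced <...> block
        <
        (?: [^<>]++           # run of non-bracket characters, possessive
          | (?&args)          # or a nested <...> block, recursing into this group
        )*
        >
    )
}x;

# Assumed sample input, shorter than the one in the question.
my $sample = 'QPair<QMap<Something, Complex> >';
my %parts;
if ($sample =~ $type_re) {
    %parts = %+;   # copy the named captures
    print "$parts{name} => $parts{args}\n";
}
```

Named recursion keeps working if another capturing group is later added in front of the pattern, whereas a numbered call such as (?2) would have to be renumbered.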

Results:

Match:   QMap<QgsFeatureId, QPair<QMap<Something, Complex> >>
Group 1: QMap
Group 2: <QgsFeatureId, QPair<QMap<Something, Complex> >>
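
To check this in Perl itself, a minimal script along these lines should do. It is a sketch, not a definitive implementation: the variable names are assumptions, and $& is used for brevity to get the whole match.

```perl
use strict;
use warnings;

# The sample type from the question.
my $type = 'QMap<QgsFeatureId, QPair<QMap<Something, Complex> >>';

# Group 1: identifier. Group 2: balanced <...>, recursing into itself via (?2).
my $re = qr/(\w+)(<(?:[^<>]++|(?2))*>)/;

my ($whole, $name, $args);
if ($type =~ $re) {
    ($whole, $name, $args) = ($&, $1, $2);
    print "Match:   $whole\n";
    print "Group 1: $name\n";
    print "Group 2: $args\n";
}
```

Because (?2) re-enters only Group 2, the leading \w+ is required once at the top level, and nested identifiers such as QPair and QMap are consumed by [^<>]++.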
