
Why isn't this group capturing all items that appear in parentheses?

I'm trying to create a regex that captures a string not enclosed in parentheses in the first group, followed by any number of strings enclosed in parentheses, each in its own group.

E.g.

2(3)(4)(5)

Should be: 2 - first group, 3 - second group, and so on.

What I came up with is this regex: (I'm using JavaScript)

([^()]*)(?:\((([^)]*))\))*

However, when I enter a string like A(B)(C)(D), I only get the A and D captured.

https://regex101.com/r/HQC0ib/1

Can anyone help me out on this, and possibly explain where the error is?

You cannot have an undetermined number of capture groups. The number of capture groups you get is determined by the regular expression, not by the input it parses. A capture group that occurs within another repetition will indeed only retain the last of those repetitions.
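To see this concretely, here is a minimal sketch that runs the pattern from the question against A(B)(C)(D). The variable names are only illustrative. Groups 2 and 3 sit inside the repeated non-capturing group, so each iteration overwrites what the previous one stored:

```javascript
// The pattern from the question: groups 2 and 3 are inside the repeated (?:...)*.
const questionPattern = /([^()]*)(?:\((([^)]*))\))*/;
const questionMatch = "A(B)(C)(D)".match(questionPattern);

// The result always has 1 + (number of groups in the pattern) entries,
// no matter how many parenthesized items the input contains.
console.log(questionMatch.length);      // 4
console.log(Array.from(questionMatch)); // [ 'A(B)(C)(D)', 'A', 'D', 'D' ]
```

The whole string is matched, but only the last iteration's captures survive.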

If you know the maximum number of repetitions you can encounter, then just repeat the pattern that many times, and make each of them optional with a ? . For instance, this will capture up to 4 items within parentheses:

([^()]*)(?:\(([^)]*)\))?(?:\(([^)]*)\))?(?:\(([^)]*)\))?(?:\(([^)]*)\))?
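As a rough illustration of how that pattern behaves, the sketch below applies it to the 2(3)(4)(5) example. The names upToFour, head, items and present are made up for this example. Optional groups that did not take part in the match come back as undefined in JavaScript, so they are filtered out:

```javascript
// Up to four optional parenthesized items, each with its own capture group.
const upToFour = /([^()]*)(?:\(([^)]*)\))?(?:\(([^)]*)\))?(?:\(([^)]*)\))?(?:\(([^)]*)\))?/;

// Skip the full match, take group 1 as the head and groups 2-5 as the items.
const [, head, ...items] = "2(3)(4)(5)".match(upToFour);

// Groups that did not participate in the match are undefined.
const present = items.filter((item) => item !== undefined);

console.log(head);    // "2"
console.log(items);   // [ '3', '4', '5', undefined ]
console.log(present); // [ '3', '4', '5' ]
```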

It's not an error. In regex, when you repeat a capture group, as in (...)*, only the last occurrence is stored in the backreference.

For example:

On a string "a,b,c,d", if you match /(,[a-z])+/ then the backreference of capture group 1 (\1) will give ",d".

If you want it to return more, then you could surround it in another capture group.
--> With /((?:,[a-z])+)/ then \1 will give ",b,c,d".
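The two variants can be compared side by side in JavaScript. This is just a small sketch, and the variable names are illustrative:

```javascript
const csv = "a,b,c,d";

// Repeated capture group: group 1 keeps only the last repetition.
const lastOnly = csv.match(/(,[a-z])+/);
console.log(lastOnly[0]); // ",b,c,d" (the full match)
console.log(lastOnly[1]); // ",d"

// Repetition moved inside a non-capturing group, wrapped by one capture group:
// group 1 now spans every repetition.
const wholeRun = csv.match(/((?:,[a-z])+)/);
console.log(wholeRun[1]); // ",b,c,d"
```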

To get those numbers between the parentheses you could also just try to match the word characters.

For example:

    var str = "2(3)(14)(B)";
    var matches = str.match(/\w+/g);
    console.log(matches); // ["2", "3", "14", "B"]

Since JS regex does not support the \G anchor (which matches at the position where the previous match ended), and does not keep a stack of captures for each capturing group as .NET or the PyPI regex library does, you need a two-step approach: 1) match each whole run of text, and then 2) post-process it to get the values required.

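    var s = "2(3)(4)(5) A(B)(C)(D)";
    // Step 1: match each run of "text(item)(item)..." as a whole.
    var rx = /[^()\s]+(?:\([^)]*\))*/g;
    var res = [], m;
    while ((m = rx.exec(s))) {
      // Step 2: split the run on parentheses and drop the empty pieces.
      res.push(m[0].split(/[()]+/).filter(Boolean));
    }
    console.log(res); // [["2", "3", "4", "5"], ["A", "B", "C", "D"]]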

I added \s to the negated character class [^()] since I added the examples as a single string.

Pattern details

  • [^()\s]+ - 1 or more chars other than ( , ) and whitespace
  • (?:\([^)]*\))* - 0 or more sequences of:
    • \( - a (
    • [^)]* - 0+ chars other than )
    • \) - a )

The splitting regex is [()]+ that matches 1 or more ) or ( chars, and filter(Boolean) removes empty items.
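A variation on the same two-step idea, for environments that support String.prototype.matchAll (ES2020): take the leading text with one regex, then collect every parenthesized item with a global one. This is only a sketch, not part of the answer above, and splitHeadAndItems is a hypothetical helper name. Unlike filter(Boolean), it keeps empty items such as ().

```javascript
// Hypothetical helper: returns the leading text and the list of parenthesized items.
function splitHeadAndItems(input) {
  // Step 1: everything before the first parenthesis (may be an empty string).
  const head = input.match(/^[^()]*/)[0];
  // Step 2: one match per "(...)"; group 1 holds the text inside the parentheses.
  const items = Array.from(input.matchAll(/\(([^)]*)\)/g), (m) => m[1]);
  return { head, items };
}

console.log(splitHeadAndItems("A(B)(C)(D)")); // { head: 'A', items: [ 'B', 'C', 'D' ] }
console.log(splitHeadAndItems("2(3)(4)(5)")); // { head: '2', items: [ '3', '4', '5' ] }
```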


 