
Is there a way to match only top level parentheses with regex?

With JavaScript, suppose I have a string like (1)(((2)(3))4). Can I get a regex to match just (1) and (((2)(3))4), or do I need to do something more complicated?

Ideally the regex would return ["((2)(3))","4"] when run against ((2)(3))4. In fact, that is a hard requirement: the point is to group things into the chunks that need to be worked on first, the way parentheses work in math.

No, there is no way to match only top level parentheses with regex

Looking only at the top level doesn't make the problem easier than general "parsing" of recursive structures. (See this relevant popular SO question with a great answer).

Here's a simple, intuitive reason why a regex can't parse arbitrary levels of nesting:

To keep track of the level of nesting, one must count. If one wants to be able to keep track of an arbitrary level of nesting, one needs an arbitrarily large number while running the program.

But regular expressions (in the formal sense) are exactly those patterns that can be implemented by DFAs, that is, deterministic finite automata. These have only a finite number of states, so they can't keep track of an arbitrarily large number.

This argument also applies to your specific concern of being interested only in the top-level parentheses.

To recognize the top level parentheses, you must keep track of arbitrary nesting preceding any one of them:

((((..arbitrarily deep nesting...))))((.....)).......()......
^toplevel                           ^^       ^       ^^

So yes, you need something more powerful than regex.
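
As a rough illustration of what "something more powerful" can look like, here is a minimal sketch of a hand-written scanner that keeps the nesting counter the argument above talks about. The function name splitTopLevel is made up for this example, and the sketch assumes that only ( and ) act as delimiters (no escaping or quoting).

```javascript
// Hypothetical helper: split a string into its top-level chunks by
// counting nesting depth. Assumes only "(" and ")" are delimiters.
function splitTopLevel(input) {
  const chunks = [];
  let depth = 0; // current nesting level
  let start = 0; // index where the current chunk begins

  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (ch === "(") {
      if (depth === 0) {
        // Flush any plain text that precedes this top-level group.
        if (i > start) chunks.push(input.slice(start, i));
        start = i;
      }
      depth++;
    } else if (ch === ")") {
      depth--;
      if (depth < 0) throw new Error("Unbalanced ')' at index " + i);
      if (depth === 0) {
        // A top-level group just closed: keep it, parentheses included.
        chunks.push(input.slice(start, i + 1));
        start = i + 1;
      }
    }
  }

  if (depth !== 0) throw new Error("Unbalanced '(' in input");
  // Trailing text outside any parentheses.
  if (start < input.length) chunks.push(input.slice(start));
  return chunks;
}

console.log(splitTopLevel("(1)(((2)(3))4)")); // [ '(1)', '(((2)(3))4)' ]
console.log(splitTopLevel("((2)(3))4"));      // [ '((2)(3))', '4' ]
```

To work from the outermost level inward, one could strip the outer parentheses from a chunk and call the function again on what remains.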


If you are very pragmatic, it might be okay for your concrete application to assume that you will never encounter nesting deeper than, say, 1000, and so you might be willing to go with a regex. It is also a very practical fact, though, that any regex recognizing more than 2 levels of nesting is basically unreadable.

Well, here is one way to do it. As Jo So pointed out, you can't really do it in JavaScript for an unbounded nesting depth, but you can pretty easily build a regex that handles any fixed depth you choose. I'm not sure how the performance scales, though.

First I figured out that you need recursion. Then I realized that you can make your regex 'recursive' by copying and pasting it into itself, like so (using curly braces instead of parentheses for clarity):

Starting regex

Finds a pair of brackets whose content contains no brackets itself.

/{([^{}])*}/g

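Then copy and paste the whole regex inside itself! (I spaced it out so you can see where it was pasted in; the spaces are for illustration only and must be removed in a real pattern, because JavaScript regexes treat them as literal characters.) So now it is basically like a( x | a( x )b )b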

/{([^{}] | {([^{}])*} )*}/g

That will get you one level of recursion. You can continue ad nauseam in this fashion, and if you paste the whole current regex into itself each time, you actually double the number of nesting levels at each step:

//matches {4{3{2{1}}}}
/{([^{}]|{([^{}]|{([^{}]|{([^{}])*})*})*})*}/g

//matches {8{7{6{5{4{3{2{1}}}}}}}}
/{([^{}]|{([^{}]|{([^{}]|{([^{}]|{([^{}]|{([^{}]|{([^{}]|{([^{}])*})*})*})*})*})*})*})*}/g

Finally, I just add |[^{}]+ to the end of the expression to match stuff that is completely outside of the brackets. Crazy, but it works for my needs. I feel like there is probably some clever way to combine this concept with a recursive function in order to get a truly recursive matcher, but I can't think of it now.
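
The copy-and-paste step can also be done by a loop instead of by hand. The following is only a sketch of that idea, not a truly recursive matcher: the name buildNestedRegex and the maxDepth parameter are invented for this example, the braces are escaped, and non-capturing groups are used so the pattern stays the same in spirit but does not collect capture groups.

```javascript
// Hypothetical helper: build the nested pattern for a fixed maximum depth
// instead of pasting it together by hand.
function buildNestedRegex(maxDepth) {
  // Depth 1: a pair of braces with no braces inside.
  let group = "\\{[^{}]*\\}";
  // Each pass wraps the previous pattern in one more level of braces.
  for (let depth = 1; depth < maxDepth; depth++) {
    group = "\\{(?:[^{}]|" + group + ")*\\}";
  }
  // Also match runs of text that sit completely outside of the braces.
  return new RegExp(group + "|[^{}]+", "g");
}

console.log("{1}{{{2}{3}}4}".match(buildNestedRegex(3))); // [ '{1}', '{{{2}{3}}4}' ]
console.log("{{2}{3}}4".match(buildNestedRegex(2)));      // [ '{{2}{3}}', '4' ]
```

The depth is still bounded. Input nested deeper than maxDepth does not raise an error; the regex simply matches the inner groups it can handle, so it is worth choosing the limit generously or checking the depth separately.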

If you can be sure that the parentheses are balanced (there are other resources out there that cover checking this, if required), and if by "top-level" you're happy to find local as well as global maxima, then all you need to do is find any content that starts with an open bracket and ends with a close bracket, with no other open bracket between the two.

I think the following should do that for you and helpfully group any "top-level" content:

\(([^\(]*?)\)

That content may not all be at the same "level", but if you think of the nested brackets as describing the branching of a tree, the regex will return to you the leaves. If you pre-process your text to be wrapped in parentheses to start with, and the earlier assumptions are met, you can guarantee always getting at least one "leaf".
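
To make the "leaves" behaviour concrete, here is a small sketch that runs this regex over the string from the question. The function name findLeaves is made up for the example, and the g flag is added so that every match is returned.

```javascript
// Hypothetical helper: return the contents of every "(...)" group
// that has no "(" inside it (the leaves of the nesting tree).
function findLeaves(input) {
  const leafPattern = /\(([^\(]*?)\)/g; // the regex above, plus the g flag
  // m[1] is the captured content between the parentheses.
  return Array.from(input.matchAll(leafPattern), (m) => m[1]);
}

console.log(findLeaves("(1)(((2)(3))4)")); // [ '1', '2', '3' ]
```

Note that the 4 is not returned, because it is not directly enclosed by a pair of parentheses without an inner open bracket. The outer groups from the question, such as (((2)(3))4), are not returned as units either.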

